The Time Series Data Library (TSDL) was created by Rob Hyndman, Professor of Statistics at Monash University, Australia. It includes data from many time series textbooks, as well as other series that he either collected for student projects or received from helpful contributors.
The data library was hosted on Professor Hyndman’s personal website from about 1992. In 2012 it was moved to DataMarket, which provides much better facilities for maintaining and using time series data. You can still access the data library from the website or use the rdatamarket package to read it into R, but the tsdl package provides a simpler means.
If you use any data from the TSDL in a publication, please use the following citation:
Rob Hyndman and Yangzhuoran Yang (2018). tsdl: Time Series Data Library. v0.1.0. https://pkg.yangzhuoranyang./tsdl/.
The data files will remain on Professor Hyndman’s personal website so that existing links will not be broken.
You can install the development version from GitHub.
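The install command itself does not appear in this text. A minimal sketch, assuming the development version lives in the GitHub repository FinYang/tsdl (the repository path is an assumption, it is not named above) and that the devtools package is used for the installation:

```r
# Assumption: the repository path "FinYang/tsdl" is not stated in this document.
# install.packages("devtools")  # uncomment if devtools is not installed yet
devtools::install_github("FinYang/tsdl")
```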
tsdl is a list of 648 series of class tsdl. Each series within tsdl is of class ts.
library(tsdl)
tsdl
#> Time Series Data Library: 648 time series
#>
#> Frequency
#> Subject 0.1 0.25 1 4 5 6 12 13 52 365 Total
#> Agriculture 0 0 37 0 0 0 3 0 0 0 40
#> Chemistry 0 0 8 0 0 0 0 0 0 0 8
#> Computing 0 0 6 0 0 0 0 0 0 0 6
#> Crime 0 0 1 0 0 0 2 1 0 0 4
#> Demography 1 0 9 2 0 0 3 0 0 2 17
#> Ecology 0 0 23 0 0 0 0 0 0 0 23
#> Finan+H77ce 0 0 1 0 0 0 0 0 0 0 1
#> Finance 0 0 22 5 0 0 20 0 2 1 50
#> Health 0 0 8 0 0 0 6 0 1 0 15
#> Hydrology 0 0 42 0 0 0 78 1 0 6 127
#> Industry 0 0 9 0 0 0 2 0 1 0 12
#> Labour market 0 0 3 4 0 0 17 0 0 0 24
#> Macro-Economic 0 0 18 33 0 0 5 0 0 0 56
#> Meteorology 0 0 18 0 0 0 16 0 0 12 46
#> Micro-Economic 0 0 27 1 0 0 7 0 1 0 36
#> Miscellaneous 0 0 4 0 1 1 3 0 1 0 10
#> Mwhdata 0 1 0 0 0 0 1 0 0 0 2
#> Physics 0 0 12 0 0 0 4 0 0 0 16
#> Production 0 0 4 14 0 0 28 1 1 0 48
#> Sales 0 0 10 3 0 0 24 0 9 0 46
#> Sport 0 0 1 0 0 0 0 0 0 0 1
#> Transport and tourism 0 0 1 1 0 0 12 0 0 0 14
#> Tree-rings 0 0 34 0 0 0 1 0 0 0 35
#> Utilities 0 0 2 1 0 0 8 0 0 0 11
#> Total 1 1 300 64 1 1 240 3 16 21 648
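Since tsdl is a list whose elements are ts objects, the usual base R accessors should apply to each element. The following is a small sketch under that assumption; the variable names are illustrative only.

```r
library(tsdl)

# Number of series in the library
n_series <- length(tsdl)

# Pull out a single series with list indexing; the result is a ts object
first_series <- tsdl[[1]]
is_ts <- inherits(first_series, "ts")

# Standard ts accessors work on the extracted series
series_freq  <- frequency(first_series)
series_start <- start(first_series)
series_end   <- end(first_series)

# Base graphics can plot a ts object directly
plot(first_series, ylab = "Value")
```

Working with one element at a time this way keeps any downstream code independent of the tsdl class itself.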
To extract series with specific features, use the subset function. The most common way to extract series is to specify the frequency or subject (type) of the series. The positions of these two conditions are interchangeable.
# Subset by frequency
tsdl_quarterly <- subset(tsdl, 4)
tsdl_quarterly
#> Time Series Data Library: 64 time series with frequency 4
#>
#> Frequency
#> Subject 4
#> Demography 2
#> Finance 5
#> Labour market 4
#> Macro-Economic 33
#> Micro-Economic 1
#> Production 14
#> Sales 3
#> Transport and tourism 1
#> Utilities 1
#> Total 64
# Subset by subject
tsdl_industry <- subset(tsdl, "Industry")
tsdl_industry
#> Time Series Data Library: 12 Industry time series
#>
#> Frequency
#> Subject 1 12 52 Total
#> Industry 9 2 1 12
# Subset by frequency and subject
tsdl_monthly_industry <- subset(tsdl, 12, "Industry")
tsdl_monthly_industry
#> Time Series Data Library: 2 Industry time series with frequency 12
#>
#> Frequency
#> Subject 12
#> Industry 2
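The statement that frequency and subject can be given in either position can be checked directly. The sketch below relies only on that statement and on the subset calls already shown; the comparison variables are illustrative.

```r
library(tsdl)

# Frequency first, subject second
by_freq_first <- subset(tsdl, 12, "Industry")

# Subject first, frequency second; per the text, the two positions are
# interchangeable, so this should select the same series
by_subject_first <- subset(tsdl, "Industry", 12)

# Compare the two selections by size
same_size <- length(by_freq_first) == length(by_subject_first)

# Every selected series should report a frequency of 12
all_monthly <- all(vapply(by_subject_first, frequency, numeric(1)) == 12)
```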
Users can also subset the data set by a specified start year, or by keywords in the source or description attributes.
# Subset by source
tsdl_abs <- subset(tsdl, source = "Australian Bureau of Statistics")
tsdl_abs
#> Time Series Data Library: 65 time series
#>
#> Frequency
#> Subject 1 4 12 Total
#> Agriculture 0 0 1 1
#> Demography 0 2 1 3
#> Finance 0 1 2 3
#> Labour market 0 0 4 4
#> Macro-Economic 1 19 1 21
#> Production 0 13 16 29
#> Sales 0 0 1 1
#> Transport and tourism 0 0 2 2
#> Utilities 0 1 0 1
#> Total 1 36 28 65
# Subset by starting year
tsdl_1948 <- subset(tsdl, start = 1948)
tsdl_1948
#> Time Series Data Library: 10 time series
#>
#> Frequency
#> Subject 4 12 Total
#> Hydrology 0 1 1
#> Labour market 1 5 6
#> Macro-Economic 3 0 3
#> Total 4 6 10
# Subset by description
tsdl_nettraffic <- subset(tsdl, description = "Internet traffic")
tsdl_nettraffic
#> Time Series Data Library: 6 Computing time series with frequency 1
#>
#> Frequency
#> Subject 1
#> Computing 6
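Several conditions can plausibly be given in one call. The sketch below assumes that conditions supplied together are combined with a logical AND; the text does not state this, so the selection is checked against the attributes stored on each series.

```r
library(tsdl)

# All series whose source mentions the Australian Bureau of Statistics
abs_all <- subset(tsdl, source = "Australian Bureau of Statistics")

# Assumption: frequency, subject and source given together narrow the selection
abs_monthly_production <- subset(
  tsdl, 12, "Production",
  source = "Australian Bureau of Statistics"
)

# Check the selection using the frequency and subject of each series
freqs    <- vapply(abs_monthly_production, frequency, numeric(1))
subjects <- vapply(abs_monthly_production,
                   function(s) attr(s, "subject"), character(1))
selection_ok <- all(freqs == 12) && all(subjects == "Production")
```

If selection_ok turns out to be FALSE, the AND assumption does not hold and the conditions would need to be applied in another way.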
To access the attribute information of a time series, extract its attributes directly.
attributes(tsdl[[1]])
#> $tsp
#> [1] 1948.00 1979.75 4.00
#>
#> $class
#> [1] "ts"
#>
#> $source
#> [1] "Abraham & Ledolter (1983)"
#>
#> $description
#> [1] "Quarterly Iowa nonfarm income (1948 – 1979)"
#>
#> $subject
#> [1] "Macro-Economic"
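When only one field is needed, base R's attr function returns a single attribute rather than the whole list. A short sketch using the attribute names shown in the output above:

```r
library(tsdl)

x <- tsdl[[1]]

# attr() returns one named attribute instead of the full attribute list
series_source  <- attr(x, "source")
series_subject <- attr(x, "subject")

# tsp() reads the start time, end time and frequency of a ts object
series_tsp <- tsp(x)

# Collect the subject of the first four series into a character vector
first_subjects <- vapply(
  seq_len(4),
  function(i) attr(tsdl[[i]], "subject"),
  character(1)
)
```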
The attribute information for all series is collected in the data frame meta_tsdl. It also shows the possible values of subject and of the other options available when subsetting the time series.
str(meta_tsdl)
#> 'data.frame': 648 obs. of 5 variables:
#> $ source : chr "Abraham & Ledolter (1983)" "Abraham & Ledolter (1983)" "Abraham & Ledolter (1983)" "Abraham & Ledolter (1983)" ...
#> $ description: chr "Quarterly Iowa nonfarm income (1948 – 1979)" "Monthly demand repair parts large/heavy equip. Iowa 1972 – 1979" "Montly av. residential gas usage Iowa (cubic feet)*100 ’71 – ’79" "Monthly gasoline demand Ontario gallon millions 1960 – 1975" ...
#> $ frequency : num 4 12 12 12 12 12 4 12 12 12 ...
#> $ start : num 1948 1972 1971 1960 1967 ...
#> $ subject : chr "Macro-Economic" "Industry" "Utilities" "Sales" ...
unique(meta_tsdl$subject)
#> [1] "Macro-Economic" "Industry"
#> [3] "Utilities" "Sales"
#> [5] "Transport and tourism" "Micro-Economic"
#> [7] "Production" "Labour market"
#> [9] "Physics" "Agriculture"
#> [11] "Ecology" "Health"
#> [13] "Hydrology" "Meteorology"
#> [15] "Demography" "Finance"
#> [17] "Tree-rings" "Chemistry"
#> [19] "Sport" "Miscellaneous"
#> [21] "Finan+H77ce" "Mwhdata"
#> [23] "Crime" "Computing"
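Because meta_tsdl is an ordinary data frame, base R tools can be used to explore it before calling subset. The sketch below uses only the columns shown in the str output; the keyword "rainfall" is an arbitrary illustration and may match no rows.

```r
library(tsdl)

# Count how many series fall under each subject, largest first
subject_counts <- sort(table(meta_tsdl$subject), decreasing = TRUE)
head(subject_counts)

# List the distinct frequencies that occur in the library
available_frequencies <- sort(unique(meta_tsdl$frequency))

# Find rows whose description mentions a keyword (case-insensitive match)
keyword_rows <- meta_tsdl[
  grepl("rainfall", meta_tsdl$description, ignore.case = TRUE),
  c("description", "frequency", "start")
]
```

The counts should agree with the Total column of the summary table printed for tsdl, where Hydrology is the largest subject with 127 series.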