Insert time series rows with regularly spaced timestamps
pad_by_time() is the easiest way to fill in missing timestamps or to convert a series to a more granular period (e.g. quarter to month). It wraps the padr::pad() function for padding tibbles.
```r
pad_by_time(
  .data,
  .date_var,
  .by = "auto",
  .pad_value = NA,
  .fill_na_direction = c("none", "down", "up", "downup", "updown"),
  .start_date = NULL,
  .end_date = NULL
)
```
| Argument | Description |
|---|---|
| .data | A tibble with a time-based column. |
| .date_var | A column containing date or date-time values to pad. |
| .by | Either "auto", a time-based frequency like "year", "month", "day", "hour", etc., or a time expression like "5 min" or "7 days". See Details. |
| .pad_value | Fills in padded values. Default is NA. |
| .fill_na_direction | Users can provide an NA fill strategy using tidyr::fill(). Possible values: "none", "down", "up", "downup", "updown". Default is "none". |
| .start_date | Specifies the start of the padded series. If NULL, it uses the lowest value of the input variable. |
| .end_date | Specifies the end of the padded series. If NULL, it uses the highest value of the input variable. |
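The .fill_na_direction options follow tidyr::fill(): "down" carries the last non-missing value forward, "up" carries the next non-missing value backward, and "downup" / "updown" apply one direction and then the other. The base-R sketch below illustrates those semantics on a plain vector. The fill_* helpers are hypothetical names for illustration only; they are not part of timetk or tidyr, and this is not how either package implements filling.

```r
# Hypothetical helpers illustrating tidyr::fill()-style directions (not a timetk/tidyr API)

# "down": replace each NA with the most recent non-missing value above it
fill_down <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

# "up": the same operation applied to the reversed vector
fill_up <- function(x) rev(fill_down(rev(x)))

# "downup": fill down first, then fill any remaining leading NAs upward
fill_downup <- function(x) fill_up(fill_down(x))

# "updown": fill up first, then fill any remaining trailing NAs downward
fill_updown <- function(x) fill_down(fill_up(x))

x <- c(NA, 1, NA, 3, NA)
fill_down(x)
fill_downup(x)
```

Note that "down" alone leaves a leading NA in place, which is why the combined directions exist.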
Padding Missing Observations
The most common use case for pad_by_time() is to add rows where timestamps are missing. For example, daily sales data may have no rows for weekends and holidays, or high-frequency data may have irregularly spaced observations that should be reset to a regular frequency.
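The idea behind padding can be sketched in base R without timetk: build the complete, regularly spaced index with seq() and left-join the observations onto it, so that missing timestamps become rows with NA values. This is an illustrative sketch of the concept, assuming made-up sales data; it is not how pad_by_time() or padr::pad() is implemented.

```r
# Hypothetical daily sales with no rows for the weekend (2024-01-06 and 2024-01-07)
sales <- data.frame(
  date  = as.Date(c("2024-01-04", "2024-01-05", "2024-01-08", "2024-01-09")),
  value = c(10, 12, 9, 11)
)

# Build the complete, regularly spaced daily index from the first to the last date
full_index <- data.frame(date = seq(min(sales$date), max(sales$date), by = "day"))

# Left-join the observations onto the full index; the missing days get NA values
padded <- merge(full_index, sales, by = "date", all.x = TRUE)
padded
```

The result has one row per day from 2024-01-04 to 2024-01-09, with NA in value for the two weekend days.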
Going from Low to High Frequency
The second use case is going from a low frequency (e.g. day) to a high frequency (e.g. hour). This is possible by supplying a higher frequency to pad_by_time().
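Converting to a finer period follows the same pattern with a denser index. The base-R sketch below turns made-up quarterly observations into a monthly series: the quarter-start months keep their values and the new in-between months are NA. It illustrates the concept only and is not timetk's implementation; in timetk the equivalent step is passing .by = "month" to pad_by_time().

```r
# Hypothetical quarterly observations: 2014-01-01, 2014-04-01, 2014-07-01, 2014-10-01
quarterly <- data.frame(
  date  = seq(as.Date("2014-01-01"), as.Date("2014-10-01"), by = "quarter"),
  value = 1:4
)

# Finer (monthly) index spanning the same date range
monthly_index <- data.frame(
  date = seq(min(quarterly$date), max(quarterly$date), by = "month")
)

# Left-join: quarter-start months keep their values, the added months are NA
monthly <- merge(monthly_index, quarterly, by = "date", all.x = TRUE)
monthly
```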
Interval, .by
Padding can be applied in the following ways:
- .by = "auto": pad_by_time() detects the timestamp frequency and applies padding.
- The eight supported intervals are: year, quarter, month, week, day, hour, min, and sec.
- Multiples of these intervals, such as 5 minutes, 6 hours, or 10 days, are also possible.
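Base R's seq() also accepts multiples of an interval, such as "5 min" for date-times, which makes it easy to sketch padding to a multi-unit grid. The example below pads made-up sensor readings to a regular 5-minute grid, assuming the readings already fall on that grid. It illustrates the concept and is not pad_by_time()'s implementation.

```r
# Hypothetical sensor readings with missing 10:05 and 10:10 timestamps (UTC)
readings <- data.frame(
  time  = as.POSIXct(c("2024-01-01 10:00:00",
                       "2024-01-01 10:15:00",
                       "2024-01-01 10:20:00"), tz = "UTC"),
  value = c(1.2, 1.5, 1.4)
)

# seq() on date-times accepts an integer multiple of a unit, e.g. "5 min"
grid <- data.frame(time = seq(min(readings$time), max(readings$time), by = "5 min"))

# Left-join the readings onto the 5-minute grid; the gaps become NA
padded_5min <- merge(grid, readings, by = "time", all.x = TRUE)
padded_5min
```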
This function wraps the padr::pad() function developed by Edwin Thoen.
Imputation:
- ts_impute_vec(): Impute missing values for time series.

Time-Based dplyr functions:
- summarise_by_time(): Easily summarise using a date column.
- mutate_by_time(): Simplifies applying mutations by time windows.
- pad_by_time(): Insert time series rows with regularly spaced timestamps.
- filter_by_time(): Quickly filter using date ranges.
- filter_period(): Apply filtering expressions inside periods (windows).
- slice_period(): Apply slice inside periods (windows).
- condense_period(): Convert to a different periodicity.
- between_time(): Range detection for date or date-time sequences.
- slidify(): Turn any function into a sliding (rolling) function.
```r
library(tidyverse)
library(tidyquant)
library(timetk)

# Create a quarterly series with 1 missing value
missing_data_tbl <- tibble(
    date  = tk_make_timeseries("2014-01-01", "2015-01-01", by = "quarter"),
    value = 1:5
) %>%
    slice(-4) # Lose the 4th quarter on purpose

missing_data_tbl

# Detects missing quarter, and pads the missing regularly spaced quarter with NA
missing_data_tbl %>% pad_by_time(date, .by = "quarter")

# Can specify a shorter period. This fills monthly.
missing_data_tbl %>% pad_by_time(date, .by = "month")

# Can let pad_by_time() auto-detect date and period
missing_data_tbl %>% pad_by_time()

# Can specify a .pad_value
missing_data_tbl %>% pad_by_time(date, .by = "quarter", .pad_value = 0)

# Can then impute missing values
missing_data_tbl %>%
    pad_by_time(date, .by = "quarter") %>%
    mutate(value = ts_impute_vec(value, period = 1))

# Can specify a custom .start_date and .end_date
missing_data_tbl %>%
    pad_by_time(date, .by = "quarter", .start_date = "2013", .end_date = "2015-07-01")

# Can specify a tidyr::fill() direction
missing_data_tbl %>%
    pad_by_time(date, .by = "quarter",
                .fill_na_direction = "downup",
                .start_date = "2013", .end_date = "2015-07-01")

# --- GROUPS ----

# Apply standard NA padding to groups
FANG %>%
    group_by(symbol) %>%
    pad_by_time(.by = "day")

# Apply filled padding to groups
FANG %>%
    group_by(symbol) %>%
    pad_by_time(.by = "day", .fill_na_direction = "down")
```