Helper functions for collapsing to the most detailed common intervals

identify_common_intervals() identifies the most detailed common set of intervals for a given interval variable, and merge_common_intervals() merges these onto the original dataset. collapse_common_intervals() calls both of these functions internally when collapsing to the most detailed common intervals.
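To make the idea of "most detailed common intervals" concrete, here is a minimal base-R sketch. It is not the package's implementation. It assumes that, within each group defined by the non-interval id_cols, the intervals are contiguous and cover the same overall range. Under that assumption, a boundary can be kept only if it appears in every group, and consecutive kept boundaries form the common intervals. The function name common_intervals_sketch() is hypothetical.

```r
# Hypothetical sketch, not the hierarchyUtils implementation.
# Assumes that within each group the intervals are contiguous and cover the
# same overall range, and that at least one non-interval id column exists.
common_intervals_sketch <- function(df, id_cols, col_stem) {
  cols <- paste0(col_stem, c("_start", "_end"))
  # groups are defined by the id columns other than the interval columns
  by_cols <- setdiff(id_cols, cols)
  groups <- split(df, df[by_cols], drop = TRUE)
  # all interval boundaries (starts and ends) observed in each group
  breaks <- lapply(groups, function(g) unique(c(g[[cols[1]]], g[[cols[2]]])))
  # keep only the boundaries that appear in every group
  common <- sort(Reduce(intersect, breaks))
  # consecutive common boundaries define the most detailed common intervals
  out <- data.frame(start = common[-length(common)], end = common[-1])
  names(out) <- cols
  out
}
```

Applied to as.data.frame(input_dt) from the example below with col_stem = "year", the male groups (one 2005 to 2010 interval) and the female groups (single years from 2005 to 2010) share only the boundaries 2005 and 2010. The sketch should therefore return the single interval 2005 to 2010.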

Usage

identify_common_intervals(dt, id_cols, col_stem, include_missing = FALSE)

merge_common_intervals(dt, common_intervals, col_stem)

Arguments

dt: [data.table()]
Dataset containing the interval variable.
id_cols: [character()]
ID columns that uniquely identify each row of dt. If NULL, common intervals are identified across the entire dataset.
col_stem: [character(1)]
The name of the interval variable to collapse. Should not include the '_start' or '_end' suffix.
include_missing: [logical(1)]
Whether to include missing intervals in the identified most detailed common intervals. Missing intervals are those not present in every combination of id_cols. Default is FALSE.
common_intervals: [data.table()]
Common intervals returned by identify_common_intervals().

Value

identify_common_intervals() returns a [data.table()] with two columns, '<col_stem>_start' and '<col_stem>_end', defining the most detailed common set of intervals for the col_stem interval variable.

merge_common_intervals() returns a [data.table()] with the same rows and columns as dt, plus two columns merged on from common_intervals. These new columns, 'common_start' and 'common_end', define the most detailed common interval that each row maps to.
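The mapping step can be pictured with another minimal base-R sketch. Again, this is not the package's implementation. Each row is matched to the common interval whose start is the largest common start not exceeding the row's own start, found here with base R's findInterval(). The sketch assumes that the common intervals do not overlap and that each row's interval lies entirely inside one of them. The function name map_to_common_sketch() is hypothetical, and, as in the example below, the common intervals are assumed to be stored in columns named 'common_start' and 'common_end'.

```r
# Hypothetical sketch, not the hierarchyUtils implementation.
# Assumes the common intervals do not overlap and that every row's interval
# lies entirely inside one common interval.
map_to_common_sketch <- function(df, common, col_stem) {
  start_col <- paste0(col_stem, "_start")
  end_col <- paste0(col_stem, "_end")
  # sort the common intervals by start so findInterval() can search them
  common <- common[order(common$common_start), , drop = FALSE]
  # index of the last common interval starting at or before each row's start
  idx <- findInterval(df[[start_col]], common$common_start)
  stopifnot(all(idx > 0))  # every row must start inside some common interval
  df$common_start <- common$common_start[idx]
  df$common_end <- common$common_end[idx]
  # a row ending after its matched common interval would violate the assumption
  stopifnot(all(df[[end_col]] <= df$common_end))
  df
}
```

Once every row carries common_start and common_end, rows that share these values (and the same remaining id columns) can be summed together to collapse the data onto the common intervals.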

Examples

# columns that uniquely identify each row
id_cols <- c("year_start", "year_end", "sex", "age_start", "age_end")

# set up test input data.table: male data in 5-year age groups for 2005-2010
input_dt_male <- data.table::CJ(year_start = 2005, year_end = 2010,
                                sex = "male",
                                age_start = seq(0, 95, 5),
                                value = 25)
input_dt_male[age_start == 95, value := 5]
#> Key: <year_start, year_end, sex, age_start>
#> Index: <age_start>
#>     year_start year_end    sex age_start value
#>          <num>    <num> <char>     <num> <num>
#>  1:       2005     2010   male         0    25
#>  2:       2005     2010   male         5    25
#>  3:       2005     2010   male        10    25
#>  4:       2005     2010   male        15    25
#>  5:       2005     2010   male        20    25
#>  6:       2005     2010   male        25    25
#>  7:       2005     2010   male        30    25
#>  8:       2005     2010   male        35    25
#>  9:       2005     2010   male        40    25
#> 10:       2005     2010   male        45    25
#> 11:       2005     2010   male        50    25
#> 12:       2005     2010   male        55    25
#> 13:       2005     2010   male        60    25
#> 14:       2005     2010   male        65    25
#> 15:       2005     2010   male        70    25
#> 16:       2005     2010   male        75    25
#> 17:       2005     2010   male        80    25
#> 18:       2005     2010   male        85    25
#> 19:       2005     2010   male        90    25
#> 20:       2005     2010   male        95     5
#>     year_start year_end    sex age_start value
# female data: single-year age groups and single calendar years
input_dt_female <- data.table::CJ(year_start = 2005:2009,
                                  sex = "female",
                                  age_start = seq(0, 95, 1),
                                  value = 1)
# add 'year_end' to the female data (the last year interval ends at 2010)
gen_end(input_dt_female, setdiff(id_cols, c("year_end", "age_end")),
        col_stem = "year", right_most_endpoint = 2010)
# combine both sexes and add 'age_end' to every row
input_dt <- rbind(input_dt_male, input_dt_female)
gen_end(input_dt, setdiff(id_cols, "age_end"), col_stem = "age")
data.table::setkeyv(input_dt, id_cols)

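# identify the most detailed common year intervals
common_intervals <- hierarchyUtils:::identify_common_intervals(
  dt = input_dt,
  id_cols = id_cols,
  col_stem = "year"
)
# rename the interval columns to 'common_start' and 'common_end' before merging
data.table::setnames(common_intervals, c("year_start", "year_end"),
                     c("common_start", "common_end"))

# add 'common_start' and 'common_end' columns to each row of input_dt
result_dt <- hierarchyUtils:::merge_common_intervals(
  dt = input_dt,
  common_intervals = common_intervals,
  col_stem = "year"
)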