R/interval_collapse.R
helper_common_intervals.Rd
identify_common_intervals() identifies the most detailed common set of intervals for a given interval variable, and merge_common_intervals() merges these onto the original dataset. collapse_common_intervals() calls both of these functions internally to help collapse to the most detailed common intervals.
identify_common_intervals(dt, id_cols, col_stem, include_missing = FALSE)
merge_common_intervals(dt, common_intervals, col_stem)
dt
[data.table()] Dataset containing the interval variable.
id_cols
[character()] ID columns that uniquely identify each row of dt. If NULL, common intervals across the entire dataset are identified.
col_stem
[character(1)] The name of the variable to collapse. It should not include the '_start' or '_end' suffix of the interval variable.
include_missing
[logical(1)] Whether to include missing intervals in the identified most detailed common intervals. These missing intervals are not present in all combinations of id_cols. Default is FALSE.
common_intervals
[data.table()] Common intervals returned by identify_common_intervals().
identify_common_intervals() returns a [data.table()] with two columns, named by appending '_start' and '_end' to col_stem (for example 'year_start' and 'year_end' when col_stem is "year"), defining the most detailed common set of intervals for the col_stem interval variable.
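One way to read "most detailed common set of intervals" is as the intervals formed by the endpoints that every group shares. The base R sketch below illustrates that reading only; common_breaks() is a hypothetical helper, not part of hierarchyUtils, and it assumes each group's intervals are contiguous with no gaps, so it does not model include_missing.

```r
# Hypothetical helper for illustration; not the package's implementation.
# Assumes the intervals within each group are contiguous (no gaps).
common_breaks <- function(starts, ends, group) {
  # collect the sorted, unique endpoints observed within each group
  endpoints <- lapply(split(seq_along(group), group), function(i) {
    sort(unique(c(starts[i], ends[i])))
  })
  # keep only the endpoints that appear in every group
  shared <- Reduce(intersect, endpoints)
  # consecutive shared endpoints define the common intervals
  data.frame(start = head(shared, -1), end = tail(shared, -1))
}

# males in five-year age groups, females in single-year age groups
male_start <- seq(0, 10, 5)
female_start <- seq(0, 14, 1)
res <- common_breaks(
  starts = c(male_start, female_start),
  ends = c(male_start + 5, female_start + 1),
  group = rep(c("male", "female"), c(length(male_start), length(female_start)))
)
print(res)
```

Here the five-year groups are the finest intervals that both sexes can be expressed in, so the sketch returns 0 to 5, 5 to 10, and 10 to 15.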
merge_common_intervals() returns a [data.table()] with the same columns and rows as originally in dt, with two additional columns merged on from common_intervals. These new columns are called 'common_start' and 'common_end' and define the most detailed common interval that each row maps to.
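The mapping from each row to its common interval can be sketched in base R with findInterval(). This is an illustration under stated assumptions, not the package's merge logic: map_to_common() is a hypothetical helper, the common intervals are taken to be sorted and left-closed, and every row is assumed to start inside one of them.

```r
# Hypothetical helper for illustration; not part of hierarchyUtils.
# Assumes common_start is sorted and every row starts inside a common interval.
map_to_common <- function(start, common_start, common_end) {
  # position of the common interval whose start is the largest value <= start
  idx <- findInterval(start, common_start)
  data.frame(
    start = start,
    common_start = common_start[idx],
    common_end = common_end[idx]
  )
}

# single-year starts mapped onto the common intervals 0 to 5 and 5 to 10
mapped <- map_to_common(
  start = c(0, 1, 4, 5, 9),
  common_start = c(0, 5),
  common_end = c(5, 10)
)
print(mapped)
```

Rows starting at 0, 1, and 4 map to the 0 to 5 interval, and rows starting at 5 and 9 map to the 5 to 10 interval.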
id_cols <- c("year_start", "year_end", "sex", "age_start", "age_end")
# set up test input data.table
input_dt_male <- data.table::CJ(year_start = 2005, year_end = 2010,
                                sex = "male",
                                age_start = seq(0, 95, 5),
                                value = 25)
input_dt_male[age_start == 95, value := 5]
#> Key: <year_start, year_end, sex, age_start>
#> Index: <age_start>
#> year_start year_end sex age_start value
#> <num> <num> <char> <num> <num>
#> 1: 2005 2010 male 0 25
#> 2: 2005 2010 male 5 25
#> 3: 2005 2010 male 10 25
#> 4: 2005 2010 male 15 25
#> 5: 2005 2010 male 20 25
#> 6: 2005 2010 male 25 25
#> 7: 2005 2010 male 30 25
#> 8: 2005 2010 male 35 25
#> 9: 2005 2010 male 40 25
#> 10: 2005 2010 male 45 25
#> 11: 2005 2010 male 50 25
#> 12: 2005 2010 male 55 25
#> 13: 2005 2010 male 60 25
#> 14: 2005 2010 male 65 25
#> 15: 2005 2010 male 70 25
#> 16: 2005 2010 male 75 25
#> 17: 2005 2010 male 80 25
#> 18: 2005 2010 male 85 25
#> 19: 2005 2010 male 90 25
#> 20: 2005 2010 male 95 5
#> year_start year_end sex age_start value
input_dt_female <- data.table::CJ(year_start = 2005:2009,
                                  sex = "female",
                                  age_start = seq(0, 95, 1),
                                  value = 1)
gen_end(input_dt_female, setdiff(id_cols, c("year_end", "age_end")),
        col_stem = "year", right_most_endpoint = 2010)
input_dt <- rbind(input_dt_male, input_dt_female)
gen_end(input_dt, setdiff(id_cols, "age_end"), col_stem = "age")
data.table::setkeyv(input_dt, id_cols)
common_intervals <- hierarchyUtils:::identify_common_intervals(
  dt = input_dt,
  id_cols = id_cols,
  col_stem = "year"
)
data.table::setnames(common_intervals, c("year_start", "year_end"),
                     c("common_start", "common_end"))
result_dt <- hierarchyUtils:::merge_common_intervals(
  dt = input_dt,
  common_intervals = common_intervals,
  col_stem = "year"
)