Collapse an interval variable to the most detailed common set of intervals — collapse_common

Collapse an interval variable to the most detailed set of intervals that is available for every combination of id_cols in a dataset, then aggregate value_cols to that common set of intervals.
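As a rough illustration of what "most detailed common set of intervals" means, the sketch below keeps only the interval breakpoints that every group shares. It is a conceptual base R example, not the package's internal implementation, and it assumes each group's intervals are contiguous, non-overlapping, and cover the same overall range. The vector names are hypothetical.

```r
# Conceptual sketch only; not how the package computes this internally.
# Assumes contiguous, non-overlapping intervals covering the same range.
male_starts   <- seq(0, 95, 5)  # 5-year age groups
female_starts <- seq(0, 95, 1)  # single-year age groups

# A breakpoint can be kept only if every group contains it
common_starts <- Reduce(intersect, list(male_starts, female_starts))

# Each interval ends where the next begins; the last one is open-ended
common_intervals <- data.frame(
  age_start = common_starts,
  age_end = c(common_starts[-1], Inf)
)
print(common_intervals)
```

Here the coarser 5-year grouping limits the result: the single-year breakpoints that the 5-year grouping lacks cannot be part of the common set.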

collapse_common_intervals(
  dt,
  id_cols,
  value_cols,
  col_stem,
  agg_function = sum,
  missing_dt_severity = "stop",
  overlapping_dt_severity = "stop",
  include_missing = FALSE
)

Arguments

dt: [data.table()]
Dataset containing the interval variable.
id_cols: [character()]
ID columns that uniquely identify each row of dt.
value_cols: [character()]
Value columns that should be aggregated.
col_stem: [character(1)]
Name of the interval variable to collapse. Do not include the '_start' or '_end' suffix.
agg_function: [function()]
Function to use when aggregating. Can be either sum (for counts) or prod (for probabilities). Default is sum.
missing_dt_severity: [character(1)]
How severely to respond when missing intervals prevent collapsing to the most detailed common set of intervals. One of 'skip', 'stop', 'warning', 'message', or 'none'. Default is 'stop'. If not 'stop', only the intervals that can be correctly collapsed are collapsed.
overlapping_dt_severity: [character(1)]
What should happen when overlapping intervals are identified in the interval variable. One of 'skip', 'stop', 'warning', 'message', or 'none'. Default is 'stop'. See section on 'Severity Arguments' for more information.
include_missing: [logical(1)]
Whether to include missing intervals in the identified most detailed common intervals. Missing intervals are those that are not present in all combinations of id_cols. Default is FALSE.
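The choice between sum and prod for agg_function depends on what the values represent. The base R sketch below uses made-up numbers (all values and variable names are hypothetical) to show why counts are added across the sub-intervals being merged, while conditional probabilities of surviving each sub-interval are multiplied.

```r
# Counts: the collapsed interval's value is the sum of its parts
deaths_by_single_year <- c(10, 15, 12, 8, 5)  # hypothetical counts, ages 0 to 4
deaths_0_to_5 <- sum(deaths_by_single_year)

# Probabilities: surviving the whole interval requires surviving
# every sub-interval, so the conditional probabilities multiply
survival_by_single_year <- c(0.99, 0.98, 0.97)  # hypothetical values
survival_overall <- prod(survival_by_single_year)

print(deaths_0_to_5)
print(survival_overall)
```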

Value

[data.table()] with id_cols and value_cols columns but with the col_stem intervals reduced to only the most detailed common set of intervals.

Examples

id_cols <- c("year_start", "year_end", "sex", "age_start", "age_end")
value_cols <- c("value")

# set up test input data.table
input_dt_male <- data.table::CJ(year_start = 2005, year_end = 2010,
                                sex = "male",
                                age_start = seq(0, 95, 5),
                                value = 25)
input_dt_male[age_start == 95, value := 5]
#> Key: <year_start, year_end, sex, age_start>
#> Index: <age_start>
#>     year_start year_end    sex age_start value
#>          <num>    <num> <char>     <num> <num>
#>  1:       2005     2010   male         0    25
#>  2:       2005     2010   male         5    25
#>  3:       2005     2010   male        10    25
#>  4:       2005     2010   male        15    25
#>  5:       2005     2010   male        20    25
#>  6:       2005     2010   male        25    25
#>  7:       2005     2010   male        30    25
#>  8:       2005     2010   male        35    25
#>  9:       2005     2010   male        40    25
#> 10:       2005     2010   male        45    25
#> 11:       2005     2010   male        50    25
#> 12:       2005     2010   male        55    25
#> 13:       2005     2010   male        60    25
#> 14:       2005     2010   male        65    25
#> 15:       2005     2010   male        70    25
#> 16:       2005     2010   male        75    25
#> 17:       2005     2010   male        80    25
#> 18:       2005     2010   male        85    25
#> 19:       2005     2010   male        90    25
#> 20:       2005     2010   male        95     5
#>     year_start year_end    sex age_start value
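# female data is more detailed: single calendar years and single-year ages
input_dt_female <- data.table::CJ(year_start = 2005:2009,
                                  sex = "female",
                                  age_start = seq(0, 95, 1),
                                  value = 1)
# generate 'year_end' for the female rows; the last interval ends at 2010
gen_end(input_dt_female, setdiff(id_cols, c("year_end", "age_end")),
        col_stem = "year", right_most_endpoint = 2010)
input_dt <- rbind(input_dt_male, input_dt_female)
# generate 'age_end' for the combined data
gen_end(input_dt, setdiff(id_cols, "age_end"), col_stem = "age")
data.table::setkeyv(input_dt, id_cols)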


# first collapse the 'year' intervals to the common set
collapsed_dt <- collapse_common_intervals(
  dt = input_dt,
  id_cols = id_cols,
  value_cols = value_cols,
  col_stem = "year"
)
# then collapse the 'age' intervals of that result
collapsed_dt <- collapse_common_intervals(
  dt = collapsed_dt,
  id_cols = id_cols,
  value_cols = value_cols,
  col_stem = "age"
)
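The input values in this example appear to be chosen so that the arithmetic is easy to check by hand. The base R sketch below redoes the female aggregation without the package, assuming the default agg_function = sum and assuming the male 5-year age groups and the 2005 to 2010 year interval form the common set. The names female, age_group_start, and expected are hypothetical helpers, not part of the package.

```r
# Rebuild the female input by hand: one row per single year and single age
female <- expand.grid(year_start = 2005:2009, age_start = 0:95)
female$value <- 1

# Map each single-year age to the start of its 5-year group (0, 5, ..., 95)
female$age_group_start <- 5 * (female$age_start %/% 5)

# Summing over the five years and the ages in each group mimics
# collapsing 'year' and then 'age' with agg_function = sum
expected <- aggregate(value ~ age_group_start, data = female, FUN = sum)
print(expected)
```

Each full 5-year group covers 5 years times 5 ages, giving 25, while the terminal group starting at 95 contains a single age and gives 5. These are the same values assigned to the male rows, so one would expect the collapsed female rows to line up with them.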