R/interval_collapse.R
collapse_common_intervals.Rd
Collapse an interval variable to the most detailed common set of intervals
available for each combination of id_cols in a dataset.
Aggregates the collapsed dataset to the common set of intervals.
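To make "most detailed common set of intervals" concrete, here is a minimal base R sketch. It is not the package's implementation: the group names, start points, and the shared `end_point` are made up for illustration, and it assumes every group covers the full range with no missing or overlapping intervals. Under those assumptions, a breakpoint is kept only if every group has it.

```r
# Hypothetical interval start points for two id_cols combinations.
# Assumption: both groups end at the same right-most endpoint.
age_starts <- list(
  male   = seq(0, 95, 5),  # five-year age groups
  female = seq(0, 95, 1)   # single-year age groups
)
end_point <- Inf

# Breakpoints (starts plus the shared endpoint) for each group.
breaks <- lapply(age_starts, function(s) c(s, end_point))

# A breakpoint survives only if it is present in every group.
common_breaks <- sort(Reduce(intersect, breaks))

# Consecutive common breakpoints define the common intervals.
common_intervals <- data.frame(
  age_start = head(common_breaks, -1),
  age_end   = tail(common_breaks, -1)
)
```

Here the single-year group can be aggregated up to the five-year intervals, but not the other way around, so the five-year intervals are the most detailed set both groups share.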
collapse_common_intervals(
dt,
id_cols,
value_cols,
col_stem,
agg_function = sum,
missing_dt_severity = "stop",
overlapping_dt_severity = "stop",
include_missing = FALSE
)
dt
[data.table()] Dataset containing the interval variable.
id_cols
[character()] ID columns that uniquely identify each row of dt.
value_cols
[character()] Value columns that should be aggregated.
col_stem
[character(1)] The name of the variable to collapse, without the '_start' or
'_end' suffix of the interval columns.
agg_function
[function()] Function to use when aggregating. Can be either sum (for counts)
or prod (for probabilities).
missing_dt_severity
[character(1)] How severe should the consequences be when missing intervals
prevent collapsing to the most detailed common set of intervals? Can be either
'skip', 'stop', 'warning', 'message', or 'none'. If not 'stop', only the
intervals that can be correctly collapsed are collapsed.
overlapping_dt_severity
[character(1)] What should happen when overlapping intervals are identified
while aggregating or scaling an interval variable, or when
collapse_interval_cols = TRUE? Can be either 'skip', 'stop', 'warning',
'message', or 'none'. Default is 'stop'. See section on 'Severity Arguments'
for more information.
include_missing
[logical(1)] Whether to include missing intervals in the identified most
detailed common intervals. Missing intervals are those not present in all
combinations of id_cols. Default is FALSE.
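The choice of agg_function described above depends on what the value columns measure. The following base R sketch uses made-up numbers (the vectors `deaths` and `survival` are hypothetical, not taken from the package) to show why counts are summed while conditional probabilities are multiplied when several detailed intervals are collapsed into one.

```r
# Hypothetical values for five single-year intervals collapsed into one.
deaths   <- c(10, 12, 8, 9, 11)               # counts: totals add up
survival <- c(0.99, 0.98, 0.97, 0.99, 0.98)   # conditional survival probabilities

# Counts over the wider interval are the sum of the detailed counts.
collapsed_deaths <- sum(deaths)

# Surviving the wider interval requires surviving every detailed interval,
# so the conditional probabilities multiply.
collapsed_survival <- prod(survival)
```

Passing sum to a probability column, or prod to a count column, would run without error but give a meaningless result, so the function should match the quantity being collapsed.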
[data.table()] with id_cols and value_cols columns, but with the col_stem
intervals reduced to only the most detailed common set of intervals.
id_cols <- c("year_start", "year_end", "sex", "age_start", "age_end")
value_cols <- c("value")
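# set up test input data.table
# males: one five-year calendar interval, five-year age groups
input_dt_male <- data.table::CJ(year_start = 2005, year_end = 2010,
                                sex = "male",
                                age_start = seq(0, 95, 5),
                                value = 25)
# the final age group (starting at 95) gets a smaller value
input_dt_male[age_start == 95, value := 5]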
#> Key: <year_start, year_end, sex, age_start>
#> Index: <age_start>
#> year_start year_end sex age_start value
#> <num> <num> <char> <num> <num>
#> 1: 2005 2010 male 0 25
#> 2: 2005 2010 male 5 25
#> 3: 2005 2010 male 10 25
#> 4: 2005 2010 male 15 25
#> 5: 2005 2010 male 20 25
#> 6: 2005 2010 male 25 25
#> 7: 2005 2010 male 30 25
#> 8: 2005 2010 male 35 25
#> 9: 2005 2010 male 40 25
#> 10: 2005 2010 male 45 25
#> 11: 2005 2010 male 50 25
#> 12: 2005 2010 male 55 25
#> 13: 2005 2010 male 60 25
#> 14: 2005 2010 male 65 25
#> 15: 2005 2010 male 70 25
#> 16: 2005 2010 male 75 25
#> 17: 2005 2010 male 80 25
#> 18: 2005 2010 male 85 25
#> 19: 2005 2010 male 90 25
#> 20: 2005 2010 male 95 5
#> year_start year_end sex age_start value
# females: single calendar years, single-year age groups
input_dt_female <- data.table::CJ(year_start = 2005:2009,
                                  sex = "female",
                                  age_start = seq(0, 95, 1),
                                  value = 1)
# add the year_end column, closing the last year interval at 2010
gen_end(input_dt_female, setdiff(id_cols, c("year_end", "age_end")),
        col_stem = "year", right_most_endpoint = 2010)
input_dt <- rbind(input_dt_male, input_dt_female)
# add the age_end column for all rows
gen_end(input_dt, setdiff(id_cols, "age_end"), col_stem = "age")
data.table::setkeyv(input_dt, id_cols)
# collapse the year intervals to the most detailed common set
collapsed_dt <- collapse_common_intervals(
  dt = input_dt,
  id_cols = id_cols,
  value_cols = value_cols,
  col_stem = "year"
)
# then collapse the age intervals of the result
collapsed_dt <- collapse_common_intervals(
  dt = collapsed_dt,
  id_cols = id_cols,
  value_cols = value_cols,
  col_stem = "age"
)
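As a sanity check on the arithmetic in this example, the female values can be aggregated by hand in base R. This is only a sketch of the expected totals under agg_function = sum, not output from collapse_common_intervals; the helper column `age_group_start` is introduced here for illustration.

```r
# Female rows from the example: calendar years 2005-2009 crossed with
# single-year ages 0-95, each with value 1.
female <- expand.grid(year_start = 2005:2009, age_start = 0:95)
female$value <- 1

# Map each single-year age to the start of its five-year group
# (age 95 maps to 95, the start of the final interval in the male data).
female$age_group_start <- 5 * (female$age_start %/% 5)

# Summing over years and ages within each group mimics agg_function = sum.
expected <- aggregate(value ~ age_group_start, data = female, FUN = sum)
```

Each five-year group below 95 totals 5 years x 5 ages = 25, and the final group totals 5 years x 1 age = 5, which matches the male values of 25 and 5 set up in the example.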