Calculate summary statistics for life tables when collapsing over one or more variables. For example, summary statistics can be calculated across a set of draws, locations, or location-years.

summarize_lt(
  dt,
  id_cols,
  summarize_cols,
  value_cols,
  summary_fun = c("mean"),
  probs = c(0.025, 0.975),
  recalculate_starting_params = c("mx", "ax"),
  recalculate_stats = c("mean"),
  preserve_u5 = FALSE,
  assert_na = FALSE,
  format_long = FALSE
)

Arguments

dt

[data.table()]
Life tables to calculate summary statistics for. Must include all id_cols, summarize_cols, and value_cols.

id_cols

[character()]
ID columns that uniquely identify each row of dt.

summarize_cols

[character()]
The id_cols to collapse over when calculating summary statistics.

value_cols

[character()]
Value columns (life table parameters) that summary statistics should be calculated for. Valid choices are 'mx', 'qx', 'ax', 'dx', 'px', 'lx', 'Tx', 'nLx', and 'ex'. Must include at least 2 of 'mx', 'qx', or 'ax'.

summary_fun

[character()]
Names of the functions that can be used to summarize a vector of values. Default is "mean".

probs

[numeric()]
Probabilities with values in [0,1] to be used when producing sample quantiles with stats::quantile(). Default is 0.025 and 0.975 for the 2.5th and 97.5th quantiles. Can be NULL if no quantiles are needed.

recalculate_starting_params

[character(2)]
2 of 'mx', 'qx', or 'ax' from which to recalculate certain summary statistics (recalculate_stats) for all other value_cols using demCore::lifetable.

recalculate_stats

[character()]
The summary statistics for which all life table parameters should be recalculated so that they are consistent with one another. For each of these statistics, demCore::lifetable recalculates all life table parameters from recalculate_starting_params, and the parameters specified in value_cols are kept. Default is 'mean'.

preserve_u5

[logical()]
Whether to preserve under-5 qx estimates rather than recalculating them based on mx and ax.

assert_na

[logical()]
Whether to assert that there is no missingness.

format_long

[logical(1)]
Whether to format output with a 'life_table_parameter' column and a column for each summary statistic (rather than a column for each 'life_table_parameter' and summary statistic pair).
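
As a rough illustration of what summary_fun and probs request, the following base R sketch summarizes a single vector of simulated values by hand. It does not call summarize_lt and is not the package's implementation; the simulated values and the names draws, fun_results, and quantile_results are hypothetical.

```r
# Hypothetical draws of one life table parameter (e.g. mx at a single age).
set.seed(1)
draws <- rnorm(1000, mean = 0.01, sd = 0.001)

summary_fun <- c("mean")
probs <- c(0.025, 0.975)

# Look up each named summary function and apply it to the draws.
fun_results <- vapply(
  summary_fun,
  function(f) match.fun(f)(draws),
  numeric(1)
)

# Sample quantiles at the requested probabilities.
quantile_results <- stats::quantile(draws, probs = probs)

print(fun_results)
print(quantile_results)
```

Setting probs = NULL in summarize_lt corresponds to skipping the stats::quantile() step entirely.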

Value

[data.table()] with id_cols (minus the summarize_cols) plus summary statistic columns. The summary statistic columns are named after each function specified in summary_fun, and the quantile columns are named like 'q_(probs * 100)'. Each returned summary statistic column is prefixed with the value column name. If format_long is TRUE, the output instead has a 'life_table_parameter' column and one column for each summary statistic.

Details

One use case for summarize_lt is summarizing multiple draws (independent simulations) of life tables that were produced to propagate uncertainty. Each draw may contain all life table parameters, so the mean and the 2.5th and 97.5th percentiles across draws can be calculated for every parameter. However, the mean 'mx', 'qx', and 'ax' parameters would then be inconsistent with, for example, the mean 'ex' parameter. Specifying recalculate_stats = 'mean' addresses this by recalculating the mean 'ex' parameter using the mean 'mx', 'qx', and/or 'ax' as inputs to the demCore::lifetable function.
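
The inconsistency arises because life table parameters are nonlinear functions of one another, so the mean of a derived parameter generally differs from that parameter derived from the means. The base R sketch below shows this for 'qx' rather than 'ex', using the standard conversion qx = n * mx / (1 + (n - ax) * mx) for a single age interval. The helper mx_to_qx and all input values are hypothetical, and this is not demCore's code.

```r
# Standard mx -> qx conversion for one age interval of length n.
mx_to_qx <- function(mx, ax, n) {
  n * mx / (1 + (n - ax) * mx)
}

set.seed(1)
n <- 5      # interval length in years (assumed)
ax <- 2.5   # average years lived in the interval by those dying (assumed)

# Hypothetical draws of mx for this age interval.
mx_draws <- 0.05 * exp(rnorm(1000, mean = 0, sd = 0.3))

# Option 1: convert every draw to qx, then take the mean.
mean_of_qx <- mean(mx_to_qx(mx_draws, ax, n))

# Option 2: take the mean of mx, then convert once.
qx_from_mean_mx <- mx_to_qx(mean(mx_draws), ax, n)

# The two results differ because the conversion is nonlinear in mx.
print(c(mean_of_qx = mean_of_qx, qx_from_mean_mx = qx_from_mean_mx))
```

Recalculating from the summarized starting parameters, as in option 2, is what keeps the returned mean life table internally consistent.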

See also

demUtils::summarize_dt

demCore::lifetable

demCore::validate_lifetable

Examples

library(data.table)
data("austria_1992_lt")

# Simulate 100 draws by scaling mx and ax with one random factor per draw
dt <- data.table::data.table()
for (d in 1:100) {
  dt_new <- copy(austria_1992_lt)
  dt_new[, draw := d]
  dt_new[, mx := mx * rnorm(1, mean = 1, sd = 0.05)]
  dt_new[, ax := ax * rnorm(1, mean = 1, sd = 0.05)]
  dt_new[, qx := NULL]
  dt <- rbind(dt, dt_new, fill = TRUE)
}
dt <- dt[!is.na(age_start)]
dt <- dt[, .(age_start, age_end, draw, mx, ax)]

# Collapse over draws, recalculating the mean life table so that it is
# internally consistent
dt <- summarize_lt(
  dt = dt,
  id_cols = c("age_start", "age_end", "draw"),
  summarize_cols = c("draw"),
  value_cols = c("mx", "ax"),
  recalculate_stats = "mean"
)