Calculate summary statistics for life tables when collapsing over a certain variable. For example, summary statistics can be calculated across a set of draws, locations, or location-years.
[data.table()]
Life tables to calculate summary statistics for. Must include all id_cols, summarize_cols, and value_cols.
[character()]
ID columns that uniquely identify each row of dt.
[character()]
The subset of id_cols to collapse, that is, the columns that summary statistics are calculated over.
[character()]
Value columns (life table parameters) that summary statistics should be calculated for. Valid choices are 'mx', 'qx', 'ax', 'dx', 'px', 'lx', 'Tx', 'nLx', and 'ex'. Must include at least two of 'mx', 'qx', and 'ax'.
[character()]
Names of the functions used to summarize a vector of values. Default is "mean".
[numeric()]
Probabilities with values in [0, 1] to be used when producing sample quantiles with stats::quantile(). Default is 0.025 and 0.975, for the 2.5th and 97.5th percentiles. Can be NULL if no quantiles are needed.
[character(2)]
Two of 'mx', 'qx', or 'ax' from which to recalculate certain summary statistics (recalculate_stats) for all other value_cols using demCore::lifetable.
[character()]
The summary statistics for which all life table parameters are recalculated so that they are consistent with one another. Uses demCore::lifetable to recalculate all life table parameters and keeps those specified in value_cols. Default is 'mean'.
[logical()]
Whether to preserve under-5 qx estimates rather than recalculating them based on mx and ax.
[logical()]
Whether to assert that there is no missingness.
[logical(1)]
Whether to format output with a 'life_table_parameter' column and a column for each summary statistic (rather than a column for each 'life_table_parameter' and summary statistic pair).
[data.table()] with id_cols (minus the summarize_cols) plus summary statistic columns. The summary statistic columns have the same name as each function specified in summary_fun, and the quantiles are named like 'q_(probs * 100)'. Each returned summary statistic column is prefixed with the value column name. If format_long is TRUE, output is returned with a 'life_table_parameter' column and a column for each summary statistic.
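The naming pattern for the summary statistic columns can be illustrated with a minimal base R sketch. It assumes made-up draws of a single parameter at one age, and it assumes the quantile labels are built as paste0("q_", probs * 100); the function's actual internals and the separator used for the value column prefix are not shown here.

```r
# Made-up draws of one life table parameter (mx) at a single age.
set.seed(42)
mx_draws <- rnorm(1000, mean = 0.01, sd = 0.001)

# Sample quantiles at the default probabilities.
probs <- c(0.025, 0.975)
quantiles <- stats::quantile(mx_draws, probs = probs, names = FALSE)

# Assumed labels, following the 'q_(probs * 100)' pattern.
names(quantiles) <- paste0("q_", probs * 100)

# One summary function ("mean") plus the two quantiles.
summary_stats <- c(mean = mean(mx_draws), quantiles)
names(summary_stats)
```

With the default probs, the labels come out as q_2.5 and q_97.5, next to one entry per summary function.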
One example use case for summarize_lt is when we have multiple draws (independent simulations) of life tables to propagate uncertainty. Each independent draw may contain all life table parameters, so the mean, 2.5th, and 97.5th percentiles across draws can be calculated for every parameter. However, the mean 'mx', 'qx', and 'ax' parameters would then be inconsistent with, for example, the mean 'ex' parameter. In this case, specifying recalculate_stats = 'mean' recalculates the mean 'ex' parameter using the mean 'mx', 'qx', and/or 'ax' as inputs to the demCore::lifetable function.
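The inconsistency described above can be seen in a small base R sketch. Everything in it is an illustrative assumption: the mortality schedule is made up, ex0_from_mx is a hypothetical helper that uses single-year ages with a constant ax of 0.5, and it is not how demCore::lifetable is implemented.

```r
# Hypothetical helper: life expectancy at birth from single-year mx,
# assuming ax = 0.5 in closed intervals and Lx = lx / mx in the open interval.
ex0_from_mx <- function(mx) {
  n_ages <- length(mx)
  qx <- mx / (1 + 0.5 * mx)              # mx -> qx for n = 1, ax = 0.5
  qx[n_ages] <- 1                        # everyone dies in the terminal age group
  lx <- c(1, cumprod(1 - qx[-n_ages]))   # survivorship with radix l0 = 1
  dx <- lx * qx
  Lx <- lx - 0.5 * dx                    # person-years lived in each interval
  Lx[n_ages] <- lx[n_ages] / mx[n_ages]
  sum(Lx)                                # e0 = T0 / l0
}

set.seed(1)
base_mx <- 0.0005 * exp(0.08 * (0:90))   # made-up Gompertz-like schedule
# Matrix of ages (rows) by draws (columns); each draw rescales the schedule.
draws <- sapply(1:200, function(d) base_mx * exp(rnorm(1, sd = 0.2)))

mean_of_ex <- mean(apply(draws, 2, ex0_from_mx))  # summarize ex across draws
ex_of_mean <- ex0_from_mx(rowMeans(draws))        # recompute ex from mean mx

c(mean_of_ex = mean_of_ex, ex_of_mean = ex_of_mean)
```

The two values differ because life expectancy is a nonlinear function of the mortality rates, which is why recalculating from the summarized 'mx', 'qx', and 'ax' is needed to obtain an internally consistent life table.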
demUtils::summarize_dt
demCore::lifetable
demCore::validate_lifetable
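library(data.table)
data("austria_1992_lt")

# Simulate 100 draws by perturbing mx and ax of the example life table.
# Each draw uses a single random scaling factor per parameter.
dt <- data.table::data.table()
for (d in 1:100) {
  dt_new <- copy(austria_1992_lt)
  dt_new[, draw := d]
  dt_new[, mx := mx * rnorm(1, mean = 1, sd = 0.05)]
  dt_new[, ax := ax * rnorm(1, mean = 1, sd = 0.05)]
  dt_new[, qx := NULL]
  dt <- rbind(dt, dt_new, fill = TRUE)
}

# Keep only rows with an age group, and only the ID and value columns.
dt <- dt[!is.na(age_start)]
dt <- dt[, .(age_start, age_end, draw, mx, ax)]

# Collapse over draws; recalculate the mean life table so it is consistent.
dt <- summarize_lt(
  dt = dt,
  id_cols = c("age_start", "age_end", "draw"),
  summarize_cols = c("draw"),
  value_cols = c("mx", "ax"),
  recalculate_stats = "mean"
)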