Calculate summary statistics when collapsing over one or more variables. For example, it can calculate summary statistics across a set of draws, locations, or location-years.
```r
summarize_dt(
  dt,
  id_cols,
  summarize_cols,
  value_cols,
  summary_fun = c("mean"),
  probs = c(0.025, 0.975)
)
```
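Argument | Description |
---|---|
dt | [`data.table()`] input data to summarize. |
id_cols | [`character()`] ID columns in `dt`, including the `summarize_cols`. |
summarize_cols | [`character()`] the subset of `id_cols` to collapse over when calculating summary statistics. |
value_cols | [`character()`] value columns to summarize. |
summary_fun | [`character()`] names of the functions used to summarize each of the `value_cols`. Defaults to `"mean"`. |
probs | [`numeric()`] probabilities for which sample quantiles are calculated with `stats::quantile()`. Defaults to `c(0.025, 0.975)`; set to `NULL` to skip quantiles. |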
Returns a [`data.table()`] with the `id_cols` (minus the `summarize_cols`) plus summary statistic columns. Each summary statistic column is named after the corresponding function in `summary_fun`, and each quantile column is named 'q' followed by `probs * 100` (for example, 'q2.5'). If more than one column is given in `value_cols`, each returned summary statistic column is prefixed with the name of its value column.
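To make the shape of the result concrete, the base-R sketch below mimics the collapse for a single value column with `summary_fun = "mean"`: rows are grouped by `id_cols` minus `summarize_cols`, and each group is reduced to one row whose statistic column is named after the function. It uses `stats::aggregate()` on a plain `data.frame` instead of `data.table`, and the input values are made up for illustration; it is an assumption-laden sketch, not the package's implementation.

```r
# Made-up input for illustration: 2 locations x 5 draws each
input_df <- data.frame(
  location = rep(c("USA", "CAN"), each = 5),
  draw = rep(1:5, times = 2),
  value = c(1:5, 11:15)
)

id_cols <- c("location", "draw")
summarize_cols <- "draw"

# ID columns that remain after collapsing over `summarize_cols`
by_cols <- setdiff(id_cols, summarize_cols)

# Collapse over draws: one row per remaining ID combination
output_df <- aggregate(
  input_df["value"],
  by = input_df[by_cols],
  FUN = mean
)

# Name the statistic column after the summary function ("mean")
names(output_df)[names(output_df) == "value"] <- "mean"
print(output_df)
```

Grouping on `setdiff(id_cols, summarize_cols)` is what removes the `summarize_cols` from the output while keeping every other ID column.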
The entries of `summary_fun` are names of R functions that take a vector of values and reduce it to a single summary statistic. These can include user-defined functions in the global environment.
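As a rough illustration of how function names can be resolved and applied, the sketch below uses base R's `match.fun()` to look up each name and apply it to a vector of draws, including a user-defined function. The helper `apply_summary_funs()` and the function `draw_range()` are hypothetical and not part of the package; the package's own lookup may work differently.

```r
# Hypothetical user-defined summary function in the global environment:
# reduces a vector to a single value (the spread of the draws)
draw_range <- function(x) max(x) - min(x)

# Hypothetical helper: resolve each function name and apply it to `x`,
# returning a named numeric vector with one element per function
apply_summary_funs <- function(x, summary_fun = c("mean")) {
  vapply(
    summary_fun,
    function(fun_name) {
      fun <- match.fun(fun_name)  # look up the function by its name
      fun(x)
    },
    numeric(1)
  )
}

draws <- 0:100
stats_out <- apply_summary_funs(draws, c("mean", "median", "draw_range"))
print(stats_out)
```

Each function must return a single number here; `vapply()` with `numeric(1)` raises an error otherwise, which mirrors the requirement that each function reduce the values to one summary statistic.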
The `probs` argument specifies the probabilities at which sample quantiles are calculated using the `stats::quantile()` function. The default 2.5th and 97.5th percentiles are returned in columns named 'q2.5' and 'q97.5'.
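The base-R sketch below shows the quantile calculation for the default `probs` on the same values used in the example, and builds column names with `paste0("q", probs * 100)` so that they match the 'q2.5' and 'q97.5' names for the defaults. The naming expression is an assumption about the convention, not taken from the package source.

```r
probs <- c(0.025, 0.975)
values <- 0:100

# Sample quantiles; names = FALSE drops the default "2.5%"-style labels
q <- stats::quantile(values, probs = probs, names = FALSE)

# Name each quantile 'q' followed by probs * 100
q <- setNames(q, paste0("q", probs * 100))
print(q)
```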
```r
input_dt <- data.table::data.table(location = "USA", draw = 1:101, value = 0:100)

output_dt <- summarize_dt(
  dt = input_dt,
  id_cols = c("location", "draw"),
  summarize_cols = "draw",
  value_cols = "value"
)

# no quantiles calculated
output_dt <- summarize_dt(
  dt = input_dt,
  id_cols = c("location", "draw"),
  summarize_cols = "draw",
  value_cols = "value",
  probs = NULL
)
```