Calculate summary statistics when collapsing over a certain variable. For example, summary statistics can be calculated across a set of draws, locations, or location-years.

summarize_dt(
  dt,
  id_cols,
  summarize_cols,
  value_cols,
  summary_fun = c("mean"),
  probs = c(0.025, 0.975)
)

Arguments

dt

[data.table()]
Data to calculate summary statistics for.

id_cols

[character()]
ID columns that uniquely identify each row of dt.

summarize_cols

[character()]
The subset of id_cols to collapse, that is, the columns that summary statistics are calculated over.

value_cols

[character()]
Value columns that summary statistics should be calculated for. When more than one column is specified, each returned summary statistic column is prefixed with the value column name.

summary_fun

[character()]
Names of functions, each of which reduces a vector of values to a single summary statistic. Default is "mean".

probs

[numeric()]
Probabilities with values in [0,1] to be used when producing sample quantiles with stats::quantile(). Default is 0.025 and 0.975 for the 2.5th and 97.5th quantiles. Can be NULL if no quantiles are needed.

Value

[data.table()] with id_cols (minus the summarize_cols) plus summary statistic columns. Each summary statistic column is named after the corresponding function in summary_fun, and quantile columns are named 'q' followed by probs * 100 (for example, 'q2.5' for a probability of 0.025). If more than one value column is specified, each returned summary statistic column is prefixed with the value column name.
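To illustrate the shape of the returned table, the following is a minimal base R sketch of the collapse step. It is not the package's implementation: it uses a plain data.frame and stats::aggregate() instead of data.table, and the toy data and the names input_df, by_cols and collapsed are made up for illustration.

```r
# Toy input: two locations with three draws each (hypothetical data).
input_df <- data.frame(
  location = rep(c("USA", "Canada"), each = 3),
  draw = rep(1:3, times = 2),
  value = c(1, 2, 3, 10, 20, 30)
)

id_cols <- c("location", "draw")
summarize_cols <- "draw"

# Columns that remain after collapsing: id_cols minus summarize_cols.
by_cols <- setdiff(id_cols, summarize_cols)

# One row per remaining ID combination, with the mean taken across draws.
collapsed <- stats::aggregate(
  input_df["value"],
  by = input_df[by_cols],
  FUN = mean
)

# Name the summary column after the function that produced it.
names(collapsed)[names(collapsed) == "value"] <- "mean"
collapsed
```

The result has one row per location, and the draw column no longer appears because it was collapsed.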

Details

Each element of summary_fun is the name of an R function that takes a vector of values and reduces it to a single summary statistic. User-defined functions in the global environment can also be used.
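As a rough illustration of how function names can be turned into summary statistics, the sketch below looks up each name with base::get() and applies the function to a vector. How summarize_dt() resolves the names internally is an assumption here; value_range is a hypothetical user-defined function.

```r
# Hypothetical user-defined summary function: width of the observed range.
value_range <- function(x) max(x) - min(x)

summary_fun <- c("mean", "value_range")
values <- 0:100

# Look up each function by name and reduce the vector to a single number.
# vapply() keeps the function names as the names of the result.
stats_out <- vapply(
  summary_fun,
  function(fun_name) get(fun_name, mode = "function")(values),
  numeric(1)
)

stats_out
# mean is 50 and value_range is 100 for the values 0 to 100
```

Any function that returns a single value for a vector input fits this pattern; a function returning several values would not.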

The probs argument specifies the probabilities for which sample quantiles are calculated using stats::quantile(). The default probabilities of 0.025 and 0.975 produce columns named 'q2.5' and 'q97.5' in the returned data.table.
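The quantile values and column names described above can be reproduced directly with stats::quantile(). This is only a sketch: building the names with paste0() is an assumption about how names such as 'q2.5' could be generated, not a statement of the package's code.

```r
values <- 0:100
probs <- c(0.025, 0.975)

# Sample quantiles using the default stats::quantile() algorithm (type 7).
quantiles <- stats::quantile(values, probs = probs, names = FALSE)

# Assumed naming scheme: "q" followed by the probability as a percentage.
names(quantiles) <- paste0("q", probs * 100)

quantiles
# q2.5 is 2.5 and q97.5 is 97.5 for the values 0 to 100
```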

Examples

input_dt <- data.table::data.table(location = "USA", draw = 1:101, value = 0:100)

output_dt <- summarize_dt(
  dt = input_dt,
  id_cols = c("location", "draw"),
  summarize_cols = "draw",
  value_cols = "value"
)

# no quantiles calculated
output_dt <- summarize_dt(
  dt = input_dt,
  id_cols = c("location", "draw"),
  summarize_cols = "draw",
  value_cols = "value",
  probs = NULL
)