Split values in unknown groupings across the known groupings, in proportion to the observed values of the known groupings.

split_unknown(
  dt,
  id_cols,
  value_cols,
  col_stem,
  col_type,
  mapping,
  quiet = FALSE
)

Arguments

dt

[data.table()]
Data which includes both known and unknown groupings. Unknown groupings should be indicated with NA entries for columns defined by col_stem.

id_cols

[character()]
ID columns that uniquely identify each row of dt.

value_cols

[character(1)]
Value columns that should be split and distributed. Currently must be length 1.

col_stem

[character(1)]
The name of the variable that defines the groupings being split. For an 'interval' variable, do not include the '_start' or '_end' suffix.

col_type

[character(1)]
The type of variable that defines the groupings being split. Can be either 'categorical' or 'interval'. For example, 'sex' is 'categorical' and 'age' is typically 'interval'.

mapping

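[data.table()]
For 'categorical' variables, defines how different levels of the hierarchical variable relate to each other, with one row per 'parent' and 'child' pair as in the categorical example below. The 'interval' example below instead passes a data.table with '_start' and '_end' columns for col_stem.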

quiet

[logical(1)]
Should progress messages be suppressed as the function is run? Default is FALSE.

Value

[data.table()]

dt with unknown groupings split across known groupings, and then removed.
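
The arithmetic behind a proportional split can be illustrated with a minimal base R sketch. The helper `redistribute_unknown()` is hypothetical and is not part of this package; it assumes every known grouping shares a single parent and is only meant to show how the unknown total is shared out, not how split_unknown() is implemented.

```r
# Hypothetical helper (not a package function): share an unknown total
# across known groups in proportion to their observed values.
redistribute_unknown <- function(known, unknown_total) {
  proportions <- known / sum(known)   # observed share of each known group
  known + unknown_total * proportions # add each group's share of the unknown
}

# Known ages [0, 1), [1, 2), [2, 3) with 10 people of unknown age
age_split <- redistribute_unknown(c(20, 30, 50), unknown_total = 10)
print(age_split)  # 22 33 55

# Known sexes (male, female) with 10 people of unknown sex
sex_split <- redistribute_unknown(c(25, 75), unknown_total = 10)
print(sex_split)  # 27.5 82.5
```

In both cases the overall total is preserved (110), since the unknown value is moved into the known groupings rather than dropped.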

Examples

# interval
dt <- data.table::data.table(
  age_start = c(0, 1, 2, NA),
  age_end = c(1, 2, 3, NA),
  population = c(20, 30, 50, 10)
)
dt <- split_unknown(
  dt,
  id_cols = c("age_start", "age_end"),
  value_cols = "population",
  col_stem = "age",
  col_type = "interval",
  mapping = data.table::data.table(age_start = c(0), age_end = c(3))
)

# categorical
dt <- data.table::data.table(
  sex = c("male", "female", NA),
  population = c(25, 75, 10)
)
dt <- split_unknown(
  dt,
  id_cols = "sex",
  value_cols = "population",
  col_stem = "sex",
  col_type = "categorical",
  mapping = data.table::data.table(
    parent = c("all", "all"),
    child = c("male", "female")
  )
)