Split unknown groupings across the known groupings in proportion to their observed values.
split_unknown(
  dt,
  id_cols,
  value_cols,
  col_stem,
  col_type,
  mapping,
  quiet = FALSE
)
dt
[data.table()]
Data which includes both known and unknown groupings. Unknown groupings
should be indicated with NA entries for columns defined by col_stem.
id_cols
[character()]
ID columns that uniquely identify each row of dt.
value_cols
[character(1)]
Value columns that should be split and distributed. Currently must be
length 1.
col_stem
[character(1)]
The name of the variable that defines the groupings being split. For an
'interval' variable, do not include the '_start' or '_end' suffix.
col_type
[character(1)]
The type of variable that defines the groupings being split. Can be either
'categorical' or 'interval'. For example, 'sex' is 'categorical', while 'age'
is typically 'interval'.
mapping
[data.table()]
For 'categorical' variables, defines how different levels of the
hierarchical variable relate to each other.
quiet
[logical(1)]
Should progress messages be suppressed while the function runs? Default is
FALSE.
[data.table()]
dt with unknown groupings split across known groupings, and then removed.
# interval: the last row (NA bounds) holds the unknown-age population
dt <- data.table::data.table(
  age_start = c(0, 1, 2, NA),
  age_end = c(1, 2, 3, NA),
  population = c(20, 30, 50, 10)
)
dt <- split_unknown(
  dt,
  id_cols = c("age_start", "age_end"),
  value_cols = "population",
  col_stem = "age",
  col_type = "interval",
  mapping = data.table::data.table(age_start = c(0), age_end = c(3))
)

# categorical: the NA row holds the unknown-sex population
dt <- data.table::data.table(
  sex = c("male", "female", NA),
  population = c(25, 75, 10)
)
dt <- split_unknown(
  dt,
  id_cols = "sex",
  value_cols = "population",
  col_stem = "sex",
  col_type = "categorical",
  mapping = data.table::data.table(
    parent = c("all", "all"),
    child = c("male", "female")
  )
)
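
The proportional redistribution that these examples rely on can be checked by hand. The sketch below uses base R only and does not call split_unknown. It assumes the unknown value is shared among the known groups in proportion to their observed values, and the helper name redistribute_unknown is hypothetical, introduced here purely for illustration.

```r
# Hypothetical helper, not part of the package: share `unknown` among the
# known groups in proportion to their observed values (an assumption about
# how the split behaves, based on the description above).
redistribute_unknown <- function(known, unknown) {
  known + unknown * known / sum(known)
}

# Interval example: known populations 20, 30, 50 with 10 of unknown age.
age_split <- redistribute_unknown(c(20, 30, 50), 10)
print(age_split)

# Categorical example: known populations 25 and 75 with 10 of unknown sex.
sex_split <- redistribute_unknown(c(male = 25, female = 75), 10)
print(sex_split)

# The total is preserved: known plus unknown equals the redistributed sum.
stopifnot(isTRUE(all.equal(sum(age_split), 110)))
stopifnot(isTRUE(all.equal(sum(sex_split), 110)))
```

Under that assumption, the interval example would yield 22, 33, and 55, and the categorical example 27.5 and 82.5. Comparing these figures against the value column returned by split_unknown is a quick way to confirm the inputs were specified as intended.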