Aggregate life table(s) to less granular age groups using standard life table aggregation functions of qx (and ax).
agg_lt(dt, id_cols, age_mapping, quiet = F, ...)
dt
[data.table()]
Life table to be aggregated. Must include all columns in id_cols, and at least two of 'qx', 'ax', and 'mx', or just 'qx'.
id_cols
[character()]
ID columns that uniquely identify each row of dt.
age_mapping
[data.table()]
Specification of intervals to aggregate to. Required columns are 'age_start' and 'age_end'. Use "Inf" as 'age_end' for terminal age group. The age group intervals must be contiguous and cover the entire interval specified in the input life tables dt.
quiet
[logical(1)]
Should progress messages be suppressed as the function is run? Default is FALSE.
...
Other arguments to pass to hierarchyUtils::agg().
[data.table()]
Aggregated life table(s) with columns for all id_cols. A column for 'qx' is always included; a column for 'ax' is also returned if at least two of 'qx', 'ax', and 'mx' are included in the input dt. Only the age groups specified in age_mapping are returned.
See the references page for the formatted equations below.
This function works by aggregating the qx and ax life table parameters separately. If only qx is included in dt, then ax aggregation is not done.
qx aggregation:
To explain how qx is aggregated it is useful to define three events: $$D = \text{death between age } x \text{ and } x + n$$ $$D' = \text{survival between age } x \text{ and } x + n$$ $$S = \text{survival to age } x$$
Now qx and px can be written in terms of events D and S. $${}_{n}q_x = P(D | S)$$ $${}_{n}p_x = P(D' | S)$$
Now say there are multiple sub age-groups that make up the overall age group between \(x \text{ and } x + n\). Let subscripts "1" and "2" indicate values specific to the first and second sub age-groups and assume values with no subscript apply to the original aggregate age group. The first sub age-group could be between \(x \text{ and } x + n_1\) and the second between \(x + n_1 \text{ and } x + n_1 + n_2\), where \(n_1 + n_2 = n\).
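The overall px value can be written as a function of the sub age-groups' px values. Let \(S_1\) be survival to age \(x\) (the same event as \(S\)) and \(S_2\) be survival to age \(x + n_1\). Surviving the whole interval means surviving both sub age-groups, so by the chain rule of conditional probability: $$P(D'|S) = P(D'_1 \cap D'_2 | S) = P(D'_1|S_1) \cdot P(D'_2|S_2)$$
where: $$P(D'_1 | S_1) = \text{survival between age } x \text{ and } x + n_1 \text{ given survival to age } x$$ $$P(D'_2 | S_2) = \text{survival between age } x + n_1 \text{ and } x + n \text{ given survival to age } x + n_1$$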
More generally if there are \(A\) age groups between age \(x \text{ and } x + n\), and \(i\) indexes each of the sub age intervals then: $${}_{n}p_x = \prod_{i=1}^{A} {}_{n_i}p_{x_i} = \prod_{i=1}^{A}(1 - {}_{n_i}q_{x_i})$$ $${}_{n}q_x = 1 - {}_{n}p_x$$
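The product formula above can be checked by hand for a single aggregate. The following is a minimal base R sketch, not the internals of agg_lt; the vector name qx_sub and the constant value 0.2 are illustrative assumptions.

```r
# Illustrative single-year qx values for the sub age-groups
# [0, 1), [1, 2), [2, 3), [3, 4), [4, 5) (assumed, not real data)
qx_sub <- rep(0.2, 5)

# Survival over the aggregate interval [0, 5) is the product of the
# survival probabilities of the sub age-groups
px_agg <- prod(1 - qx_sub)

# Aggregate probability of death between age 0 and 5
qx_agg <- 1 - px_agg
print(qx_agg)  # 1 - 0.8^5 = 0.67232
```

Note that the aggregate qx is not the sum or the mean of the sub age-group qx values; only the survival probabilities multiply.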
ax aggregation:
\({}_{n}a_x\) is aggregated across age groups by aggregating the number of person-years lived in each age group by those who died in the interval.
$${}_{n}a_x \cdot {}_{n}d_x = \text{person-years lived between age } x \text{ and } x + n \text{ by those who died in this age interval}$$ where: $${}_{n}a_x = \text{average years lived between age } x \text{ and } x + n \text{ by those who died in the age interval}$$ $${}_{n}d_x = \text{number that died between age } x \text{ and } x + n$$
Now say there are \(A\) age groups between age \(x \text{ and } x + n\), and \(i\) indexes each of the sub age intervals. The total number of person-years lived by those who died in the aggregate age group is a simple sum of the number of person-years lived in each sub age interval. $${}_{n}a_x \cdot {}_{n}d_x = \sum_{i = 1}^{A} ((x_i - x) + {}_{n_i}a_{x_i}) \cdot {}_{n_i}d_{x_i}$$ where: $$x_i - x = \text{ number of complete person years lived in the previous sub age intervals by someone who dies in sub age interval } i$$
The aggregate ax can then be solved for. $${}_{n}a_x = \frac{\sum_{i = 1}^{A} ((x_i - x) + {}_{n_i}a_{x_i}) \cdot {}_{n_i}d_{x_i}}{\sum_{i = 1}^{A} {}_{n_i}d_{x_i}}$$
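The ax formula above can likewise be evaluated by hand. The sketch below is base R only and is not the internals of agg_lt. It assumes a radix of 1 and derives deaths with the standard life table identity dx = lx * qx, which this page does not spell out; all variable names and input values are illustrative.

```r
# Illustrative single-year inputs for sub age-groups [0, 1), ..., [4, 5)
age_start <- 0:4          # x_i, start of each sub age-group
qx_sub <- rep(0.2, 5)     # probability of death in each sub age-group
ax_sub <- rep(0.5, 5)     # average years lived by those who die in it
x <- age_start[1]         # start of the aggregate age group

# Survivors to the start of each sub age-group (assumed radix of 1)
lx_sub <- cumprod(c(1, (1 - qx_sub)[-length(qx_sub)]))

# Deaths in each sub age-group (assumed identity: dx = lx * qx)
dx_sub <- lx_sub * qx_sub

# Person-years lived in [x, x + n) by those who die in sub age-group i:
# complete years in earlier sub age-groups plus the partial years a_i
person_years <- ((age_start - x) + ax_sub) * dx_sub

# Aggregate ax: total person-years of the dying divided by total deaths
ax_agg <- sum(person_years) / sum(dx_sub)
print(ax_agg)  # approximately 2.06
```

With a constant qx, more deaths occur in the earlier sub age-groups, so the aggregate ax falls below the interval midpoint of 2.5.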
missing_dt_severity:
Check for missing levels of col_stem, the variable being aggregated or scaled over.
stop: throw error (this is the default).
warning or message: throw warning/message and continue with aggregation/scaling for requested aggregations/scalings where expected input data in dt is available.
none: don't throw error or warning, continue with aggregation/scaling for requested aggregations/scalings where expected input data in dt is available.
skip: skip this check and continue with aggregation/scaling.
present_agg_severity (agg only):
Check for requested aggregates in mapping that are already present in dt.
stop: throw error (this is the default).
warning or message: throw warning/message, drop aggregates and continue with aggregation.
none: don't throw error or warning, drop aggregates and continue with aggregation.
skip: skip this check and add to the values already present for the aggregates.
na_value_severity:
Check for 'NA' values in the value_cols.
stop: throw error (this is the default).
warning or message: throw warning/message, drop missing values and continue with aggregation/scaling where possible (this will likely cause another error because of missing_dt_severity; consider setting missing_dt_severity = "skip" for functionality similar to na.rm = TRUE).
none: don't throw error or warning, drop missing values and continue with aggregation/scaling where possible (this will likely cause another error because of missing_dt_severity; consider setting missing_dt_severity = "skip" for functionality similar to na.rm = TRUE).
skip: skip this check and propagate NA values through aggregation/scaling.
overlapping_dt_severity:
Check for overlapping intervals that prevent collapsing to the most detailed common set of intervals. Or check for overlapping intervals in col_stem when aggregating/scaling.
stop: throw error (this is the default).
warning or message: throw warning/message, drop overlapping intervals and continue with aggregation/scaling where possible (this may cause another error because of missing_dt_severity).
none: don't throw error or warning, drop overlapping intervals and continue with aggregation/scaling where possible (this may cause another error because of missing_dt_severity).
skip: skip this check and continue with aggregation/scaling.
# Single-year life table for ages 0 to 110+, with constant qx and ax
dt <- data.table::data.table(
  age_start = 0:110,
  age_end = c(1:110, Inf),
  location = "Canada",
  qx = c(rep(.2, 110), 1),
  ax = .5
)
id_cols <- c("age_start", "age_end", "location")

# Aggregate to five-year age groups between age 0 and 110
dt <- agg_lt(
  dt = dt,
  id_cols = id_cols,
  age_mapping = data.table::data.table(
    age_start = seq(0, 105, 5),
    age_end = seq(5, 110, 5)
  )
)
#> Aggregating px across age groups
#> Aggregating age
#> Interval group 1 of 1: [0, 1),[1, 2),[2, 3),[3, 4),[4, 5),[5, 6),[6, 7),[7, 8),[8, 9),[9, 10),[10, 11),[11, 12),[12, 13),[13, 14),[14, 15),[15, 16),[16, 17),[17, 18),[18, 19),[19, 20),[20, 21),[21, 22),[22, 23),[23, 24),[24, 25),[25, 26),[26, 27),[27, 28),[28, 29),[29, 30),[30, 31),[31, 32),[32, 33),[33, 34),[34, 35),[35, 36),[36, 37),[37, 38),[38, 39),[39, 40),[40, 41),[41, 42),[42, 43),[43, 44),[44, 45),[45, 46),[46, 47),[47, 48),[48, 49),[49, 50),[50, 51),[51, 52),[52, 53),[53, 54),[54, 55),[55, 56),[56, 57),[57, 58),[58, 59),[59, 60),[60, 61),[61, 62),[62, 63),[63, 64),[64, 65),[65, 66),[66, 67),[67, 68),[68, 69),[69, 70),[70, 71),[71, 72),[72, 73),[73, 74),[74, 75),[75, 76),[76, 77),[77, 78),[78, 79),[79, 80),[80, 81),[81, 82),[82, 83),[83, 84),[84, 85),[85, 86),[86, 87),[87, 88),[88, 89),[89, 90),[90, 91),[91, 92),[92, 93),[93, 94),[94, 95),[95, 96),[96, 97),[97, 98),[98, 99),[99, 100),[100, 101),[101, 102),[102, 103),[103, 104),[104, 105),[105, 106),[106, 107),[107, 108),[108, 109),[109, 110),[110, Inf)
#> Aggregate 1 of 22: [0, 5)
#> Aggregate 2 of 22: [5, 10)
#> Aggregate 3 of 22: [10, 15)
#> Aggregate 4 of 22: [15, 20)
#> Aggregate 5 of 22: [20, 25)
#> Aggregate 6 of 22: [25, 30)
#> Aggregate 7 of 22: [30, 35)
#> Aggregate 8 of 22: [35, 40)
#> Aggregate 9 of 22: [40, 45)
#> Aggregate 10 of 22: [45, 50)
#> Aggregate 11 of 22: [50, 55)
#> Aggregate 12 of 22: [55, 60)
#> Aggregate 13 of 22: [60, 65)
#> Aggregate 14 of 22: [65, 70)
#> Aggregate 15 of 22: [70, 75)
#> Aggregate 16 of 22: [75, 80)
#> Aggregate 17 of 22: [80, 85)
#> Aggregate 18 of 22: [85, 90)
#> Aggregate 19 of 22: [90, 95)
#> Aggregate 20 of 22: [95, 100)
#> Aggregate 21 of 22: [100, 105)
#> Aggregate 22 of 22: [105, 110)
#> Aggregating ax across age groups
#> Aggregating age
#> Interval group 1 of 1: [0, 1),[1, 2),[2, 3),[3, 4),[4, 5),[5, 6),[6, 7),[7, 8),[8, 9),[9, 10),[10, 11),[11, 12),[12, 13),[13, 14),[14, 15),[15, 16),[16, 17),[17, 18),[18, 19),[19, 20),[20, 21),[21, 22),[22, 23),[23, 24),[24, 25),[25, 26),[26, 27),[27, 28),[28, 29),[29, 30),[30, 31),[31, 32),[32, 33),[33, 34),[34, 35),[35, 36),[36, 37),[37, 38),[38, 39),[39, 40),[40, 41),[41, 42),[42, 43),[43, 44),[44, 45),[45, 46),[46, 47),[47, 48),[48, 49),[49, 50),[50, 51),[51, 52),[52, 53),[53, 54),[54, 55),[55, 56),[56, 57),[57, 58),[58, 59),[59, 60),[60, 61),[61, 62),[62, 63),[63, 64),[64, 65),[65, 66),[66, 67),[67, 68),[68, 69),[69, 70),[70, 71),[71, 72),[72, 73),[73, 74),[74, 75),[75, 76),[76, 77),[77, 78),[78, 79),[79, 80),[80, 81),[81, 82),[82, 83),[83, 84),[84, 85),[85, 86),[86, 87),[87, 88),[88, 89),[89, 90),[90, 91),[91, 92),[92, 93),[93, 94),[94, 95),[95, 96),[96, 97),[97, 98),[98, 99),[99, 100),[100, 101),[101, 102),[102, 103),[103, 104),[104, 105),[105, 106),[106, 107),[107, 108),[108, 109),[109, 110)
#> Aggregate 1 of 22: [0, 5)
#> Aggregate 2 of 22: [5, 10)
#> Aggregate 3 of 22: [10, 15)
#> Aggregate 4 of 22: [15, 20)
#> Aggregate 5 of 22: [20, 25)
#> Aggregate 6 of 22: [25, 30)
#> Aggregate 7 of 22: [30, 35)
#> Aggregate 8 of 22: [35, 40)
#> Aggregate 9 of 22: [40, 45)
#> Aggregate 10 of 22: [45, 50)
#> Aggregate 11 of 22: [50, 55)
#> Aggregate 12 of 22: [55, 60)
#> Aggregate 13 of 22: [60, 65)
#> Aggregate 14 of 22: [65, 70)
#> Aggregate 15 of 22: [70, 75)
#> Aggregate 16 of 22: [75, 80)
#> Aggregate 17 of 22: [80, 85)
#> Aggregate 18 of 22: [85, 90)
#> Aggregate 19 of 22: [90, 95)
#> Aggregate 20 of 22: [95, 100)
#> Aggregate 21 of 22: [100, 105)
#> Aggregate 22 of 22: [105, 110)
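The age_mapping argument description asks for contiguous intervals with Inf as the 'age_end' of a terminal age group. The sketch below builds such a mapping and checks those stated requirements before any call to agg_lt. It uses a base R data.frame so that it runs without extra packages; the chosen age groups and the names is_contiguous and has_terminal are illustrative assumptions, and the result would still need converting (for example with data.table::as.data.table()) before being passed as age_mapping.

```r
# Hypothetical mapping: under-1, 1 to 4, five-year groups, then 95+
age_mapping <- data.frame(
  age_start = c(0, 1, seq(5, 95, 5)),
  age_end = c(1, seq(5, 95, 5), Inf)
)

n <- nrow(age_mapping)

# Contiguous: each interval ends exactly where the next one starts
is_contiguous <- all(age_mapping$age_end[-n] == age_mapping$age_start[-1])

# Terminal age group: the last interval is open-ended
has_terminal <- is.infinite(age_mapping$age_end[n])

print(c(contiguous = is_contiguous, terminal = has_terminal))
```

Running a check like this first gives a clearer failure point than an error raised part-way through aggregation.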