vignettes/introduction_to_life_tables.Rmd
A life table is a table that includes information to describe the dying out of a birth cohort. This can also be a synthetic birth cohort, in which case we refer to it as a period life table.
Life tables are one of the most important devices in demography – they have been used since the 1600s! They can also be useful for other fields, because they are generalizable to other discrete “time to event” data.
Typically, life tables have one row per age group, with columns representing life table metrics, also known as parameters. The life table parameters used in this package are \(_nm_x\), \(_na_x\), \(_nq_x\), \(_np_x\), \(l_x\), \(_nd_x\), \(e_x\), \(_nL_x\), and \(T_x\). In this notation, \(x\) refers to age, and the metrics apply to either the age \(x\) directly or to the interval between ages \(x\) and \(x+n\) where \(n\) indicates the length of the interval, typically in years. We often shorthand by removing the “n” from the notation, with interval length implied.
Here’s an example of a period life table, for males in Austria in 1992, which is saved in the package as `austria_1992_lt` (source: Preston 2001):
x | x+n | deaths | pop | mx | ax | qx | px | lx | dx | nLx | Tx | ex |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 419 | 47925 | 0.01 | 0.07 | 0.01 | 0.99 | 1.00 | 0.01 | 0.99 | 72.89 | 72.89 |
1 | 5 | 70 | 189127 | 0.00 | 1.63 | 0.00 | 1.00 | 0.99 | 0.00 | 3.96 | 71.90 | 72.53 |
5 | 10 | 36 | 234793 | 0.00 | 2.50 | 0.00 | 1.00 | 0.99 | 0.00 | 4.95 | 67.94 | 68.63 |
10 | 15 | 46 | 238790 | 0.00 | 3.14 | 0.00 | 1.00 | 0.99 | 0.00 | 4.94 | 62.99 | 63.68 |
15 | 20 | 249 | 254996 | 0.00 | 2.72 | 0.00 | 1.00 | 0.99 | 0.00 | 4.93 | 58.04 | 58.74 |
20 | 25 | 420 | 326831 | 0.00 | 2.52 | 0.01 | 0.99 | 0.98 | 0.01 | 4.90 | 53.11 | 54.01 |
25 | 30 | 403 | 355086 | 0.00 | 2.48 | 0.01 | 0.99 | 0.98 | 0.01 | 4.87 | 48.21 | 49.35 |
30 | 35 | 441 | 324222 | 0.00 | 2.60 | 0.01 | 0.99 | 0.97 | 0.01 | 4.84 | 43.34 | 44.61 |
35 | 40 | 508 | 269963 | 0.00 | 2.70 | 0.01 | 0.99 | 0.96 | 0.01 | 4.80 | 38.50 | 39.90 |
40 | 45 | 769 | 261971 | 0.00 | 2.66 | 0.01 | 0.99 | 0.96 | 0.01 | 4.75 | 33.70 | 35.25 |
45 | 50 | 1154 | 238011 | 0.00 | 2.70 | 0.02 | 0.98 | 0.94 | 0.02 | 4.66 | 28.95 | 30.73 |
50 | 55 | 1866 | 261612 | 0.01 | 2.68 | 0.04 | 0.96 | 0.92 | 0.03 | 4.52 | 24.29 | 26.42 |
55 | 60 | 2043 | 181385 | 0.01 | 2.64 | 0.05 | 0.95 | 0.89 | 0.05 | 4.32 | 19.77 | 22.29 |
60 | 65 | 3496 | 187962 | 0.02 | 2.62 | 0.09 | 0.91 | 0.84 | 0.07 | 4.01 | 15.45 | 18.43 |
65 | 70 | 4366 | 153832 | 0.03 | 2.62 | 0.13 | 0.87 | 0.76 | 0.10 | 3.58 | 11.43 | 14.97 |
70 | 75 | 4337 | 105169 | 0.04 | 2.59 | 0.19 | 0.81 | 0.66 | 0.12 | 3.01 | 7.86 | 11.86 |
75 | 80 | 5279 | 73694 | 0.07 | 2.52 | 0.30 | 0.70 | 0.54 | 0.16 | 2.28 | 4.84 | 9.00 |
80 | 85 | 6460 | 57512 | 0.11 | 2.42 | 0.44 | 0.56 | 0.37 | 0.16 | 1.45 | 2.56 | 6.84 |
85 | Inf | 6146 | 32248 | 0.19 | 5.25 | 1.00 | 0.00 | 0.21 | 0.21 | 1.11 | 1.11 | 5.25 |
For reference, the following is a list of life table metrics and their definitions.
\(\mathbf{_nm_x}\): mortality rate between ages \(x\) and \(x+n\). Shorthand to \(m_x\) with implied interval width (\(n\)). Equals deaths divided by person-years lived in the interval. Mid-year population is commonly used as an adequate approximation of the person-years denominator.
\(\mathbf{_na_x}\): mean person-years lived between ages \(x\) and \(x+n\) for those who die within the interval. Shorthand to \(a_x\) with implied interval width (\(n\)).
\(\mathbf{_nq_x}\): probability of death between ages \(x\) and \(x+n\), conditional on survival to age \(x\). Shorthand to \(q_x\) with implied interval width (\(n\)). Equals deaths in the interval divided by survivors to \(x\)-th birthday. Examples: \(_5q_0\) = probability of death between birth and age \(5\); \(_{45}q_{15}\) = probability of death between age \(15\) and age \(60\) conditional on survival to age \(15\).
\(\mathbf{l_x}\): proportion of the cohort surviving to age \(x\).
\(\mathbf{e_x}\): life expectancy at age \(x\) – mean number of years lived after \(x\)-th birthday by those surviving to age \(x\). Life expectancy at birth is \(e_0\).
\(\mathbf{_nL_x}\): total person-years lived between age \(x\) and \(x+n\).
\(\mathbf{T_x}\): total person-years lived above age \(x\).
\(\mathbf{_nd_x}\): proportion of the cohort dying between ages \(x\) and \(x+n\). Shorthand to \(d_x\).
\(\mathbf{_np_x}\): probability of survival between ages \(x\) and \(x+n\) conditional on survival to age \(x\). Shorthand to \(p_x\). Complement of \(_nq_x\), that is, \(_np_x = 1 - {_nq_x}\).
We often reduce life tables to age patterns of log probability of death (\(\text{log}(q_x)\)) or to survival curves (\(l_x\) over age), which can be easily displayed and vetted in plots.
The `demCore` package includes many utility functions that leverage the mathematical relationships between life table metrics to build out a complete life table. This section provides details and examples regarding the use of these functions and their underlying methods. We will do this by rebuilding the example life table above from death counts and population.
Note that this document and this package do not contain an exhaustive list of relationships between metrics. Additionally, some equations presented rely on assumptions and others are true relationships that are always valid. For more details, see the Preston Demography textbook, from which many of these details were drawn.
From raw death count and population data, the place to start with a life table is \(m_x\).
\[m_x = \frac{\text{deaths}}{\text{person-years}} \approx \frac{\text{observed deaths}}{\text{mid-interval population}}\]
Let’s load in our example data and calculate \(m_x\):
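As a minimal base-R sketch of this step, the snippet below computes \(m_x\) for four age groups using the deaths and population counts printed in the example table above. The vector names (`age_start`, `deaths`, `pop`, `mx`) are illustrative assumptions and do not necessarily match how `austria_1992_lt` is stored in the package.

```r
# Deaths and mid-year population for Austrian males, 1992, copied from the
# example table above (age groups starting at 0, 1, 60 and 85).
age_start <- c(0, 1, 60, 85)
deaths    <- c(419, 70, 3496, 6146)
pop       <- c(47925, 189127, 187962, 32248)

# Mortality rate: deaths divided by (approximate) person-years of exposure.
mx <- deaths / pop

print(data.frame(age_start, deaths, pop, mx = round(mx, 4)))
```

Rounded to two decimals, these rates agree with the `mx` column of the table.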
If we have \(m_x\) and \(q_x\) we can directly calculate \(a_x\). However, we often use \(m_x\) and \(a_x\) to get \(q_x\) in the first place, and so have to make some assumptions to get \(a_x\). Empirical calculations of \(a_x\) would require detailed and accurate data on age of death in days (such as paired date of birth and date of death), which is typically unavailable.
Rule of thumb:
One option is to assume all deaths occur in the middle of the interval, so \(a_x \approx n/2\). This assumption works well for most ages, but it works less well at very young or very old ages, where mortality can change rapidly over the interval.
Another assumption we can make is that the age-specific death rate is constant between \(x\) and \(x+n\). Under this assumption, \[_na_x = n + \frac{1}{_nm_x} - \frac{n}{1 - e^{-n \cdot {_nm_x}}}.\] The function `mx_to_ax` implements this assumption.
Using our example data, we get:
dt <- hierarchyUtils::gen_length(dt, col_stem = "age")
dt[, ax := mx_to_ax(mx = mx, age_length = age_length)]
Note that we can use `hierarchyUtils::gen_length` to add the `age_length` column given `age_start` and `age_end`.
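To make the constant-death-rate formula concrete, here is a small base-R sketch that evaluates it by hand for two five-year age groups from the example table. The helper name `ax_constant_mx` is an illustrative assumption, not a `demCore` function, and the sketch only covers finite interval lengths.

```r
# a_x under a constant death rate within the interval:
#   a_x = n + 1 / m_x - n / (1 - exp(-n * m_x))
# Illustrative helper (not part of demCore); finite n only.
ax_constant_mx <- function(mx, n) {
  n + 1 / mx - n / (1 - exp(-n * mx))
}

# Death rates for ages 60-64 and 80-84 from the example table (deaths / pop).
mx <- c(3496 / 187962, 6460 / 57512)
n  <- c(5, 5)

ax <- ax_constant_mx(mx, n)
print(round(ax, 3))
```

Both values fall slightly below \(n/2\): with a constant death rate, fewer people remain alive late in the interval, so deaths are weighted toward its start.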
\(_1a_0\) and \(_4a_1\):
Preston et al. (2001) adapted an analysis first completed by Coale and Demeny (1983) to derive a relationship between the infant mortality rate (\(_1m_0\)) and under-5 \(a_x\) values (\(_1a_0\) and \(_4a_1\)). In the absence of reliable data to produce \(a_x\), these relationships can be used to predict \(a_x\) from infant \(m_x\):
Metric and condition | Males | Females |
---|---|---|
\(_1a_0\) if \(_1m_0 \geq 0.107\) | 0.330 | 0.350 |
\(_1a_0\) if \(_1m_0 < 0.107\) | \(0.045 + 2.684 \cdot {_1m_0}\) | \(0.053 + 2.800 \cdot {_1m_0}\) |
\(_4a_1\) if \(_1m_0 \geq 0.107\) | 1.352 | 1.361 |
\(_4a_1\) if \(_1m_0 < 0.107\) | \(1.651 - 2.816 \cdot {_1m_0}\) | \(1.522 - 1.518 \cdot {_1m_0}\) |
Use the `gen_u5_ax_from_mx` function to implement this method:
dt[, sex := "male"]
gen_u5_ax_from_mx(dt, id_cols = c("age_start", "age_end", "sex"))
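The coefficients in the table can also be applied by hand. The base-R sketch below does so for the Austrian male infant mortality rate; `u5_ax_sketch` is a hypothetical helper written for illustration and is not the package's implementation.

```r
# Under-5 a_x predicted from the infant mortality rate (1m0), using the
# coefficients from the table above.
# `u5_ax_sketch` is an illustrative helper, not a demCore function.
u5_ax_sketch <- function(m0, sex = c("male", "female")) {
  sex <- match.arg(sex)
  high <- m0 >= 0.107
  if (sex == "male") {
    a0 <- ifelse(high, 0.330, 0.045 + 2.684 * m0)
    a1 <- ifelse(high, 1.352, 1.651 - 2.816 * m0)
  } else {
    a0 <- ifelse(high, 0.350, 0.053 + 2.800 * m0)
    a1 <- ifelse(high, 1.361, 1.522 - 1.518 * m0)
  }
  c(a0 = a0, a1 = a1)
}

# Infant mortality rate for Austrian males, 1992 (deaths / population, age 0).
m0 <- 419 / 47925
u5 <- u5_ax_sketch(m0, sex = "male")
print(round(u5, 2))
```

Rounded to two decimals, this gives 0.07 and 1.63, matching the `ax` values in the first two rows of the example life table.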
Graduation method: One strategy for selecting \(a_x\) values is based on the level and slope of the \(_nm_x\) function. Comparing two populations with the same \(_5m_{60}\), the population whose mortality rate rises more rapidly with age will have deaths that are more concentrated in the later part of the interval (higher \(a_x\)). Comparing two populations with the same slope in \(m_x\), the one with the higher mortality rate will have more deaths at the beginning of the interval (lower \(a_x\)).
To apply this idea, we can iterate as described in Preston et al. (2001), following a method originally proposed by Keyfitz (1966): \[_na_x = \frac{-\frac{n}{24} {_nd_{x-n}} + \frac{n}{2} {_nd_x} + \frac{n}{24} {_nd_{x+n}}}{_nd_x}\] where \(d_x\) is derived from the conversion from \(m_x\) to \(q_x\). However, since the \(m_x\) to \(q_x\) conversion requires \(a_x\), we must pick a starting value for \(a_x\) (like \(n/2\)), solve for \(d_x\), solve for \(a_x\), and so on until convergence. Use `demCore::iterate_ax` to implement this method.
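The iteration can be sketched in base R as follows. This is only an illustration of the formula above under simplifying assumptions: all intervals have the same width, the first and last intervals keep \(a_x = n/2\) because the formula needs a neighbour on each side, and \(l_x\) is scaled to 1 at the first age included. The function `iterate_ax_sketch` is hypothetical and may differ from what `demCore::iterate_ax` does internally.

```r
# Deaths and population for the 5-year age groups 5-9, ..., 80-84
# (Austrian males, 1992, from the example table above).
deaths <- c(36, 46, 249, 420, 403, 441, 508, 769,
            1154, 1866, 2043, 3496, 4366, 4337, 5279, 6460)
pop <- c(234793, 238790, 254996, 326831, 355086, 324222, 269963, 261971,
         238011, 261612, 181385, 187962, 153832, 105169, 73694, 57512)
n  <- 5
mx <- deaths / pop

# Illustrative helper (not a demCore function).
iterate_ax_sketch <- function(mx, n, tol = 1e-6, max_iter = 100) {
  k  <- length(mx)
  ax <- rep(n / 2, k)          # starting values
  i  <- 2:(k - 1)              # interior intervals with two neighbours
  converged <- FALSE
  for (iter in seq_len(max_iter)) {
    qx <- n * mx / (1 + (n - ax) * mx)          # m_x and a_x to q_x
    lx <- c(1, cumprod(1 - qx))[seq_len(k)]     # survivors, scaled to 1
    dx <- lx * qx                               # deaths in each interval
    ax_new <- ax
    ax_new[i] <- (-n / 24 * dx[i - 1] + n / 2 * dx[i] +
                  n / 24 * dx[i + 1]) / dx[i]
    converged <- max(abs(ax_new - ax)) < tol
    ax <- ax_new
    if (converged) break
  }
  list(ax = ax, iterations = iter, converged = converged)
}

res <- iterate_ax_sketch(mx, n)
print(round(res$ax, 2))
```

Because the sketch treats the boundary intervals differently and ignores the under-5 and terminal age groups, its output should not be expected to reproduce the `ax` column of the example table exactly.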
From \(m_x\) and \(a_x\), we can solve directly for \(q_x\):
\[_nq_x = \frac{n \cdot {_nm_x}}{1 + (n - {_na_x}) \cdot {_nm_x}}\] For the terminal age group, \(q_x\) should be \(1\) because all individuals surviving to the terminal age group will die in that age group (probability of death = \(1\)).
The `mx_ax_to_qx` function combines the equation for \(q_x\) and the requirement that terminal \(q_x\) equal one by setting \(q_x = 1\) if `age_length = Inf`.
dt[, qx := mx_ax_to_qx(mx = mx, ax = ax, age_length = age_length)]
Other functions that utilize this relationship but solve for different metrics are `mx_qx_to_ax` and `qx_ax_to_mx`.
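For readers who want to check the arithmetic, the following base-R sketch applies the \(q_x\) formula to the 60-64 and 85+ rows of the example table, using the rounded `ax` values printed there. The helper `qx_from_mx_ax` is an illustrative stand-in, not the package function.

```r
# q_x from m_x and a_x, with the terminal (open-ended) interval forced to 1.
# Illustrative helper (not a demCore function).
qx_from_mx_ax <- function(mx, ax, n) {
  qx <- n * mx / (1 + (n - ax) * mx)
  qx[is.infinite(n)] <- 1      # everyone alive at the terminal age dies in it
  qx
}

mx <- c(3496 / 187962, 6146 / 32248)   # ages 60-64 and 85+
ax <- c(2.62, 5.25)                    # rounded values from the table
n  <- c(5, Inf)

qx <- qx_from_mx_ax(mx, ax, n)
print(round(qx, 2))
```

The result rounds to 0.09 and 1.00, in line with the `qx` column of the table.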
You can also solve for \(q_x\) under the assumption of constant mortality rate within an interval, which removes \(a_x\) from the relationship:
\[_nq_x = 1 - e^{-n \cdot {_nm_x}}.\]
dt[, qx_compare := mx_to_qx(mx = mx, age_length = age_length)]
These two \(q_x\) values are the same, because the implied \(a_x\) in `mx_to_qx` is equivalent to the \(a_x\) we generate under the assumption in `mx_to_ax`.
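This equivalence can be checked numerically. The sketch below uses a few arbitrary, illustrative death rates (not taken from the example data) and plain base-R arithmetic rather than the package functions.

```r
# Illustrative death rates and a common interval length.
mx <- c(0.001, 0.02, 0.1)
n  <- 5

# a_x implied by a constant death rate within the interval.
ax <- n + 1 / mx - n / (1 - exp(-n * mx))

# q_x from the general m_x / a_x relationship ...
qx_via_ax <- n * mx / (1 + (n - ax) * mx)
# ... and q_x directly from the constant-rate formula.
qx_direct <- 1 - exp(-n * mx)

print(all.equal(qx_via_ax, qx_direct))
```

Substituting the constant-rate \(a_x\) into the general formula reduces it algebraically to \(1 - e^{-n \cdot {_nm_x}}\), so the two vectors agree up to floating-point error.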
To calculate the proportion of a cohort surviving to age \(x\) (\(l_x\)), we set \(l_0 = 1\) (100% survive to birth), and recursively calculate:
\[l_{x+n} = l_x \cdot (1 - _nq_x)\]
or in words, the proportion surviving to age \(x\) times the proportion of those survivors who do not die between \(x\) and \(x+n\) is the proportion surviving to age \(x+n\).
Our `gen_lx_from_qx` function can perform this calculation:
gen_lx_from_qx(dt, id_cols = c("age_start", "age_end"))
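The recursion itself is a cumulative product, as the base-R sketch below shows. The \(q_x\) values are made up for illustration (with the last one set to 1 for the terminal age group) and are not taken from the example data.

```r
# Illustrative q_x values; the final entry is the terminal age group.
qx <- c(0.10, 0.20, 0.50, 1)

# l_0 = 1, then l_{x+n} = l_x * (1 - q_x). The cumulative product yields one
# extra value (survivors past the terminal group, which is 0), so drop it.
lx <- c(1, cumprod(1 - qx))[seq_along(qx)]

print(lx)
```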
Proportion of cohort dying between ages \(x\) and \(x+n\) (\(d_x\)) is \(_nq_0\) to start, then \(_nd_x = l_x - l_{x+n}\) thereafter (difference between proportion surviving to age \(x\) and proportion surviving to age \(x+n\)).
To calculate \(d_x\), use `gen_dx_from_lx`:
gen_dx_from_lx(dt, id_cols = c("age_start", "age_end"))
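As a small illustration in base R, the differences can be taken directly from an \(l_x\) vector. The values below are made up for the example; the sketch assumes that no one survives beyond the terminal age group, so the final \(d_x\) equals the final \(l_x\).

```r
# Illustrative l_x values (proportion surviving to the start of each interval).
lx <- c(1, 0.9, 0.72, 0.36)

# d_x = l_x - l_{x+n}; nobody survives past the terminal group, so append 0.
dx <- lx - c(lx[-1], 0)

print(dx)
print(sum(dx))  # the whole cohort eventually dies
```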
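The person-years lived between ages \(x\) and \(x+n\) (\(_nL_x\)) can be broken down into the person-years lived by those who survive the whole interval, \(n \cdot l_{x+n}\), and the person-years lived by those who die within it, \(_na_x \cdot {_nd_x}\), such that: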
\[_nL_x = n \cdot l_{x+n} + _na_x \cdot {_nd_x}.\]
For the terminal age group:
\[{_{\infty}L_x} = \text{person-years lived above age } x = \frac{\text{person-years lived above age }x}{\text{deaths over age }x} \cdot \text{deaths over age }x= \frac{l_x}{_{\infty}m_x}\]
Use the `gen_nLx` function to calculate \(_nL_x\) with this method.
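A hand calculation in base R might look like the sketch below. All inputs are illustrative values rather than the example data, and `mx_terminal` is a hypothetical name for the death rate in the open-ended age group.

```r
# Illustrative inputs for four age groups; the last one is open-ended.
n  <- c(1, 4, 5, Inf)
lx <- c(1, 0.9, 0.72, 0.36)
dx <- c(0.1, 0.18, 0.36, 0.36)
ax <- c(0.3, 1.5, 2.5, NA)     # a_x is not needed for the terminal group here
mx_terminal <- 0.2             # death rate in the terminal age group

k   <- length(lx)
nLx <- numeric(k)

# Closed intervals: years lived by survivors plus years lived by those who die.
i <- seq_len(k - 1)
nLx[i] <- n[i] * lx[i + 1] + ax[i] * dx[i]

# Terminal interval: survivors to the terminal age divided by its death rate.
nLx[k] <- lx[k] / mx_terminal

print(nLx)
```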
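Life expectancy at age \(x\) (mean person-years lived above age \(x\)) is equal to the total person-years lived above age \(x\), \(T_x\), divided by the proportion of the cohort surviving to age \(x\). \(T_x\) is the sum of \(_nL_x\) over all age groups from \(x\) upward.
\[e_x = \frac{T_x}{l_x}.\]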
For the terminal age group, \(a_x = e_x\) because everyone surviving to the interval dies in the interval.
Calculate \(e_x\) with `gen_ex`:
gen_ex(dt)
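The same calculation can be sketched in base R with a reverse cumulative sum. The \(l_x\) and \(_nL_x\) values are illustrative, not the example data, and this is not necessarily how `gen_ex` is implemented.

```r
# Illustrative l_x and nLx values; the last age group is terminal.
lx  <- c(1, 0.9, 0.72, 0.36)
nLx <- c(0.93, 3.15, 2.7, 1.8)

# T_x: person-years lived above age x, the sum of nLx from x upward.
Tx <- rev(cumsum(rev(nLx)))

# e_x: mean years remaining for those who survive to age x.
ex <- Tx / lx

print(data.frame(lx, nLx, Tx, ex))
```

In the terminal row, \(e_x\) equals \(T_x / l_x = {_{\infty}L_x} / l_x\), which is the mean years lived in the terminal interval and therefore also its \(a_x\).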
One possible set of steps for calculating a complete period life table from deaths and mid-year population is:

1. Calculate \(a_x\) for the under-5 age groups with `gen_u5_ax_from_mx`, and set \(a_x\) over age 5 as \(n/2\).
2. Calculate \(q_x\) with `mx_ax_to_qx`.
3. Use `iterate_ax` to modify the \(a_x\) and \(q_x\) values, improving \(a_x\) over the naive \(n/2\) values.
4. Calculate the remaining life table parameters with the `lifetable` function. The `lifetable` function combines many of the functions described in this vignette for convenience.

Preston SH, Heuveline P, Guillot M. Demography: measuring and modeling population processes. Malden, MA: Blackwell Publishing; 2001.
Coale AJ, Demeny P, Vaughan B. Regional model life tables and stable populations: studies in population. Elsevier; 2013 Oct 22.
Keyfitz N. A life table that agrees with the data. Journal of the American Statistical Association. 1966 Jun 1;61(314):305-12.