What is a life table?

A life table describes how a birth cohort dies out as it ages. The cohort can be a real one, or a synthetic cohort built from the age-specific death rates of a single period. In the synthetic case, the table is called a period life table.

Life tables are one of the most important tools in demography, and they have been in use since the 1600s! They are also useful in other fields, because they generalize to other discrete “time to event” data.

Typically, life tables have one row per age group, with columns representing life table metrics, also known as parameters. The life table parameters used in this package are ${}_nm_x$, ${}_na_x$, ${}_nq_x$, ${}_np_x$, $l_x$, ${}_nd_x$, $e_x$, ${}_nL_x$, and $T_x$. In this notation, $x$ refers to age. Each metric applies either to age $x$ itself or to the interval between ages $x$ and $x+n$, where $n$ is the length of the interval, typically in years. As shorthand, we often drop the $n$ and leave the interval length implied.

Here’s an example of a period life table, for males in Austria in 1992, which is saved in the package as austria_1992_lt (source: Preston 2001):

x x+n deaths pop mx ax qx px lx dx nLx Tx ex
0 1 419 47925 0.01 0.07 0.01 0.99 1.00 0.01 0.99 72.89 72.89
1 5 70 189127 0.00 1.63 0.00 1.00 0.99 0.00 3.96 71.90 72.53
5 10 36 234793 0.00 2.50 0.00 1.00 0.99 0.00 4.95 67.94 68.63
10 15 46 238790 0.00 3.14 0.00 1.00 0.99 0.00 4.94 62.99 63.68
15 20 249 254996 0.00 2.72 0.00 1.00 0.99 0.00 4.93 58.04 58.74
20 25 420 326831 0.00 2.52 0.01 0.99 0.98 0.01 4.90 53.11 54.01
25 30 403 355086 0.00 2.48 0.01 0.99 0.98 0.01 4.87 48.21 49.35
30 35 441 324222 0.00 2.60 0.01 0.99 0.97 0.01 4.84 43.34 44.61
35 40 508 269963 0.00 2.70 0.01 0.99 0.96 0.01 4.80 38.50 39.90
40 45 769 261971 0.00 2.66 0.01 0.99 0.96 0.01 4.75 33.70 35.25
45 50 1154 238011 0.00 2.70 0.02 0.98 0.94 0.02 4.66 28.95 30.73
50 55 1866 261612 0.01 2.68 0.04 0.96 0.92 0.03 4.52 24.29 26.42
55 60 2043 181385 0.01 2.64 0.05 0.95 0.89 0.05 4.32 19.77 22.29
60 65 3496 187962 0.02 2.62 0.09 0.91 0.84 0.07 4.01 15.45 18.43
65 70 4366 153832 0.03 2.62 0.13 0.87 0.76 0.10 3.58 11.43 14.97
70 75 4337 105169 0.04 2.59 0.19 0.81 0.66 0.12 3.01 7.86 11.86
75 80 5279 73694 0.07 2.52 0.30 0.70 0.54 0.16 2.28 4.84 9.00
80 85 6460 57512 0.11 2.42 0.44 0.56 0.37 0.16 1.45 2.56 6.84
85 Inf 6146 32248 0.19 5.25 1.00 0.00 0.21 0.21 1.11 1.11 5.25

Life table metrics

For reference, the following is a list of life table metrics and their definitions.

$\mathbf{{}_nm_x}$: mortality rate between ages $x$ and $x+n$ (shorthand $m_x$, with the interval width $n$ implied). Equals deaths in the interval divided by person-years lived in the interval. Mid-year population is commonly used as an adequate approximation of the person-years denominator.

$\mathbf{{}_na_x}$: mean person-years lived between ages $x$ and $x+n$ by those who die within the interval (shorthand $a_x$).

$\mathbf{{}_nq_x}$: probability of death between ages $x$ and $x+n$, conditional on survival to age $x$ (shorthand $q_x$). Equals deaths in the interval divided by the number surviving to the $x$-th birthday. Examples: ${}_5q_0$ is the probability of death between birth and age 5. ${}_{45}q_{15}$ is the probability of death between ages 15 and 60, conditional on survival to age 15.

$\mathbf{l_x}$: proportion of the cohort surviving to age $x$.

$\mathbf{e_x}$: life expectancy at age $x$. This is the mean number of years lived after the $x$-th birthday by those surviving to age $x$. Life expectancy at birth is $e_0$.

$\mathbf{{}_nL_x}$: total person-years lived between ages $x$ and $x+n$.

$\mathbf{T_x}$: total person-years lived above age $x$.

$\mathbf{{}_nd_x}$: proportion of the cohort dying between ages $x$ and $x+n$ (shorthand $d_x$).

$\mathbf{{}_np_x}$: probability of survival between ages $x$ and $x+n$, conditional on survival to age $x$ (shorthand $p_x$). It is the complement of $q_x$: ${}_np_x = 1 - {}_nq_x$.

Representing life tables graphically

We often reduce life tables to age patterns of log probability of death ($\log(q_x)$) or to survival curves ($l_x$ plotted against age). Both are easy to display and vet in plots.

Calculations and relationships between life table metrics

The demCore package includes many utility functions that use the mathematical relationships between life table metrics to build out a complete life table. This section describes these functions and their underlying methods. It uses them to rebuild the example life table above from death counts and population.

Note that neither this document nor the package covers every relationship between metrics. Also, some of the equations presented rely on assumptions, while others are identities that always hold. For more details, see the Preston et al. (2001) demography textbook, which is the source of much of this material.

mx

When starting from raw death counts and population data, the first life table metric to compute is $m_x$:

$$m_x = \frac{\text{deaths}}{\text{person-years}} \approx \frac{\text{observed deaths}}{\text{mid-interval population}}$$

Let’s load the example data and calculate $m_x$:

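library(data.table)
library(demCore)
data("austria_1992_lt")
# keep only the raw inputs: age bounds, death counts, and population
dt <- austria_1992_lt[, c("age_start", "age_end", "deaths", "pop")]
dt[, mx := deaths / pop]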

ax

If we have $m_x$ and $q_x$, we can calculate $a_x$ directly. However, we often use $m_x$ and $a_x$ to get $q_x$ in the first place, so we have to make some assumptions to get $a_x$. Calculating $a_x$ empirically would require detailed and accurate data on age at death in days, such as paired dates of birth and death. Such data are typically unavailable.

Rule of thumb:

One option is to assume all deaths occur in the middle of the interval, so $a_x \approx n/2$. This assumption works well for most ages. It works less well at the youngest and oldest ages, where mortality can change rapidly over the interval.

Another assumption we can make is that the age-specific death rate is constant between $x$ and $x+n$. Under this assumption,

$${}_na_x = n + \frac{1}{{}_nm_x} - \frac{n}{1 - e^{-n \cdot {}_nm_x}}.$$

The function mx_to_ax implements this assumption.

Using our example data, we get:

dt <- hierarchyUtils::gen_length(dt, col_stem = "age")
dt[, ax := mx_to_ax(mx = mx, age_length = age_length)]

Note that we can use hierarchyUtils::gen_length to add the age_length column given age_start and age_end.
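The base-R sketch below evaluates the constant-rate formula directly for a single 5-year interval, which shows what the formula does. The helper name `ax_constant_rate` is hypothetical and is not demCore's mx_to_ax. It accepts finite interval widths only, because the formula is undefined for the open-ended terminal group. The death rate used is an arbitrary illustrative value.

```r
# Hypothetical helper (not demCore's mx_to_ax): nax assuming a constant
# mortality rate within the interval. Finite interval widths only.
ax_constant_rate <- function(mx, n) {
  stopifnot(all(is.finite(n)), all(mx > 0))
  # -expm1(-x) computes 1 - exp(-x) accurately when x is small
  n + 1 / mx - n / (-expm1(-n * mx))
}

# A 5-year interval with a low, illustrative death rate
ax_5 <- ax_constant_rate(mx = 0.001, n = 5)
print(ax_5)
```

For this low death rate, the result is roughly 2.498, just under the $n/2$ rule of thumb. Expanding the exponential shows why. For small $n \cdot m_x$, the formula is approximately $n/2 - n^2 m_x / 12$, so the constant-rate $a_x$ drifts below the midpoint as mortality rises.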

1a0 and 4a1:

Preston et al. adapted an analysis first completed by Coale and Demeny (1983). They derived a relationship between the infant mortality rate (${}_1m_0$) and the under-5 $a_x$ values (${}_1a_0$ and ${}_4a_1$). When reliable data for producing $a_x$ are unavailable, these relationships can be used to predict $a_x$ from infant $m_x$:

Quantity  Condition        Males                Females
1a0       1m0 >= 0.107     0.330                0.350
1a0       1m0 <  0.107     0.045 + 2.684 * 1m0  0.053 + 2.800 * 1m0
4a1       1m0 >= 0.107     1.352                1.361
4a1       1m0 <  0.107     1.651 - 2.816 * 1m0  1.522 - 1.518 * 1m0

Use the gen_u5_ax_from_mx function to implement this method:

dt[, sex := "male"]
gen_u5_ax_from_mx(dt, id_cols = c("age_start", "age_end", "sex"))
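The table above is simple enough to code by hand. The sketch below is a hypothetical base-R transcription of those rules. The name `u5_ax_sketch` and its arguments are illustrative and are not demCore's interface. Applied to the infant deaths and population in austria_1992_lt, it recovers the rounded $a_x$ values shown for ages 0 and 1.

```r
# Hypothetical base-R sketch of the Coale-Demeny under-5 ax rules in the
# table above (not demCore's gen_u5_ax_from_mx).
u5_ax_sketch <- function(m0, sex = c("male", "female")) {
  sex <- match.arg(sex)
  high <- m0 >= 0.107   # which branch of each rule applies
  if (sex == "male") {
    a0 <- ifelse(high, 0.330, 0.045 + 2.684 * m0)
    a1 <- ifelse(high, 1.352, 1.651 - 2.816 * m0)
  } else {
    a0 <- ifelse(high, 0.350, 0.053 + 2.800 * m0)
    a1 <- ifelse(high, 1.361, 1.522 - 1.518 * m0)
  }
  list(a0 = a0, a1 = a1)   # 1a0 and 4a1
}

# Austrian males, 1992: 1m0 = 419 infant deaths / 47925 population
u5 <- u5_ax_sketch(419 / 47925, sex = "male")
round(c(u5$a0, u5$a1), 2)
```

Both under-5 values depend only on ${}_1m_0$. The 0.107 threshold switches each rule from a linear function of ${}_1m_0$ to a constant.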

Graduation method:

One strategy for selecting $a_x$ values relies on the level and slope of the ${}_nm_x$ function. Take two populations with the same ${}_5m_{60}$. The population whose mortality rate rises more rapidly with age will have deaths more concentrated in the later part of the interval (higher $a_x$). Now take two populations with the same slope in $m_x$. The one with the higher mortality rate will have more deaths at the beginning of the interval (lower $a_x$).

To put this idea to use, we can iterate as described in Preston et al. (2001), following a method originally proposed by Keyfitz (1966):

$${}_na_x = \frac{-\frac{n}{24}\,{}_nd_{x-n} + \frac{n}{2}\,{}_nd_x + \frac{n}{24}\,{}_nd_{x+n}}{{}_nd_x}$$

Here $d_x$ comes from converting $m_x$ to $q_x$, and then to $l_x$ and $d_x$. The $m_x$ to $q_x$ conversion itself requires $a_x$, so we pick a starting value for $a_x$ (such as $n/2$), then solve for $d_x$, then solve for $a_x$, repeating until convergence. Use demCore::iterate_ax to implement this method.
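The iteration can be sketched in base R for a run of age groups that all share the same width $n$. This is a simplified, hypothetical version and is not demCore::iterate_ax. The names `keyfitz_pass` and `iterate_ax_sketch` are illustrative. The first and last groups keep $n/2$, because each lacks a neighbor on one side. The sketch also leaves out the open-ended terminal group and unequal widths. The example uses the deaths and population for ages 40 to 60 from austria_1992_lt.

```r
# Hypothetical helper: one pass of the Keyfitz graduation formula over a
# run of age groups that all have width n. The first and last groups lack
# a neighbor, so they keep n/2.
keyfitz_pass <- function(dx, n) {
  k <- length(dx)
  ax <- rep(n / 2, k)
  i <- seq_len(max(k - 2, 0)) + 1   # interior positions 2..k-1
  ax[i] <- (-n / 24 * dx[i - 1] + n / 2 * dx[i] + n / 24 * dx[i + 1]) / dx[i]
  ax
}

# Hypothetical sketch of the iteration (not demCore::iterate_ax):
# start at n/2, then repeat mx, ax -> qx -> lx -> dx -> ax.
iterate_ax_sketch <- function(mx, n, tol = 1e-8, max_iter = 100) {
  ax <- rep(n / 2, length(mx))
  for (iter in seq_len(max_iter)) {
    qx <- n * mx / (1 + (n - ax) * mx)   # mx, ax -> qx
    lx <- cumprod(c(1, 1 - qx))          # survivors, radix 1
    dx <- lx[-length(lx)] - lx[-1]       # deaths in each interval
    new_ax <- keyfitz_pass(dx, n)
    if (max(abs(new_ax - ax)) < tol) return(new_ax)
    ax <- new_ax
  }
  warning("ax iteration did not converge")
  ax
}

# Austrian males, 1992, ages 40-60 (four 5-year groups)
deaths <- c(769, 1154, 1866, 2043)
pop    <- c(261971, 238011, 261612, 181385)
ax_iter <- iterate_ax_sketch(deaths / pop, n = 5)
round(ax_iter, 2)
```

Deaths rise across these ages, so the interior values converge above $n/2$, in line with the level-and-slope reasoning above. Because $a_x$ depends only on ratios of $d_x$, the choice of radix at age 40 does not affect the result.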

qx

From $m_x$ and $a_x$, we can solve directly for $q_x$:

$${}_nq_x = \frac{n \cdot {}_nm_x}{1 + (n - {}_na_x) \cdot {}_nm_x}$$

For the terminal age group, $q_x$ should be 1. Everyone who survives to the terminal age group dies in that age group, so the probability of death is 1.

The mx_ax_to_qx function combines the equation for $q_x$ with the requirement that terminal $q_x$ equal one. It does this by setting $q_x = 1$ where age_length is Inf.

dt[, qx := mx_ax_to_qx(mx = mx, ax = ax, age_length = age_length)]

Other functions that utilize this relationship but solve for different metrics are mx_qx_to_ax and qx_ax_to_mx.
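For reference, the sketch below writes the ${}_nq_x$ equation and the terminal rule in base R. It is a hypothetical illustration and is not demCore's mx_ax_to_qx. `qx_from_mx_ax` is an invented name, and `n` is a vector of interval widths with Inf marking the open-ended group. The example uses the infant and 85+ rows of austria_1992_lt, with ${}_1a_0$ taken from the Coale-Demeny male rule.

```r
# Hypothetical base-R version of the mx, ax -> qx relationship
# (not demCore's mx_ax_to_qx). n holds interval widths; Inf marks
# the open-ended terminal group.
qx_from_mx_ax <- function(mx, ax, n) {
  qx <- n * mx / (1 + (n - ax) * mx)
  qx[is.infinite(n)] <- 1   # everyone reaching the terminal group dies in it
  qx
}

# Austrian males, 1992: infants and ages 85+
m  <- c(419 / 47925, 6146 / 32248)
a  <- c(0.045 + 2.684 * m[1], NA)   # male 1a0 rule; terminal ax not needed
nn <- c(1, Inf)
q  <- qx_from_mx_ax(m, a, nn)
print(q)
```

The infant value, about 0.0087, rounds to the 0.01 shown in the table.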

You can also solve for $q_x$ under the assumption of a constant mortality rate within an interval, which removes $a_x$ from the relationship:

$${}_nq_x = 1 - e^{-n \cdot {}_nm_x}.$$

dt[, qx_compare := mx_to_qx(mx = mx, age_length = age_length)]

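For age groups whose $a_x$ came from mx_to_ax, these two sets of $q_x$ values are the same. The $a_x$ implied by mx_to_qx equals the $a_x$ that mx_to_ax generates under the same constant-rate assumption. The two can differ where $a_x$ came from elsewhere, such as the under-5 values set by gen_u5_ax_from_mx.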
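The equivalence can be checked both algebraically and numerically. Substituting the constant-rate $a_x$ into the denominator gives $1 + (n - {}_na_x) \cdot {}_nm_x = n \cdot {}_nm_x / (1 - e^{-n \cdot {}_nm_x})$, so the general formula collapses to $1 - e^{-n \cdot {}_nm_x}$. The base-R lines below confirm this for a few illustrative death rates. They reimplement both formulas inline rather than calling demCore.

```r
# Check that plugging the constant-rate ax into the general qx formula
# reproduces 1 - exp(-n * mx). Death rates are illustrative values.
mx <- c(0.0004, 0.003, 0.02, 0.1)
n  <- 5
ax_const   <- n + 1 / mx - n / (-expm1(-n * mx))   # constant-rate ax
qx_general <- n * mx / (1 + (n - ax_const) * mx)   # mx, ax -> qx
qx_const   <- -expm1(-n * mx)                      # 1 - exp(-n * mx)
all.equal(qx_general, qx_const)
```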

lx

To calculate the proportion of a cohort surviving to age $x$ ($l_x$), we set $l_0 = 1$ (100% survive to birth) and recursively calculate:

$$l_{x+n} = l_x \cdot (1 - {}_nq_x)$$

In words, multiplying the proportion surviving to age $x$ by the proportion of those survivors who do not die between $x$ and $x+n$ gives the proportion surviving to age $x+n$.

Our gen_lx_from_qx function can perform this calculation:

gen_lx_from_qx(dt, id_cols = c("age_start", "age_end"))
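The recursion amounts to a cumulative product of survival probabilities. In the hypothetical base-R sketch below, $l_x$ is returned at the start of each age group. `lx_from_qx` is an illustrative name, not a demCore function, and the toy $q_x$ values are chosen to keep the arithmetic easy.

```r
# Hypothetical sketch of the lx recursion (not demCore's gen_lx_from_qx):
# l_0 = 1 and l_{x+n} = l_x * (1 - q_x), returned at the start of each group.
lx_from_qx <- function(qx) {
  cumprod(c(1, 1 - qx[-length(qx)]))
}

# Toy three-group table; the last group is open-ended, so its qx is 1
lx <- lx_from_qx(c(0.1, 0.2, 1))
lx
```

With ${}_nq_x$ of 0.1, 0.2 and 1, survivorship is 1, 0.9 and 0.72.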

dx

The proportion of the cohort dying between ages $x$ and $x+n$ (${}_nd_x$) is ${}_nq_0$ for the first age group, because $l_0 = 1$. After that, ${}_nd_x = l_x - l_{x+n}$, the difference between the proportion surviving to age $x$ and the proportion surviving to age $x+n$.

To calculate $d_x$, use gen_dx_from_lx:

gen_dx_from_lx(dt, id_cols = c("age_start", "age_end"))
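In code, ${}_nd_x$ is the drop in $l_x$ from one age group to the next, with survivors beyond the terminal group treated as zero. The base-R sketch below is hypothetical. `dx_from_lx` is an illustrative name, and the $l_x$ values are toy inputs.

```r
# Hypothetical sketch (not demCore's gen_dx_from_lx):
# d_x = l_x - l_{x+n}, where survivors beyond the terminal group are 0.
dx_from_lx <- function(lx) {
  lx - c(lx[-1], 0)
}

dx <- dx_from_lx(c(1, 0.9, 0.72))   # lx from a toy three-group table
dx
sum(dx)
```

The sketch returns 0.1, 0.18 and 0.72. The differences telescope, so ${}_nd_x$ always sums to $l_0 = 1$ when the terminal ${}_nq_x$ is 1. This makes a quick sanity check on a finished table.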

nLx

The person-years lived between ages $x$ and $x+n$ (${}_nL_x$) can be broken down into:

  • Person-years lived by those who survive the interval: $n \cdot l_{x+n}$
  • Person-years lived by those who die during the interval: ${}_na_x \cdot {}_nd_x$

such that:

$${}_nL_x = n \cdot l_{x+n} + {}_na_x \cdot {}_nd_x.$$

For the terminal (open-ended) age group, everyone alive at age $x$ eventually dies in that group, so deaths over age $x$ equal $l_x$. Then:

$${}_\infty L_x = \text{person-years lived above age } x = \frac{\text{person-years lived above age } x}{\text{deaths over age } x} \cdot \text{deaths over age } x = \frac{l_x}{{}_\infty m_x}$$

Use the gen_nLx function to calculate with this method:

gen_nLx(dt, id_cols = c("age_start", "age_end"))
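Both parts of the ${}_nL_x$ calculation fit in a few lines of base R. This is a hypothetical sketch and is not demCore's gen_nLx. The function name and argument layout are illustrative. The toy inputs, including the terminal ${}_\infty m_x$ of 0.2, are chosen to keep the arithmetic easy and are not real data.

```r
# Hypothetical sketch of the nLx formulas (not demCore's gen_nLx).
# n: interval widths (Inf for the terminal group); lx, ax, dx, mx by group.
nLx_sketch <- function(n, lx, ax, dx, mx) {
  l_next <- c(lx[-1], 0)              # l_{x+n}
  nLx <- n * l_next + ax * dx         # survivors plus those who die
  term <- is.infinite(n)
  nLx[term] <- lx[term] / mx[term]    # open-ended group: l_x / m_x
  nLx
}

# Toy three-group table (values chosen for easy arithmetic)
nLx <- nLx_sketch(n  = c(5, 5, Inf),
                  lx = c(1, 0.9, 0.72),
                  ax = c(2.5, 2.5, NA),
                  dx = c(0.1, 0.18, 0.72),
                  mx = c(NA, NA, 0.2))
nLx
```

The toy table gives 4.75, 4.05 and 3.6 person-years.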

Tx

Next, use ${}_nL_x$ to get $T_x$ by summing over all age groups that start at or above age $x$:

$$T_x = \sum_{a \ge x} {}_nL_a.$$

gen_Tx(dt, id_cols = c("age_start", "age_end"))
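Summing from each age group to the top of the table is a reverse cumulative sum. The base-R sketch below is hypothetical. `Tx_from_nLx` is an illustrative name and is not demCore's gen_Tx. The input is the toy ${}_nL_x$ vector 4.75, 4.05, 3.6.

```r
# Hypothetical sketch (not demCore's gen_Tx): T_x is a reverse cumulative
# sum of nLx, from the oldest age group down to the youngest.
Tx_from_nLx <- function(nLx) {
  rev(cumsum(rev(nLx)))
}

Tx <- Tx_from_nLx(c(4.75, 4.05, 3.6))   # nLx from a toy three-group table
Tx
```

The result is 12.4, 7.65 and 3.6. The terminal $T_x$ equals the terminal ${}_\infty L_x$.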

ex

Life expectancy at age $x$ (mean person-years lived above age $x$) equals the total person-years lived above age $x$ divided by the proportion surviving to age $x$:

$$e_x = \frac{T_x}{l_x}.$$

For the terminal age group, $a_x = e_x$, because everyone surviving to the interval dies in the interval.

Calculate $e_x$ with gen_ex:

gen_ex(dt)
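The division itself is a one-liner, and the terminal case provides a useful check on the example table. Since ${}_\infty L_x = l_x / {}_\infty m_x$, terminal $e_x$ reduces to $1 / {}_\infty m_x$, which is population divided by deaths. The base-R sketch below is hypothetical. `ex_from_Tx_lx` is an illustrative name and is not demCore's gen_ex. The first part uses toy values, and the second uses the 85+ row of austria_1992_lt.

```r
# Hypothetical sketch (not demCore's gen_ex): e_x = T_x / l_x.
ex_from_Tx_lx <- function(Tx, lx) {
  Tx / lx
}

ex <- ex_from_Tx_lx(Tx = c(12.4, 7.65, 3.6), lx = c(1, 0.9, 0.72))
ex

# In the open-ended group T_x = l_x / m_x, so e_x = 1 / m_x. For Austrian
# males aged 85+ in 1992, this is population / deaths:
e85 <- 32248 / 6146
round(e85, 2)
```

The 85+ value rounds to 5.25, which matches both $e_x$ and $a_x$ in the last row of the table.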

Summary

One possible set of steps for calculating a complete period life table from deaths and mid-year population is:

  1. Compute mx = deaths / population
  2. Estimate under-5 ax from mx using gen_u5_ax_from_mx, and set ax for ages 5 and over to n/2
  3. Calculate qx directly using mx_ax_to_qx
  4. Use ax iteration with iterate_ax to modify ax and qx values, improving ax over the naive n/2 values
  5. Calculate px, lx, dx, nLx, Tx, and ex directly, in that order, from mx, ax, and qx, with the lifetable function. For convenience, the lifetable function combines many of the functions described in this vignette.
  6. In the terminal age group, ax is equal to ex. So, once ex is solved for, terminal ax can be replaced with that value.
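The summary steps can be traced end to end in base R. The sketch below is hypothetical and is not a substitute for the lifetable function. It types in the deaths and pop columns of austria_1992_lt by hand and applies the formulas given in this vignette. For ages under 5, it uses the male Coale-Demeny rules, which are valid here because ${}_1m_0 < 0.107$. It sets the terminal $a_x$ to $1/{}_\infty m_x$, which is where step 6 ends up, and it skips the $a_x$ iteration in step 4.

```r
# Hypothetical end-to-end sketch in base R (not demCore's lifetable()).
# Skips the ax iteration (step 4): ax is n/2 from age 5 up.
age_start <- c(0, 1, seq(5, 85, by = 5))
age_end   <- c(1, seq(5, 85, by = 5), Inf)
deaths <- c(419, 70, 36, 46, 249, 420, 403, 441, 508, 769, 1154,
            1866, 2043, 3496, 4366, 4337, 5279, 6460, 6146)
pop <- c(47925, 189127, 234793, 238790, 254996, 326831, 355086,
         324222, 269963, 261971, 238011, 261612, 181385, 187962,
         153832, 105169, 73694, 57512, 32248)

n  <- age_end - age_start                  # Inf for the terminal group
mx <- deaths / pop                         # step 1
ax <- n / 2                                # step 2: n/2 from age 5 up ...
ax[1] <- 0.045 + 2.684 * mx[1]             # ... male 1a0 (1m0 < 0.107)
ax[2] <- 1.651 - 2.816 * mx[1]             # ... male 4a1 (1m0 < 0.107)
term <- is.infinite(n)
ax[term] <- 1 / mx[term]                   # open-ended group: ax = ex = 1/mx
qx <- n * mx / (1 + (n - ax) * mx)         # step 3
qx[term] <- 1
lx  <- cumprod(c(1, 1 - qx[-length(qx)]))  # step 5: survivors
dx  <- lx - c(lx[-1], 0)                   # deaths by group
nLx <- n * c(lx[-1], 0) + ax * dx          # person-years by group
nLx[term] <- lx[term] / mx[term]
Tx <- rev(cumsum(rev(nLx)))
ex <- Tx / lx
round(ex[1], 2)                            # life expectancy at birth
```

Because step 4 is skipped, $e_0$ should land close to the 72.89 in austria_1992_lt, but it need not match exactly.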

References

Preston SH, Heuveline P, Guillot M. Demography: measuring and modeling population processes. Malden, MA: Blackwell Publishing; 2001.

Coale AJ, Demeny P, Vaughan B. Regional model life tables and stable populations: studies in population. Elsevier; 2013 Oct 22.

Keyfitz N. A life table that agrees with the data. Journal of the American Statistical Association. 1966 Jun 1;61(314):305-12.