Aggregations and scaling with `hierarchyUtils` should perform as fast as basic `data.table` code. For basic use cases, there should be only slightly more overhead due to the assertions and added flexibility included in `hierarchyUtils`.
This vignette compares aggregation timings for `hierarchyUtils` and `data.table` on example input data, aggregating across either an interval variable like ‘age’ or a categorical variable like ‘sex’. The example input grows as more draws are added: 1, 10, and 100 draws give 13,632, 136,320, and 1,363,200 input rows.
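A minimal sketch of how input with the same shape could be built with `data.table::CJ()`, assuming one location, years 1950 to 2020, single-year ages 0 to 95, and `value1` and `value2` fixed at 1. `make_example_dt()` is a hypothetical helper for illustration, not part of `hierarchyUtils`. Because `CJ()` sorts the cross join and keys it on every column, the resulting key matches the one printed for the example data.

```r
library(data.table)

# Hypothetical helper: example input with one location, years 1950-2020,
# two sexes, single-year ages 0-95, two value columns set to 1, and
# `n_draws` draws.
make_example_dt <- function(n_draws) {
  # CJ() builds the cross join of its arguments; by default it sorts the
  # result and sets the key to all columns.
  CJ(
    location = 1,
    year_start = as.numeric(1950:2020),  # stored as <num>, not <int>
    sex = c("female", "male"),
    age_start = as.numeric(0:95),
    value1 = 1,
    value2 = 1,
    draw = seq_len(n_draws)              # stored as <int>
  )
}

dt <- make_example_dt(n_draws = 1)
print(dt)
nrow(dt)  # 71 years x 2 sexes x 96 ages x 1 draw = 13,632 rows
```

Passing `n_draws = 10` or `n_draws = 100` scales the row count by the same factor, which is how the input sizes in the timing table grow.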
```
#> Key: <location, year_start, sex, age_start, value1, value2, draw>
#>        location year_start    sex age_start value1 value2  draw
#>           <num>      <num> <char>     <num>  <num>  <num> <int>
#>     1:        1       1950 female         0      1      1     1
#>     2:        1       1950 female         1      1      1     1
#>     3:        1       1950 female         2      1      1     1
#>     4:        1       1950 female         3      1      1     1
#>     5:        1       1950 female         4      1      1     1
#>    ---
#> 13628:        1       2020   male        91      1      1     1
#> 13629:        1       2020   male        92      1      1     1
#> 13630:        1       2020   male        93      1      1     1
#> 13631:        1       2020   male        94      1      1     1
#> 13632:        1       2020   male        95      1      1     1
```
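For reference, "basic `data.table` code" for the two kinds of aggregation might look like the following sketch. It assumes example input of the same shape with 10 draws. The `"all"` sex label and the 5-year age groups are hypothetical targets chosen for illustration and may differ from the groupings used for the timings.

```r
library(data.table)

# Assumed example input: 71 years x 2 sexes x 96 single-year ages x 10 draws.
dt <- CJ(
  location = 1, year_start = as.numeric(1950:2020),
  sex = c("female", "male"), age_start = as.numeric(0:95),
  value1 = 1, value2 = 1, draw = 1:10
)
value_cols <- c("value1", "value2")

# Categorical aggregation: sum 'female' and 'male' into one total by
# dropping `sex` from the grouping columns, then label it "all".
agg_sex <- dt[, lapply(.SD, sum),
              by = .(location, year_start, age_start, draw),
              .SDcols = value_cols]
agg_sex[, sex := "all"]

# Interval aggregation: collapse single-year ages into hypothetical 5-year
# groups identified by their start (0, 5, ..., 95). In this example the
# last group contains only age 95.
agg_age <- dt[, lapply(.SD, sum),
              by = .(location, year_start, sex, draw,
                     age_start = 5 * (age_start %/% 5)),
              .SDcols = value_cols]
```

Unlike a package-level aggregation function, this code performs no checks that the input is complete or that the target groups are valid, which is the kind of extra work the overhead described above refers to.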
| col_stem | col_type | n_draws | method | n_input_rows | user.self (s) | sys.self (s) | elapsed (s) |
|---|---|--:|---|--:|--:|--:|--:|
| age | interval | 1 | data.table | 13,632 | 0.063 | 0.007 | 0.036 |
| age | interval | 1 | hierarchyUtils | 13,632 | 1.677 | 0.007 | 1.619 |
| age | interval | 10 | data.table | 136,320 | 0.152 | 0.000 | 0.076 |
| age | interval | 10 | hierarchyUtils | 136,320 | 2.540 | 0.036 | 2.010 |
| age | interval | 100 | data.table | 1,363,200 | 0.776 | 0.004 | 0.402 |
| age | interval | 100 | hierarchyUtils | 1,363,200 | 9.559 | 0.171 | 7.472 |
| sex | categorical | 1 | data.table | 13,632 | 0.013 | 0.000 | 0.006 |
| sex | categorical | 1 | hierarchyUtils | 13,632 | 0.099 | 0.001 | 0.056 |
| sex | categorical | 10 | data.table | 136,320 | 0.071 | 0.000 | 0.036 |
| sex | categorical | 10 | hierarchyUtils | 136,320 | 0.643 | 0.000 | 0.481 |
| sex | categorical | 100 | data.table | 1,363,200 | 0.695 | 0.000 | 0.366 |
| sex | categorical | 100 | hierarchyUtils | 1,363,200 | 5.235 | 0.020 | 4.096 |
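The timing columns share their names with components of the `proc_time` object that base R's `system.time()` returns, whose values are in seconds. A harness along these lines, sketched here for the `data.table` side of the `sex` case only, could produce rows in the same layout. `time_sex_agg()` is a hypothetical helper, the input construction is an assumption about the data's shape, and absolute timings will differ by machine.

```r
library(data.table)

# Hypothetical harness: time a data.table aggregation across `sex` for a
# given number of draws and return one row in the timing-table layout.
time_sex_agg <- function(n_draws) {
  dt <- CJ(
    location = 1, year_start = as.numeric(1950:2020),
    sex = c("female", "male"), age_start = as.numeric(0:95),
    value1 = 1, value2 = 1, draw = seq_len(n_draws)
  )
  # system.time() returns a proc_time object; its components are seconds.
  tm <- system.time(
    dt[, lapply(.SD, sum),
       by = .(location, year_start, age_start, draw),
       .SDcols = c("value1", "value2")]
  )
  data.table(
    col_stem = "sex", col_type = "categorical", n_draws = n_draws,
    method = "data.table", n_input_rows = nrow(dt),
    user.self = tm[["user.self"]], sys.self = tm[["sys.self"]],
    elapsed = tm[["elapsed"]]
  )
}

# One row per input size, stacked into a single table.
timings <- rbindlist(lapply(c(1, 10, 100), time_sex_agg))
print(timings)
```

Timing a single run like this is noisy for small inputs; repeating each measurement and taking a summary such as the median gives more stable comparisons.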