R/prioritize_dt.R
prioritize_dt.Rd
Rank non-unique rows in a data.table using defined priority orders
prioritize_dt(
  dt,
  rank_by_cols,
  unique_id_cols = rank_by_cols,
  rank_order,
  warn_missing_levels = FALSE,
  warn_non_unique_priority = FALSE,
  check_top_priority_unique_only = FALSE
)
Argument | Description |
---|---|
dt | [ |
rank_by_cols | [ |
unique_id_cols | [ |
rank_order | [ |
warn_missing_levels | [ |
warn_non_unique_priority | [ |
check_top_priority_unique_only | [ |
dt with a new 'priority' column generated using the rules specified in rank_order. A 'priority' equal to 1 is the highest priority.
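As a rough illustration of how the returned 'priority' column might be used downstream, the sketch below filters a ranked table down to its top-priority rows. The table `ranked_dt` and its values are hypothetical and hand-built for illustration; they are not actual output of prioritize_dt.

```r
library(data.table)

# Hand-built stand-in for a ranked table (illustrative values only,
# NOT produced by prioritize_dt)
ranked_dt <- data.table(
  location = c("USA", "USA", "CAN", "CAN"),
  year = 2000,
  method = c("de facto", "de jure", "de jure", "de facto"),
  priority = c(1L, 2L, 2L, 1L)
)

# Keep only the highest-priority row (priority == 1) per location-year
best_dt <- ranked_dt[priority == 1]
print(best_dt)
```

Rows with larger 'priority' values remain available in the full table if a fallback source is needed.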
prioritize_dt uses data.table::setorderv to order dt according to rank_order. Each element of rank_order takes one of three possible values to specify the order of a column in dt.
'1', order a numeric column in ascending order (smaller values have higher priority).
'-1', order a numeric column in descending order (larger values have higher priority).
factor levels, to order a categorical column in a custom order with the first level having highest priority. When not all present values of the column are defined in the levels, the priority will be NA and a warning is printed if quiet = FALSE.
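The three kinds of values can be illustrated with plain data.table calls. This is a minimal sketch of the idea, not the actual implementation of prioritize_dt: the toy table, the conversion of the categorical column to a factor, and the numbering of rows with seq_len(.N) are all assumptions made for illustration.

```r
library(data.table)

# Toy data: three candidate sources for one location-year (illustrative only)
dt <- data.table(
  location = "USA",
  year = 2000,
  method = c("de jure", "de facto", "de facto"),
  n_age_groups = c(9, 1, 9)
)

# Custom order for a categorical column: convert it to a factor whose
# first level should rank highest
dt[, method := factor(method, levels = c("de facto", "de jure"))]

# Sort the factor column by level order (1) and the numeric column in
# descending order (-1)
setorderv(dt, cols = c("method", "n_age_groups"), order = c(1, -1))

# Number the sorted rows within each group; 1 is the highest priority
dt[, priority := seq_len(.N), by = c("location", "year")]
print(dt)
```

In this sketch the 'de facto' row with the most age groups ends up with priority 1, and the 'de jure' row ranks last.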
The order of elements in rank_order matters. The more important rules should be placed earlier in rank_order so that they are applied first.
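To see why the order of rules matters, the sketch below sorts the same toy table twice with data.table::setorderv, swapping which column is applied first. The table and the `source_id` column are hypothetical, and the sketch only mimics the ordering step rather than calling prioritize_dt itself.

```r
library(data.table)

# Toy data with a hypothetical identifier for each candidate source
dt <- data.table(
  source_id = c("A", "B", "C"),
  method = factor(
    c("de jure", "de facto", "de facto"),
    levels = c("de facto", "de jure")
  ),
  n_age_groups = c(9, 1, 5)
)

# Rule order 1: method first, then number of age groups (descending)
by_method_first <- copy(dt)
setorderv(by_method_first, c("method", "n_age_groups"), c(1, -1))

# Rule order 2: number of age groups (descending) first, then method
by_groups_first <- copy(dt)
setorderv(by_groups_first, c("n_age_groups", "method"), c(-1, 1))

print(by_method_first$source_id)
print(by_groups_first$source_id)
```

With method first, a 'de facto' source leads the ranking; with the age group count first, the 'de jure' source with the most age groups leads instead.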
# preliminary data with only total population
dt_total <- data.table::CJ(
  location = "USA", year = 2000,
  age_start = 0, age_end = Inf,
  method = c("de facto", "de jure"),
  status = c("preliminary")
)

# final data in 10 year age groups
dt_10_yr_groups <- data.table::CJ(
  location = "USA", year = 2000,
  age_start = seq(0, 80, 10),
  method = c("de facto", "de jure"),
  status = c("final")
)
dt_10_yr_groups[, age_end := age_start + 10]
#>     location year age_start   method status age_end
#>  1:      USA 2000         0 de facto  final      10
#>  2:      USA 2000         0  de jure  final      10
#>  3:      USA 2000        10 de facto  final      20
#>  4:      USA 2000        10  de jure  final      20
#>  5:      USA 2000        20 de facto  final      30
#>  6:      USA 2000        20  de jure  final      30
#>  7:      USA 2000        30 de facto  final      40
#>  8:      USA 2000        30  de jure  final      40
#>  9:      USA 2000        40 de facto  final      50
#> 10:      USA 2000        40  de jure  final      50
#> 11:      USA 2000        50 de facto  final      60
#> 12:      USA 2000        50  de jure  final      60
#> 13:      USA 2000        60 de facto  final      70
#> 14:      USA 2000        60  de jure  final      70
#> 15:      USA 2000        70 de facto  final      80
#> 16:      USA 2000        70  de jure  final      80
#> 17:      USA 2000        80 de facto  final      90
#> 18:      USA 2000        80  de jure  final      90
dt_10_yr_groups[age_start == 80, age_end := Inf]
#>     location year age_start   method status age_end
#>  1:      USA 2000         0 de facto  final      10
#>  2:      USA 2000         0  de jure  final      10
#>  3:      USA 2000        10 de facto  final      20
#>  4:      USA 2000        10  de jure  final      20
#>  5:      USA 2000        20 de facto  final      30
#>  6:      USA 2000        20  de jure  final      30
#>  7:      USA 2000        30 de facto  final      40
#>  8:      USA 2000        30  de jure  final      40
#>  9:      USA 2000        40 de facto  final      50
#> 10:      USA 2000        40  de jure  final      50
#> 11:      USA 2000        50 de facto  final      60
#> 12:      USA 2000        50  de jure  final      60
#> 13:      USA 2000        60 de facto  final      70
#> 14:      USA 2000        60  de jure  final      70
#> 15:      USA 2000        70 de facto  final      80
#> 16:      USA 2000        70  de jure  final      80
#> 17:      USA 2000        80 de facto  final     Inf
#> 18:      USA 2000        80  de jure  final     Inf

input_dt <- rbind(dt_total, dt_10_yr_groups)
input_dt[, n_age_groups := .N, by = setdiff(names(input_dt), c("age_start", "age_end"))]
#>     location year age_start age_end   method      status n_age_groups
#>  1:      USA 2000         0     Inf de facto preliminary            1
#>  2:      USA 2000         0     Inf  de jure preliminary            1
#>  3:      USA 2000         0      10 de facto       final            9
#>  4:      USA 2000         0      10  de jure       final            9
#>  5:      USA 2000        10      20 de facto       final            9
#>  6:      USA 2000        10      20  de jure       final            9
#>  7:      USA 2000        20      30 de facto       final            9
#>  8:      USA 2000        20      30  de jure       final            9
#>  9:      USA 2000        30      40 de facto       final            9
#> 10:      USA 2000        30      40  de jure       final            9
#> 11:      USA 2000        40      50 de facto       final            9
#> 12:      USA 2000        40      50  de jure       final            9
#> 13:      USA 2000        50      60 de facto       final            9
#> 14:      USA 2000        50      60  de jure       final            9
#> 15:      USA 2000        60      70 de facto       final            9
#> 16:      USA 2000        60      70  de jure       final            9
#> 17:      USA 2000        70      80 de facto       final            9
#> 18:      USA 2000        70      80  de jure       final            9
#> 19:      USA 2000        80     Inf de facto       final            9
#> 20:      USA 2000        80     Inf  de jure       final            9

output_dt <- prioritize_dt(
  dt = input_dt,
  rank_by_cols = c("location", "year"),
  unique_id_cols = c("location", "year", "age_start", "age_end"),
  rank_order = list(
    method = c("de facto", "de jure"), # prioritize 'de facto' sources highest
    n_age_groups = -1 # prioritize sources with more age groups
  )
)