
06: Calculating Parameter Medians
RSTr-medians.Rmd
Overview
In the previous vignette, we finally ran our model using
run_sampler()
, imported estimates using
load_samples()
, age-standardized and year-aggregated our
estimates, and finally did some cursory exploration of large- and
small-population counties in our dataset. Now, we can calculate our
estimates, investigate measures of reliability, and investigate the
implications of our measures of reliability. Note that this process is
identical for all types of models.
The get_medians()
function
Previously, we generated age-standardized estimates for
theta
based on our example Michigan dataset,
theta
.
# from vignette("RSTr-samples")
library(RSTr)
initialize_model(name = "my_test_model", data = miheart, adjacency = miadj)
run_sampler("my_test_model")
To get the medians, let’s reload our age-standardized data and
population array, then load our samples into
get_medians()
:
std_pop <- c(113154, 100640, 95799)
margin_age <- 2
theta <- load_samples("my_test_model") * 1e5
theta <- bind_samples(
theta,
age_standardize(theta[, 1:3, , ], std_pop, margin_age),
margin_age,
"35-64"
)
pop <- load_pop("my_test_model")
pop <- bind_samples(
pop,
aggregate_pop(pop[, 1:3, ], margin_age),
margin_age,
"35-64"
)
medians <- get_medians(theta)
From here, we can map our estimates:
library(ggplot2)
est_35_64 <- medians[, "35-64", "1979"]
ggplot(mishp) +
geom_sf(aes(fill = est_35_64)) +
labs(
title = "Smoothed Myocardial Infarction Death Rates in MI, Ages 35-64, 1979",
fill = "Deaths per 100,000"
) +
scale_fill_viridis_c() +
theme_void()
Recall, though, in the previous vignette, we discovered that some traceplots were not as stable (i.e., reliable) as others. How can we test for reliability in our estimates?
Estimate reliability
Reliability can be easily tested in CAR models using two criteria:
- Relative precision: In Quick, et al 2024, relative precision is measured as a ratio of the posterior median to its credible interval. If an estimate’s ratio is less than 1, then that estimate is considered unreliable at that level of credibility. Effectively, this means that unreliable estimates have a spread of samples larger than the value of the estimate itself; and
- Population counts: Though
RSTr
is designed to stabilize low-population regions, there is a limit to the amount of information the model can gather, and estimates from exceedingly low-population regions may be over-smoothed. Therefore, estimates from any region that fall below a specified threshold will be considered unreliable, regardless of relative precision. Use your judgment when setting this threshold depending on what kind of data you are working with: for example, 1,000 population is generally a good rule of thumb for mortality data, whereas an appropriate cutoff for birth data is closer to 100.
Let’s get some reliability metrics for our dataset. First, let’s
generate our relative precisions at 95% credibility using the
get_relative_precision()
function and then create a
logical
array that tells us which estimates are
unreliable:
rel_prec <- get_relative_precision(
output = theta,
medians = medians,
perc_ci = 0.95
)
low_rel_prec <- rel_prec < 1
Now, let’s generate a similar logical
array
for populations less than 1000 and use both of these criteria to create
a set of suppressed medians. A median will be suppressed if it meets
either of the two criteria:
low_population <- pop < 1000
medians_supp <- medians
medians_supp[low_rel_prec | low_population] <- NA
Looking at our dataset, we can see that most of our estimates meet both criteria! Certain counties, unfortunately, are too small to get reliable estimates for, but fortunately our estimates generally had high relative precision. Let’s now map our suppressed estimates again and see what changed:
est_35_64 <- medians_supp[, "35-64", "1979"]
ggplot(mishp) +
geom_sf(aes(fill = est_35_64)) +
labs(
title = "Smoothed Myocardial Infarction Death Rates in MI, Ages 35-64, 1979",
fill = "Deaths per 100,000"
) +
scale_fill_viridis_c() +
theme_void()
In this group in 1979, it seems only one region has been suppressed
(i.e., grayed out). This is because age-standardizing the samples after
running the model both helps to bolster our relative precisions and to
increase the total population in our groups, increasing values for both
suppression criteria. If we suppress our estimates in our more granular,
non-age-standardized theta
samples, we will see that, even
with a small credible interval, many more counties are suppressed:
rel_prec <- get_relative_precision(
output = theta,
medians = medians,
perc_ci = 0.5
)
low_rel_prec <- rel_prec < 1
medians_supp <- medians
medians_supp[low_rel_prec | low_population] <- NA
est_35_44 <- medians_supp[, "35-44", "1979"]
ggplot(mishp) +
geom_sf(aes(fill = est_35_44)) +
labs(
title = "Smoothed Myocardial Infarction Death Rates in MI, Ages 35-44, 1979",
subtitle = "Relative Precision based on 50% Credible Interval",
fill = "Deaths per 100,000"
) +
scale_fill_viridis_c() +
theme_void()
As we widen our credible interval, the reliability criteria will become more conservative and suppress more counties:
rel_prec <- get_relative_precision(
output = theta,
medians = medians,
perc_ci = 0.995
)
low_rel_prec <- rel_prec < 1
medians_supp <- medians
medians_supp[low_rel_prec | low_population] <- NA
est_35_44 <- medians_supp[, "35-44", "1979"]
ggplot(mishp) +
geom_sf(aes(fill = est_35_44)) +
labs(
title = "Smoothed Myocardial Infarction Death Rates in MI, Ages 35-44, 1979",
subtitle = "Relative Precision based on 99.5% Credible Interval",
fill = "Deaths per 100,000"
) +
scale_fill_viridis_c() +
theme_void()
If we continue increasing our CI width from 0.50 to 0.95 to 0.99 and 0.995, our estimates must be more precise and more counties become grayed out. It is important to find a good balance between credible interval choice and displaying of estimates, and usually, a 95% credible interval provides a happy medium and is the value that is traditionally used.
Final thoughts
In this vignette, we explored the get_medians()
function, investigated measures of reliability, and observed how
reliability measures changed which data are suppressed. This vignette
concludes the main sections on using the functions in the
RSTr
package. After reading these, you should be able to
prepare your event and adjacency data, configure your model as
necessary, and determine which estimates are reliable. If you are
interested in learning more about how the MSTCAR model itself works,
read vignette("RSTr-models")
.