04: Initializing Your Model

Overview

The initialize_model() function builds the objects needed to run RSTr and, by default, saves them in R’s temporary directory. In this vignette, we discuss each argument in detail and how to use it.

Arguments of the `initialize_model()` function

initialize_model() provides several arguments:

name: The name of the folder your model information lives in;
dir: The directory that contains the model folder. By default, this is your temporary directory, so the model information is lost when the R session ends. To analyze your model at a later date, or to keep your samples intact if R crashes during runtime, specify a different directory;
data: The list object containing the event Y and population n data. For more information on data setup, read vignette("RSTr-event");
adjacency: The adjacency structure for your event and population data. For more information on adjacency structure setup, read vignette("RSTr-adj");
inits: This is a list of initial values for each parameter. This can be specified by the user or generated by default;
priors: This is a list of all prior information for each parameter. This can be specified by the user or generated by default;
model: This is a string that specifies the model to use. Available choices are "ucar" for the Univariate CAR model, "mcar" for the Multivariate CAR model, and "mstcar" for the Multivariate Spatiotemporal CAR model. By default, "mstcar" is chosen;
method: Specifies whether the event data are treated as Binomial ("binom") or Poisson ("pois") distributed. By default, RSTr uses Binomial updates for the event data;
m0: In restricted models, specifies the baseline neighbor count;
A: In restricted models, describes the intensity of the smoothing between regions;
rho_up: Allows for updates of the temporal correlation parameter rho. By default, RSTr does not update rho;
impute_lb: Specifies a lower bound for imputed data for event information that is missing or suppressed;
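impute_ub: Specifies an upper bound for imputed data for event information that is missing or suppressed;
seed: Allows the user to specify the random seed used for replication purposes; and
.ignore_checks: Skips the checks performed on the inputs of initialize_model(). By default, the checks are run.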

Most of these arguments are optional, as the model provides defaults for them. If you run into errors when initializing your model, read vignette("RSTr-troubleshoot"). Below, we go into detail regarding what each argument does and what to keep in mind when setting its value.

The `inits` argument

inits is a list specifying the starting values for the model’s parameters. Here are the parameters that can be given initial values in the MSTCAR model:

theta: The estimated spatially smoothed rate for each region-group-time, transformed to a (-∞, ∞) scale. theta is an array of real numbers with dimensions num_region x num_group x num_time. To facilitate the Metropolis update performed on theta, all values are either logit- or log-transformed depending on the method you choose, so be sure to transform your data accordingly: method = "binom" is associated with logit() and method = "pois" is associated with log();
beta: The mean rate for each island-group-time on the transformed scale. An island is a set of regions that are connected to one another through the adjacency structure but share no neighbors with regions outside the set. For example, miadj contains two islands representing the counties of the Upper Peninsula and the Lower Peninsula. These islands don’t touch each other and therefore don’t share adjacency information, so each island is assigned its own beta. beta is an array of real numbers with dimensions num_island x num_group x num_time. Like theta, it is logit- or log-transformed;
Z: The spatiotemporal random effects. These parameters induce smoothing across regions, with the intensity of the smoothing dictated by the spatial covariance matrices G. Z is an array of real numbers with dimensions num_region x num_group x num_time;
G: The spatial covariance matrices. This parameter determines the intensity of the spatial smoothing performed by Z and represents the strength of the relationship between groups within a given time period. G is an array of temporally evolving positive-definite symmetric matrices with dimensions num_group x num_group x num_time;
rho: The temporal correlation. This parameter determines the strength of the relationship between values in time period t and values in time period t-1. It is a vector of length num_group of real numbers with support [0, 1];
tau2: The non-spatial variance. This parameter picks up any variance in values of theta for each group. It is a vector of length num_group of positive real numbers; and
Ag: The general spatial covariance matrix. This parameter describes the overall relationship between groups across the entire model and is used in the prior distribution for the matrices in G. Ag is a positive-definite symmetric matrix with dimensions num_group x num_group.
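
To make the dimensions above concrete, the base-R sketch below builds a complete inits list for a toy MSTCAR setup. Everything in it is an assumption for illustration: the dimensions, the adjacency-list format, the helper find_islands() (a plain breadth-first search that counts islands, not an RSTr function) and the starting values, which are not RSTr’s defaults. qlogis() is base R’s logit.

```r
# Toy dimensions (hypothetical; not taken from any RSTr dataset)
num_region <- 4
num_group  <- 2
num_time   <- 3

# Toy adjacency list: element i holds the neighbors of region i.
# Regions 1-2 touch each other and regions 3-4 touch each other,
# so the map splits into two islands.
adj <- list(2, 1, 4, 3)

# Hypothetical helper: label each region with its island (connected
# component) using a breadth-first search over the adjacency list
find_islands <- function(adj) {
  island <- integer(length(adj))
  current <- 0L
  for (start in seq_along(adj)) {
    if (island[start] != 0L) next
    current <- current + 1L
    island[start] <- current
    queue <- start
    while (length(queue) > 0) {
      region <- queue[1]
      queue <- queue[-1]
      for (nb in adj[[region]]) {
        if (island[nb] == 0L) {
          island[nb] <- current
          queue <- c(queue, nb)
        }
      }
    }
  }
  island
}
island_id  <- find_islands(adj)
num_island <- max(island_id)

# Toy crude rates between 1% and 5%, moved to the logit scale used with
# method = "binom" (use log() instead for method = "pois")
set.seed(1)
crude <- array(runif(num_region * num_group * num_time, min = 0.01, max = 0.05),
               dim = c(num_region, num_group, num_time))
theta0 <- array(qlogis(crude), dim = dim(crude))

inits <- list(
  theta = theta0,
  # one starting mean per island-group-time, on the same transformed scale
  beta  = array(mean(theta0), dim = c(num_island, num_group, num_time)),
  Z     = array(0, dim = c(num_region, num_group, num_time)),
  # identity matrix in every time slice: positive-definite and symmetric
  G     = array(diag(num_group), dim = c(num_group, num_group, num_time)),
  rho   = rep(0.95, num_group),  # must lie in [0, 1]
  tau2  = rep(0.01, num_group),  # must be positive
  Ag    = diag(num_group)
)
str(inits)
```

Computing num_island from the adjacency structure, rather than hard-coding it, keeps beta’s first dimension consistent with the map you pass as adjacency.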

Note that you don’t have to specify inits for all parameters: any parameter left undefined receives its default value. For example, you can specify only the initial values for theta and let all other values be generated automatically. However, if you specify a parameter in inits, you must specify all of its values: you cannot, for example, define initial values for just one year of theta. Finally, any elements of your inits list whose names don’t match the names above are ignored.
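
The three rules above (missing parameters fall back to defaults, a supplied parameter must be complete, unrecognized names are ignored) can be mimicked in a few lines of base R. fill_inits() is a hypothetical helper written for this illustration, not RSTr’s internal code, and the toy defaults are stand-ins rather than RSTr’s real defaults.

```r
# Hypothetical helper, NOT part of RSTr: it only mimics the documented rules
fill_inits <- function(user, defaults) {
  known <- names(defaults)
  ignored <- setdiff(names(user), known)
  if (length(ignored) > 0) {
    message("Ignoring unrecognized inits: ", paste(ignored, collapse = ", "))
  }
  out <- defaults
  for (nm in intersect(names(user), known)) {
    # a supplied parameter must have the same shape as the full parameter
    same_shape <- identical(dim(user[[nm]]), dim(defaults[[nm]])) &&
      length(user[[nm]]) == length(defaults[[nm]])
    if (!same_shape) {
      stop("inits$", nm, " must supply a value for every element")
    }
    out[[nm]] <- user[[nm]]
  }
  out
}

# Stand-in defaults with toy dimensions (4 regions, 2 groups, 3 years)
defaults <- list(theta = array(-4, dim = c(4, 2, 3)), tau2 = rep(0.01, 2))

# Supply only theta; "thetaa" is a deliberate typo and is ignored
user <- list(theta = array(-3, dim = c(4, 2, 3)), thetaa = 1)
merged <- fill_inits(user, defaults)

# Supplying a single year of theta (4 x 2 x 1) would stop with an error
```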

The `priors` argument

priors behaves similarly to inits, except that it contains the prior information for each parameter. The following priors are used in the MSTCAR model:

Ag_scale and Ag_df: The scale matrix and degrees of freedom of the Wishart prior on the random variable Ag. Ag_scale is a positive-definite symmetric matrix and Ag_df is a double no smaller than num_group;
G_scale and G_df: The scale matrix and degrees of freedom of the Inverse-Wishart prior on each matrix slice of the random variable G. G_scale is a positive-definite symmetric matrix and G_df is a double no smaller than num_group;
tau_a and tau_b: The shape and scale parameters of the Inverse-Gamma prior on the random variable tau2. tau_a and tau_b must both be positive real numbers;
rho_a and rho_b: The two shape parameters of the Beta prior on the random variable rho. rho_a and rho_b must both be positive real numbers;
theta_sd: An array of positive real numbers describing the candidate standard deviation in the Metropolis update for the estimated rates theta. These values will be adaptively updated at the start of each batch; and
rho_sd: A vector of positive real numbers describing the candidate standard deviation in the Metropolis update for the temporal correlation rho. These values will be adaptively updated at the start of each batch. Note that this is only used if rho_up = TRUE.
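
As a sketch of the constraints listed above, the base-R code below assembles a priors list and checks it. The values are illustrative choices, not RSTr’s defaults, and is_spd() and check_priors() are hypothetical helpers, not RSTr functions. Positive definiteness is tested with chol(), which fails for matrices that are not positive definite.

```r
num_group <- 2

# Illustrative values only; these are not RSTr's defaults
priors <- list(
  Ag_scale = diag(num_group),
  Ag_df    = num_group + 1,
  G_scale  = diag(num_group),
  G_df     = num_group + 1,
  tau_a    = 0.001,
  tau_b    = 0.001,
  rho_a    = 95,
  rho_b    = 5
)

# Symmetric, and the Cholesky factorization succeeds only if positive-definite
is_spd <- function(m) {
  isSymmetric(m) && !inherits(try(chol(m), silent = TRUE), "try-error")
}

# Hypothetical checker mirroring the constraints listed above
check_priors <- function(p, num_group) {
  stopifnot(
    is_spd(p$Ag_scale), p$Ag_df >= num_group,
    is_spd(p$G_scale),  p$G_df  >= num_group,
    p$tau_a > 0, p$tau_b > 0,
    p$rho_a > 0, p$rho_b > 0
  )
  invisible(TRUE)
}

check_priors(priors, num_group)

# Prior mean of rho under Beta(rho_a, rho_b)
priors$rho_a / (priors$rho_a + priors$rho_b)
```

The Beta mean line is a quick way to see where a choice of rho_a and rho_b centers the prior for rho; these particular values place it at 0.95.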

By default, most of these priors are relatively non-informative. As with inits, you don’t have to specify priors for all parameters: any undefined priors receive their default values, and any elements of priors whose names don’t match the names above are ignored.

The `method` argument

method accepts two values: "binom" and "pois". These values determine how the data are transformed and how the theta Metropolis update is performed: "binom" treats the event data as Binomial-distributed and "pois" treats them as Poisson-distributed. Choose between the two based on your use case: if you are working with very small mortality rates, "pois" will work well, but if you are working with birth rates, "binom" will work better. In general, "binom" works in most use cases, while "pois" only works well for datasets with small rates under approximately 1%.
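
One way to see why "pois" suits small rates: the two transforms differ by logit(p) - log(p) = -log(1 - p), which is close to zero when p is small. The base-R sketch below compares qlogis() (the logit) and log() for a few illustrative rates; it only illustrates the transforms, not how RSTr performs its Metropolis update.

```r
rates <- c(0.0001, 0.001, 0.01, 0.1, 0.3)

comparison <- data.frame(
  rate  = rates,
  logit = qlogis(rates),  # transform paired with method = "binom"
  log   = log(rates)      # transform paired with method = "pois"
)
# logit(p) - log(p) = -log(1 - p), which shrinks toward 0 as p gets small
comparison$gap <- comparison$logit - comparison$log
print(comparison, digits = 4)
```

At a 1% rate the gap is about 0.01 on the transformed scale, while at 30% it grows to roughly 0.36, which is consistent with the guidance to reserve "pois" for small rates.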

`m0` and `A`

m0 and A are two components that determine the intensity of the smoothing in CAR models. m0 should be a positive scalar, while the size of A depends on the group/time structure of your data: A is a positive scalar for UCAR models, a vector of length num_group for MCAR models, and a matrix of size num_group x num_time for MSTCAR models. Note, however, that these informativeness restrictions are currently developed only for the UCAR model; restrictions for more complex models will be added to RSTr as their respective methods are developed.
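
The size rules for A can be encoded as a small check. expected_A_shape() and A_is_valid() are hypothetical helpers, not part of RSTr, and the value 6 is an arbitrary placeholder rather than a recommended setting.

```r
# Hypothetical helpers, NOT part of RSTr: they encode the size rules above
expected_A_shape <- function(model, num_group, num_time) {
  switch(model,
    ucar   = 1,
    mcar   = num_group,
    mstcar = c(num_group, num_time),
    stop("unknown model: ", model)
  )
}

A_is_valid <- function(A, model, num_group, num_time) {
  # vectors and scalars have no dim attribute, so use their length instead
  shape <- if (is.null(dim(A))) length(A) else dim(A)
  expected <- expected_A_shape(model, num_group, num_time)
  length(shape) == length(expected) && all(shape == expected) && all(A > 0)
}

num_group <- 2
num_time  <- 3
A_is_valid(6, "ucar", num_group, num_time)                  # TRUE
A_is_valid(c(6, 6), "mcar", num_group, num_time)            # TRUE
A_is_valid(matrix(6, 2, 3), "mstcar", num_group, num_time)  # TRUE
A_is_valid(c(6, 6), "mstcar", num_group, num_time)          # FALSE
```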

The `rho_up` argument

A logical that specifies whether to estimate the temporal correlation rho. By default, it is set to FALSE. In empirical testing, estimates were found to be not very sensitive to the value of rho when it is specified prudently, while updating rho increases runtime by an order of magnitude due to the complexity of its update.

The `seed` argument

Because of the stochastic nature of Bayesian inference and the inherent instability of the MSTCAR model, reproducibility is extremely important. seed allows the user to specify a random seed so that estimates can be reproduced; by default, it is set to 1234.
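
The base-R snippet below shows what a fixed seed buys you: the same seed reproduces the same random draws, while a different seed does not. It assumes, without confirming, that RSTr’s seed argument plays the same role as set.seed(); draw() is a hypothetical helper for illustration.

```r
# Hypothetical helper: fix the seed, then draw a few random values
draw <- function(seed) {
  set.seed(seed)
  rnorm(3)
}

identical(draw(1234), draw(1234))  # TRUE: same seed, same draws
identical(draw(1234), draw(4321))  # FALSE: different seed, different draws
```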

The `.ignore_checks` argument

As development of RSTr continues, there may be occasions where the checks performed on the inputs of initialize_model() throw an error even though you are certain that all of your inputs behave as expected. To override the checks, use the .ignore_checks argument. By default, it is set to FALSE; setting it to TRUE skips the checks.

Closing Thoughts

Initialization is one of the most important steps of running the model, as it’s where virtually all choices regarding the model are made. In this vignette, we explored each argument of the initialize_model() function and how to appropriately choose values for each argument.
