
04: Initializing Your Model
RSTr-init.Rmd
Overview
The initialize_model() function builds the features needed to run RSTr
and, by default, places them in R’s temporary directory. In this
vignette, we will discuss each argument in detail and how to use it.
Arguments of the initialize_model() function
initialize_model() provides several arguments:
name: The name of the folder your model information lives in;
dir: The directory where the model folder lives. By default, this saves into your temporary directory, so the model information will be lost after the R session ends. Should you want to save your model to be analyzed at a later date or ensure that your samples are intact if R crashes during runtime, specify a different directory;
data: The list object containing the event Y and population n data. For more information on data setup, read vignette("RSTr-event");
adjacency: The adjacency structure for your event and population data. For more information on adjacency structure setup, read vignette("RSTr-adj");
inits: This is a list of initial values for each parameter. This can be specified by the user or generated by default;
priors: This is a list of all prior information for each parameter. This can be specified by the user or generated by default;
model: This is a string that specifies the model to use. Available choices are "ucar" for the Univariate CAR model, "mcar" for the Multivariate CAR model, and "mstcar" for the Multivariate Spatiotemporal CAR model. By default, "mstcar" is chosen;
method: Chooses whether the event data is either Binomial or Poisson distributed. By default, RSTr uses Binomial updates for the event data;
m0: In restricted models, specifies the baseline neighbor count;
A: In restricted models, describes the intensity of the smoothing between regions;
rho_up: Allows for updates of the temporal correlation parameter rho. By default, RSTr does not update rho;
impute_lb: Specifies a lower bound for imputed data for event information that is missing or suppressed;
impute_ub: Specifies an upper bound for imputed data for event information that is missing or suppressed; and
seed: Allows the user to specify the random seed used for replication purposes.
Most of these arguments are optional, since initialize_model() supplies
defaults for them. If you run into errors when trying to initialize your
model, read vignette("RSTr-troubleshoot"). Below, we will go into detail
about what each argument does and what to keep in mind when setting its
value.
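As a rough illustration of how these arguments fit together, the sketch below collects a few of them in a plain list and checks them against the options described above. The folder name "my_model" and the objects my_data and my_adj are placeholders, and the argument names are simply those listed in this vignette; check ?initialize_model for the exact signature before running the commented-out call.

```r
# Arguments named as in the list above; the values are illustrative only.
model_args <- list(
  name   = "my_model",   # hypothetical folder name
  dir    = tempdir(),    # R's temporary directory (lost when the session ends)
  model  = "mstcar",
  method = "binom",
  rho_up = FALSE,
  seed   = 1234
)

# Sanity-check the choices against the options described in this vignette.
stopifnot(
  model_args$model  %in% c("ucar", "mcar", "mstcar"),
  model_args$method %in% c("binom", "pois"),
  dir.exists(model_args$dir)
)

# With RSTr installed and data/adjacency objects prepared (my_data and
# my_adj are placeholders), the call might look like:
# do.call(RSTr::initialize_model,
#         c(model_args, list(data = my_data, adjacency = my_adj)))
```

To keep the model after the session ends, dir would point at a permanent folder instead of tempdir().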
The inits argument
inits is a list specifying the starting values for the parameters of the
model. For the MSTCAR model, initial values can be supplied for the
following parameters:
theta: The estimated spatially smoothed rate for each region-group-time, transformed to a (-∞, ∞) scale. theta is an array of real numbers with dimensions num_region x num_group x num_time. Note that to facilitate the Metropolis update for theta, all values are either logit- or log-transformed, depending on which method you choose, so be sure to use logit() or log() to transform your data accordingly: method = "binom" is associated with logit() and method = "pois" is associated with log();
beta: The mean rate for each island-group-time on the transformed scale. An island is a set of regions that are connected to one another through their neighbors. For example, in miadj, there are two islands that represent the counties of the Upper Peninsula and the Lower Peninsula. These islands don’t touch each other and therefore don’t share adjacency information, so each island is assigned its own beta. beta is an array of real numbers with dimensions num_island x num_group x num_time. Note that this is also logit- or log-transformed, similar to theta;
Z: The spatiotemporal random effects. These are the parameters that induce smoothing on the regions, with the intensity of the smoothing dictated by the spatial covariance matrices G. Z is an array of real numbers with dimensions num_region x num_group x num_time;
G: The spatial covariance matrices. This parameter determines the intensity of the spatial smoothing performed by Z and represents the strength of the relationship between each group in a given time period. G is an array of temporally-evolving positive-definite symmetric matrices with dimensions num_group x num_group x num_time;
rho: The temporal correlation. This parameter determines the strength of the relationship between values in time period t and values in time period t-1. It is a vector of length num_group of real numbers with support [0,1];
tau2: The non-spatial variance. This parameter captures any remaining variance in the values of theta for each group. It is a vector of length num_group of positive real numbers; and
Ag: The general spatial covariance matrix. This parameter describes the overall relationship between groups across the entire model and is used in the prior distribution for the matrices in G. Ag is a positive-definite symmetric matrix with dimensions num_group x num_group.
Note that you don’t have to specify inits for all parameters if you only
want to specify some of them - any undefined inits will be filled in
with the default values. For example, you can specify only the initial
values for theta and all other values will be generated automatically.
However, if one value is specified for a certain parameter in inits, all
values must be specified for that parameter: you cannot, for example,
define initial values for just one year of theta. Finally, any entries
in your inits list whose names don’t match those above will be ignored.
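A minimal sketch of a partial inits list, assuming method = "binom", is shown below. The dimensions, the simulated Y and n arrays, the continuity adjustment, and the values chosen for rho and tau2 are all made up for illustration; none of them are RSTr defaults.

```r
set.seed(1)
num_region <- 4; num_group <- 2; num_time <- 3   # hypothetical sizes
dims <- c(num_region, num_group, num_time)

# Made-up population and event counts, for illustration only.
n <- array(sample(500:1000, prod(dims), replace = TRUE), dim = dims)
Y <- array(rbinom(prod(dims), size = n, prob = 0.05), dim = dims)

# Crude rates, nudged away from 0 and 1 so the transform stays finite
# (the +0.5 / +1 adjustment is an assumption, not RSTr's behavior).
rate <- (Y + 0.5) / (n + 1)

inits <- list(
  theta = log(rate / (1 - rate)),  # logit transform, for method = "binom"
  rho   = rep(0.95, num_group),    # one value per group, within [0, 1]
  tau2  = rep(0.01, num_group)     # one positive value per group
)

# theta covers every region-group-time cell, as required above.
stopifnot(all(dim(inits$theta) == dims), all(is.finite(inits$theta)))
```

For method = "pois", the theta entry would use log(rate) instead. Parameters left out of the list (beta, Z, G, Ag) would fall back to their defaults.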
The priors argument
priors behaves similarly to inits, except that it contains all
information related to parameter priors. The following are all priors
used in the MSTCAR model:
Ag_scale and Ag_df: These are the scale and degrees of freedom priors used with Wishart-distributed random variable Ag. Ag_scale is a positive-definite symmetric matrix and Ag_df is a double with a value of at least num_group;
G_scale and G_df: These are the scale and degrees of freedom priors used with Inverse-Wishart distributed matrix slices of random variable G. G_scale is a positive-definite symmetric matrix and G_df is a double with a value of at least num_group;
tau_a and tau_b: These are the shape and scale priors used with Inverse-Gamma distributed random variable tau2. tau_a and tau_b must both be positive real numbers;
rho_a and rho_b: These are the shape priors used with Beta-distributed random variable rho. rho_a and rho_b must both be positive real numbers;
theta_sd: An array of positive real numbers describing the candidate standard deviation in the Metropolis update for the estimated rates theta. These values will be adaptively updated at the start of each batch; and
rho_sd: A vector of positive real numbers describing the candidate standard deviation in the Metropolis update for the temporal correlation rho. These values will be adaptively updated at the start of each batch. Note that this is only used if rho_up = TRUE.
By default, most of these priors are relatively non-informative. Similar
to inits, you don’t have to specify priors for all parameters if you
only want to specify some of them - any undefined priors will be defined
by the default values. Any values included in priors that aren’t aligned
with the above names will be ignored.
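The constraints above can be checked before a priors list is handed to the model. The sketch below is one way to do so in base R; the helper is_spd() is hypothetical, and the numeric values are placeholders rather than RSTr defaults.

```r
num_group <- 2   # hypothetical number of groups

priors <- list(
  G_scale = diag(num_group),   # identity matrix: symmetric, positive-definite
  G_df    = num_group + 2,     # at least num_group
  tau_a   = 0.001,             # positive
  tau_b   = 0.001              # positive
)

# Hypothetical helper: TRUE if m is symmetric with all eigenvalues > 0.
is_spd <- function(m) {
  isSymmetric(m) &&
    all(eigen(m, symmetric = TRUE, only.values = TRUE)$values > 0)
}

stopifnot(
  is_spd(priors$G_scale),
  priors$G_df >= num_group,
  priors$tau_a > 0, priors$tau_b > 0
)
```

Priors omitted from the list (here Ag_scale, Ag_df, rho_a, rho_b, theta_sd, and rho_sd) would be filled in with the defaults.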
The method argument
method accepts two values: "binom" and "pois". These values determine
how the data is transformed and how the theta Metropolis update is
performed: "binom" treats the event data as Binomial-distributed and
"pois" treats the event data as Poisson-distributed. Which one to choose
depends on your use case: if you are working with very small mortality
rates, "pois" will work well, but if you are working with birth rates,
then "binom" will work better. Note that "binom" works in most general
use cases, whereas "pois" only works well for datasets with small rates,
under approximately 1%.
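One way to build intuition for the 1% guideline, offered here as an illustration rather than as the package's stated rationale, is to compare the two transformations directly. The gap between logit(p) and log(p) equals -log(1 - p), which is negligible for small rates and substantial for large ones. The two rates below are arbitrary examples.

```r
p_small <- 0.005   # a rate well under 1%
p_large <- 0.30    # a much larger rate

# qlogis() is base R's logit; logit(p) - log(p) simplifies to -log(1 - p).
gap_small <- qlogis(p_small) - log(p_small)
gap_large <- qlogis(p_large) - log(p_large)

# The gap is tiny for the small rate and sizeable for the large one.
round(c(small = gap_small, large = gap_large), 4)
```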
m0 and A
m0 and A are two components that determine the intensity of the
smoothing in CAR models. m0 should be a positive scalar, and the size of
A depends on the group/time structure of your data: A will be a positive
scalar for UCAR models, a vector of length num_group for MCAR models,
and a matrix of size num_group x num_time for MSTCAR models. Note,
however, that these informativeness restrictions are currently only
developed for the UCAR model; restrictions for more complex models will
be added to the RSTr package as their respective methods are developed.
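The shapes described above can be sketched as follows. The sizes and the values 3 and 6 are placeholders chosen only to show the structure; they are not recommended settings.

```r
num_group <- 3; num_time <- 5   # hypothetical sizes

m0 <- 3   # positive scalar (illustrative value)

A_ucar   <- 6                                              # scalar
A_mcar   <- rep(6, num_group)                              # length num_group
A_mstcar <- matrix(6, nrow = num_group, ncol = num_time)   # num_group x num_time

# Confirm each object has the shape its model type expects.
stopifnot(
  length(A_ucar) == 1,
  length(A_mcar) == num_group,
  all(dim(A_mstcar) == c(num_group, num_time))
)
```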
The rho_up argument
A logical that specifies whether to calculate estimates for the temporal
correlation rho. By default, it is set to FALSE. In empirical testing,
results were not very sensitive to rho when its value was specified
prudently, and updating rho increases runtime by an order of magnitude
due to its complexity.
The seed argument
Because of the stochastic nature of Bayesian inference and the inherent
instability of the MSTCAR model, replicability is extremely important.
seed allows the user to specify the random seed so that estimates can be
replicated; it is set to 1234 by default.
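The general idea behind seeding can be shown in base R. This sketch does not reflect how initialize_model() uses seed internally; the draw() helper is hypothetical and only demonstrates that fixing the seed fixes the random draws.

```r
# Hypothetical helper: reset the seed, then take three normal draws.
draw <- function(seed) {
  set.seed(seed)
  rnorm(3)
}

# The same seed reproduces the same draws on repeated calls.
same_seed_matches <- identical(draw(1234), draw(1234))
stopifnot(same_seed_matches)
```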
The .ignore_checks argument
As development continues on RSTr, there may be occasions where the
checks performed on the inputs of initialize_model() throw an error even
though you are certain that all of your inputs are behaving as expected.
To override the checks, you can use the .ignore_checks argument. By
default, this is set to FALSE, but specifying TRUE will skip the checks.
Closing Thoughts
Initialization is one of the most important steps of running the model,
as it’s where virtually all choices regarding the model are made. In
this vignette, we explored each argument of the initialize_model()
function and how to appropriately choose values for each argument.