STAN Code for Analyzing Intensive Longitudinal Data: Part I - Autoregressive Models

Introduction

Over the years I wrote, collected, spliced together, and “stole” (but let’s call it borrowed) a lot of code for analyzing intensive longitudinal data (ILD). I was always very happy to have good example code that I could adapt to my own needs (the “STAN-forum” and Michael Betancourt’s “case studies” are just two examples of priceless sources for this). In the spirit of paying it forward, I thought it would be a good idea to write a series of posts in which I share my ILD-related STAN model code, together with explanations of what each part of the code does and some simulations that show how it works. In this series I will share and discuss my STAN model-code; I will write a separate series with R-code for data-wrangling, visualization, etc.

As this STAN-code series goes on, I’ll share all my STAN model-code, including code for multilevel Vector Autoregressive (VAR) models (with random covariance matrices), code that can handle missing data, code for within-chain parallelization that greatly speeds up sampling, code with Cholesky decompositions that helps with the stability of your models, and more. To start simple, however, and to get a feel for this blogging thing, I’ll begin with some basic STAN-code for the humble first-order autoregressive (AR(1)) model, the workhorse of many more advanced analysis methods for ILD. This is also the model I started my own journey with many years ago, so it feels like a suitable first entry.

These posts will all share a similar structure: I first give the full model code, followed by a chunk-by-chunk discussion of the model and simulations that show what the code does.

Alright! Almost ready to begin. I just wanted to add that this is the code that I use. My own personal little Frankensteins. There are undoubtedly ways to make the code more efficient and elegant, so if you would like to comment on the code and suggest improvements, please do! I’ll update the posts accordingly (and give full credit of course!).

Now let’s get started….everyone, let me introduce the AR(1) model!

The complete model code

Below is my basic STAN-code for the AR(1) model. We can vectorize some stuff to make it a bit faster, but this is the “no frills” version of the code, and it already runs fast enough anyway.


// The input data is a vector 'y' of length 'N'.
data {
  int<lower=0> N;
  vector[N] y;
}

// The parameters accepted by the model. 
parameters {
  real alpha;
  real<lower=-1, upper=1> beta;
  real<lower=0> sigma;
}

// The model to be estimated. We model the output 'y' to be normally distributed 
// with mean 'alpha + beta * y[n-1]' and standard deviation 'sigma'.
model {
  alpha ~ normal(0, 5);
  beta ~ normal(0, .7);
  sigma ~ normal(0, 2);
  
  
  for (n in 2:N)
    y[n] ~ normal(alpha + beta * y[n-1], sigma);
}
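As mentioned above, the likelihood can also be vectorized. A minimal sketch of what that could look like is given below (just one possible way to write it; the data and parameters blocks stay exactly the same, and the loop version above gives the same results):

// Vectorized sketch of the AR(1) likelihood: observations 2..N are regressed
// on observations 1..(N-1) in a single sampling statement.
model {
  alpha ~ normal(0, 5);
  beta ~ normal(0, .7);
  sigma ~ normal(0, 2);
  
  tail(y, N - 1) ~ normal(alpha + beta * head(y, N - 1), sigma);
}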

I also have code that allows you to select the lag (indicated by K in the code below) for your autoregressive model. That code looks like this:

// The input data is a vector 'y' of length 'N'.
data {
  int<lower=0> K;
  int<lower=0> N;
  real y[N];
}

// The parameters accepted by the model. 
parameters {
  real alpha;
  real<lower=-1, upper=1> beta[K];
  real<lower=0> sigma;
}

// The model to be estimated. We model the output 'y' to be normally distributed 
// with mean 'mu' and standard deviation 'sigma'.
model {
  alpha ~ normal(0, 5);
  beta ~ normal(0, .7);
  sigma ~ normal(0, 2);
  
  for (n in (K+1):N) {
    real mu = alpha;
    for (k in 1:K)
      mu += beta[k] * y[n-k];
    y[n] ~ normal(mu, sigma);
  }
}

Now, you can either copy and paste the code and be off, or read on below for some explanation of the code.

But what does it all mean?!

The Data Block and Data-generation

STAN code is very (veeeery) structured. This takes some getting used to, but I ended up liking it (although I still curse it from time to time). For a very thorough discussion of the STAN language and block structure, check out this amazing “intro” by Michael Betancourt. I’ll only describe the blocks and elements that are actually in this model. First up, the data block.


// The input data is a vector 'y' of length 'N'.
data {
  int<lower=0> N;
  vector[N] y;
}

As you probably guessed, the data block is where you specify the variables (and their characteristics) that you created outside of STAN and that you use as input to your model. Here, I specify the total number of observations N, and the vector with all N observations of my outcome variable (y). Note that in the data block I explicitly state that N is an integer and y is a vector. As I said, STAN model code is very structured and you have to specify the type of all your data (and parameters). I also specify a lower bound for the value of N (the number of observations can’t be smaller than 0), to prevent me from accidentally providing weird data to the model. If you want to generate data for this model in R, you can do that (before calling the STAN-model) by running the code below:

N = 500

y <- arima.sim(list(order=c(1,0,0), ar=.5), n=N)

# If you want a mean different from 0, for example 5, run: y <- y + 5  

For the model that allows you to select the lag, I also have to specify the lag (K) I want to use in R before running the STAN-code. For an AR(1) model my pre-STAN data generation would look like:

N = 500

y <- arima.sim(list(order=c(1,0,0), ar=.5), n=N)

# If you want a mean different from 0, for example 5, run: y <- y + 5  

K <- 1
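If you want to fit a higher-order model, you only have to change the simulated data and K. A hypothetical AR(2) example could look like this (with coefficients chosen so that the simulated process is stationary):

y <- arima.sim(list(order=c(2,0,0), ar=c(.5, .2)), n=N)

K <- 2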

The Parameters Block

Next is the parameters block. Here you specify all the parameters of your model: an intercept (alpha), an AR-parameter (beta), and the standard deviation of the residuals (sigma). As with the data, you have to specify the type of your parameters (here the type of all parameters is real, since they are all continuous variables). I also specify a lower bound for my standard deviation (sigma) and a lower and upper bound for my AR-parameter (beta). This last constraint reflects that I want my model to be stationary, which for an AR(1) model requires the AR-parameter to be between -1 and 1. Draws that fall outside of this range will be rejected.


// The parameters accepted by the model. 
parameters {
  real alpha;
  real<lower=-1, upper=1> beta;
  real<lower=0> sigma;
}

The parameters block for the model in which you can choose the lag is almost the same, but now you have to specify an AR-parameter for each lag in your model (i.e., 1 parameter for an AR(1) model, 2 parameters for an AR(2) model, etc.). To that end, beta is not a real-valued scalar in that model but an array containing as many values as there are lags (K). One caveat: for lags greater than 1, bounding each coefficient between -1 and 1 does not by itself guarantee a stationary process; see the quick check after the code block below.

parameters {
  real alpha;
  real<lower=-1, upper=1> beta[K];
  real<lower=0> sigma;
}
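If you use more than one lag and want to check whether a particular set of AR coefficients corresponds to a stationary process, one option is to look at the roots of the AR characteristic polynomial, which should all lie outside the unit circle. A small R sketch (with made-up coefficients, not part of the STAN model):

beta_check <- c(.5, .2)               # hypothetical AR(2) coefficients

roots <- polyroot(c(1, -beta_check))  # roots of 1 - beta1*z - beta2*z^2

all(Mod(roots) > 1)                   # TRUE means the process is stationary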

The Model Block

Finally, the model block. This is where you specify your full Bayesian model: the priors for your parameters and the observational model for your data. I’m using weakly informative normal priors for my intercept and AR-parameter, and a weakly informative half-normal prior for my SD. The specific values for these priors are chosen based on the fact that many measures in the social sciences use 5- or 7-point Likert-scales (a quick check of what these priors imply follows after the code block). You could also specify priors for a standardized scale and standardize your variables as part of the model; that way you don’t have to change the priors for each new data set. I’ll show this approach in an upcoming post. Note that while I say I use a half-normal prior on sigma, I actually put a normal distribution on this parameter. However, since I specified in the parameters block that sigma should be larger than 0, STAN turns this into a half-normal distribution (did I already mention STAN is awesome?!). For my outcome variable y, I specify the standard AR(1) model in which each observation is regressed on the immediately preceding observation using the AR-parameter (beta). Note that I can’t use the first observation of y, as there is no previous observation that I can use as a predictor for it.


// The model to be estimated. We model the output 'y' to be normally distributed 
// with mean 'alpha + beta * y[n-1]' and standard deviation 'sigma'.
model {
  alpha ~ normal(0, 5);
  beta ~ normal(0, .7);
  sigma ~ normal(0, 2);
  
  
  for (n in 2:N)
    y[n] ~ normal(alpha + beta * y[n-1], sigma);
}
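To get a feel for how weakly informative these priors are on, say, a 7-point Likert-scale, here is a small R sketch (just a quick prior check, not part of the STAN model) that draws from them:

# Draws from the normal(0, 5) prior for alpha and from the half-normal(0, 2)
# prior for sigma (taking absolute values of normal draws gives half-normal draws)
alpha_prior <- rnorm(1e4, 0, 5)
sigma_prior <- abs(rnorm(1e4, 0, 2))

quantile(alpha_prior, c(.025, .975))  # roughly -10 to 10
quantile(sigma_prior, c(.5, .975))    # median around 1.3, 97.5th percentile around 4.5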

For the model in which you can specify the lag, the basics are the same, but I include an additional loop so that all the lags I want are included in the model.

// The model to be estimated. We model the output 'y' to be normally distributed 
// with mean 'mu' and standard deviation 'sigma'.
model {
  alpha ~ normal(0, 5);
  beta ~ normal(0, .7);
  sigma ~ normal(0, 2);
  
  for (n in (K+1):N) {
    real mu = alpha;
    
    for (k in 1:K)
      mu += beta[k] * y[n-k];
    
    y[n] ~ normal(mu, sigma);
  }
}

Basically, I cut the mean of my observational model up into two parts. First I specify an “empty” mean for y that just contains the intercept alpha (this is what I do in the first line within the outer for-loop). There I declare a new “ghost” parameter, mu, that is real valued. It’s a “ghost” parameter in the sense that it is not really a model parameter but a local variable, and I only use it to conveniently add a not-previously-determined number of lags to my observational model (my likelihood) of y. Also note that I have to skip the first K observations: if I use a lag of 2, I can’t use the first two observations in my data, since these don’t have two prior observations that can be used as predictors. In a second step I add the part of the mean that predicts an observation using previous scores. This is what is happening in the nested for-loop. Notice that I can neatly build the mean up in parts using the “+=” operator. That operator adds the expression on its right to the variable on its left, so every pass through the inner loop adds one more lagged term to mu. This is how my “ghost” parameter mu is used! I want the mean of y to contain the entire AR expression, but since the number of lagged terms is not predetermined, I could not simply write the expression out in the STAN-file. Instead, I start mu off at the intercept, and then add as many lagged terms as needed/specified by looping over the lags. The “ghost” parameter mu is subsequently used as the mean of the normal density of y, so it contains all relevant lagged effects as well as the intercept.
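Here is a minimal R sketch (with made-up numbers, just to illustrate the bookkeeping) of how mu is built up for, say, an AR(2) model at a single time point n:

alpha_ex <- .1                            # hypothetical intercept
beta_ex  <- c(.5, .2)                     # hypothetical AR coefficients for lags 1 and 2
y_ex     <- c(1.2, .8, 1.5)               # hypothetical observations; we predict y_ex[3]
K_ex     <- 2
n_ex     <- 3

mu <- alpha_ex                            # start with just the intercept
for (k in 1:K_ex)
  mu <- mu + beta_ex[k] * y_ex[n_ex - k]  # add each lagged term in turn

mu                                        # the mean of the normal density for y_ex[3]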

Note that my model code ends with an empty line! This is because STAN always wants the last line of the model file to be blank.

That’s it for the STAN-code, let’s quickly simulate some data in R and see if the code gives accurate estimates.

Testing the code

I tend to use cmdstanr as my R-interface for STAN, but below I’ll use rstan.

# Load the necessary package
library(rstan)

# Specify the data provided to the model
N = 500

K <- 1

y <- arima.sim(list(order=c(1,0,0), ar=.5), n=N)

# If you want a mean different from 0, for example 5, run: y <- y + 5  

# Put the data into a list so it can be provided to STAN

# For the basic AR(1) model
mod_data <- list(
              N = N,
              y = y
              )
 
# For the basic AR model that allows selecting the lag
mod_data_select <- list(
                     N = N,
                     y = y,
                     K = 1
                     )


# Compile the STAN models discussed above

# Basic AR(1) model
Basic_AR <- stan_model("AR1model.stan")

# Basic AR model that allows setting the lag
Basic_AR_Select_Lag <- stan_model("AR-K-model.stan")


# Estimate the two models
estimated_ar_model <- sampling(Basic_AR, 
                            data = mod_data, 
                            iter = 2000,
                            chains=2)
## 
## SAMPLING FOR MODEL 'AR1model' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.26 seconds (Warm-up)
## Chain 1:                0.263 seconds (Sampling)
## Chain 1:                0.523 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'AR1model' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.256 seconds (Warm-up)
## Chain 2:                0.263 seconds (Sampling)
## Chain 2:                0.519 seconds (Total)
## Chain 2:
estimated_ar_select_model <- sampling(Basic_AR_Select_Lag, 
                               data = mod_data_select, 
                               iter = 2000,
                               chains=2)
## 
## SAMPLING FOR MODEL 'AR-K-model' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.249 seconds (Warm-up)
## Chain 1:                0.263 seconds (Sampling)
## Chain 1:                0.512 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'AR-K-model' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.245 seconds (Warm-up)
## Chain 2:                0.221 seconds (Sampling)
## Chain 2:                0.466 seconds (Total)
## Chain 2:
# Show Results: True alpha = 0, true beta = .5, true sigma = 1

print(estimated_ar_model)
## Inference for Stan model: AR1model.
## 2 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=2000.
## 
##          mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
## alpha    0.02    0.00 0.05   -0.07   -0.01    0.02    0.05    0.11  2076    1
## beta     0.44    0.00 0.04    0.36    0.41    0.44    0.47    0.52  1832    1
## sigma    1.02    0.00 0.03    0.96    0.99    1.02    1.04    1.08  1925    1
## lp__  -257.86    0.04 1.30 -261.34 -258.41 -257.49 -256.92 -256.43   914    1
## 
## Samples were drawn using NUTS(diag_e) at Wed Nov 10 18:38:22 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).
print(estimated_ar_select_model)
## Inference for Stan model: AR-K-model.
## 2 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=2000.
## 
##            mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
## alpha      0.02    0.00 0.04   -0.07   -0.01    0.02    0.05    0.11  2403    1
## beta[1]    0.44    0.00 0.04    0.36    0.42    0.44    0.47    0.52  2026    1
## sigma      1.02    0.00 0.03    0.96    0.99    1.01    1.04    1.08  1994    1
## lp__    -257.77    0.04 1.17 -260.74 -258.36 -257.47 -256.90 -256.42  1046    1
## 
## Samples were drawn using NUTS(diag_e) at Wed Nov 10 18:38:23 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

The true values of alpha, beta, and sigma are 0, .5, and 1 respectively, and as you can see both models recover these values well: each true value falls within the corresponding 95% posterior interval.
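As an extra sanity check, you could also compare these posterior means to a standard frequentist AR(1) fit on the same data (a rough comparison only; note that arima() reports the process mean rather than the regression intercept):

arima(y, order = c(1, 0, 0))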

That’s All Folks

That’s it for this first installment of my series of posts on STAN-code for models often used with ILD. I hope you liked it! And if you have any suggestions and/or improvements, please let me know!

Next up, a multilevel AR(1) Model!
