As a learning exercise I concocted the example below involving Bayesian logistic regression. It was inspired by an article on Matt Shotwell's blog about using R environments rather than lists to store the state of a Markov Chain Monte Carlo sampler. Here I use proto to create a parent class-like object (or

*trait*in proto-ese) to contain the regression functions and create child objects to hold both data and results for individual analyses.

First here's an example session...

# Make up some data with a continuous predictor and binary response

nrec <- 500

x <- rnorm(nrec)

y <- rbinom(nrec, 1, plogis(2 - 4*x))

# Predictor matrix with a col of 1s for intercept

pred <- matrix(c(rep(1, nrec), x), ncol=2)

colnames(pred) <- c("intercept", "X")

# Load the proto package

library(proto)

# Use the Logistic parent object to create a child object which will

# hold the data and run the regression (the $ operator references

# functions and data within a proto object)

lr <- Logistic$new(pred, y)

lr$run(5000, 1000)

# lr now contains both data and results

str(lr)

proto object

$ cov : num [1:2, 1:2] 0.05 -0.0667 -0.0667 0.1621

..- attr(*, "dimnames")=List of 2

$ prior.cov: num [1:2, 1:2] 100 0 0 100

$ prior.mu : num [1:2] 0 0

$ beta : num [1:5000, 1:2] 2.09 2.09 2.09 2.21 2.21 ...

..- attr(*, "dimnames")=List of 2

$ adapt : num 1000

$ y : num [1:500] 0 1 1 1 1 1 1 1 1 1 ...

$ x : num [1:500, 1:2] 1 1 1 1 1 1 1 1 1 1 ...

..- attr(*, "dimnames")=List of 2

parent: proto object

# Use the Logistic summary function to tabulate and plot results

lr$summary()

From 5000 samples after 1000 iterations burning in

intercept X

Min. :1.420 Min. :-5.296

1st Qu.:1.840 1st Qu.:-3.915

Median :2.000 Median :-3.668

Mean :1.994 Mean :-3.693

3rd Qu.:2.128 3rd Qu.:-3.455

Max. :2.744 Max. :-2.437

And here's the code for the Logistic trait...

Logistic <- proto()

Logistic$new <- function(., x, y) {

# Creates a child object to hold data and results

#

# x - a design matrix (ie. predictors

# y - a binary reponse vector

proto(., x=x, y=y)

}

Logistic$run <- function(., niter, adapt=1000) {

# Perform the regression by running the MCMC

# sampler

#

# niter - number of iterations to sample

# adapt - number of prior iterations to run

# for the 'burning in' period

require(mvtnorm)

# Set up variables used by the sampler

.$adapt <- adapt

total.iter <- niter + adapt

.$beta <- matrix(0, nrow=total.iter, ncol=ncol(.$x))

.$prior.mu <- rep(0, ncol(.$x))

.$prior.cov <- diag(100, ncol(.$x))

.$cov <- diag(ncol(.$x))

# Run the sampler

b <- rep(0, ncol(.$x))

for (i in 1:total.iter) {

b <- .$update(i, b)

.$beta[i,] <- b

}

# Trim the results matrix to remove the burn-in

# period

if (.$adapt > 0) {

.$beta <- .$beta[(.$adapt + 1):total.iter,]

}

}

Logistic$update <- function(., it, beta.old) {

# Perform a single iteration of the MCMC sampler using

# Metropolis-Hastings algorithm.

# Adapted from code by Brian Neelon published at:

# http://www.duke.edu/~neelo003/r/

#

# it - iteration number

# beta.old - vector of coefficient values from

# the previous iteration

# Update the coefficient covariance if we are far

# enough through the sampling

if (.$adapt > 0 & it > 2 * .$adapt) {

.$cov <- cov(.$beta[(it - .$adapt):(it - 1),])

}

# generate proposed new coefficient values

beta.new <- c(beta.old + rmvnorm(1, sigma=.$cov))

# calculate prior and current probabilities and log-likelihood

if (it == 1) {

.$..log.prior.old <- dmvnorm(beta.old, .$prior.mu, .$prior.cov, log=TRUE)

.$..probs.old <- plogis(.$x %*% beta.old)

.$..LL.old <- sum(log(ifelse(.$y, .$..probs.old, 1 - .$..probs.old)))

}

log.prior.new <- dmvnorm(beta.new, .$prior.mu, .$prior.cov, log=TRUE)

probs.new <- plogis(.$x %*% beta.new)

LL.new <- sum(log(ifelse(.$y, probs.new, 1-probs.new)))

# Metropolis-Hastings acceptance ratio (log scale)

ratio <- LL.new + log.prior.new - .$..LL.old - .$..log.prior.old

if (log(runif(1)) < ratio) {

.$..log.prior.old <- log.prior.new

.$..probs.old <- probs.new

.$..LL.old <- LL.new

return(beta.new)

} else {

return(beta.old)

}

}

Logistic$summary <- function(., show.plot=TRUE) {

# Summarize the results

cat("From", nrow(.$beta), "samples after", .$adapt, "iterations burning in\n")

base::print(base::summary(.$beta))

if (show.plot) {

par(mfrow=c(1, ncol(.$beta)))

for (i in 1:ncol(.$beta)) {

plot(density(.$beta[,i]), main=colnames(.$beta)[i])

}

}

}

Now that's probably not the greatest design in the world, but it only took me a few minutes to put it together and it's incredibly easy to modify or extend. Try it !

Thanks to Brian Neelon for making his MCMC logistic regression code available (http://www.duke.edu/~neelo003/r/).

## No comments:

## Post a Comment