C.1 Tumorgenicity Data Analysis of Chapter 3


C.1.1 Poisson Analysis with Weibull Baseline Rate

The data for the first five rats in the treated group are displayed
below in the data frame 'rats', in the 'counting process' format.

> rats <- read.table("rats.dat", header=F)
> dimnames(rats)[[2]] <- c("id","start","stop","status","enum","trt")
> rats$rtrunc  <- rep(NA, nrow(rats))
> rats$tstatus <- ifelse(rats$start == 0, 1, 2)
> rats[1:12, c("id","start","stop","status","rtrunc","tstatus","enum","trt")]

     id start stop status rtrunc tstatus enum trt 
   1  1     0  122      1     NA       1    1   1
   2  2     0  122      0     NA       1    1   1
   3  3     0    3      1     NA       1    1   1
   4  3     3   88      1     NA       2    2   1
   5  3    88  122      0     NA       2    3   1
   6  4     0   92      1     NA       1    1   1
   7  4    92  122      0     NA       2    2   1
   8  5     0   70      1     NA       1    1   1
   9  5    70   74      1     NA       2    2   1
  10  5    74   85      1     NA       2    3   1
  11  5    85   92      1     NA       2    4   1
  12  5    92  122      0     NA       2    5   1

Fit 'Weibull' model (3.56) to the control rats.  Note that the 'truncation' 
option enables one to indicate that the left truncation times are contained 
in the 'start' variable. 

> wfitC <- censorReg(censor(stop, status) ~ 1, 
                     data=rats, subset=(trt==0),
                     truncation=censor(start,rtrunc,tstatus),
                     distribution="weibull")
> summary(wfitC)

  Call:
  censorReg(formula = censor(stop, status) ~ 1, data = rats, 
            truncation = censor(start, rtrunc, tstatus), 
            subset = (trt == 0), distribution = "weibull")

   Distribution:  Weibull 

   Standardized Residuals:
                 Min    Max 
   Uncensored 0.1046 6.0405
     Censored 6.0400 6.0400

   Coefficients:
                Est. Std.Err. 95% LCL 95% UCL z-value   p-value 
   (Intercept) 3.161    0.153   2.861   3.461   20.66 7.551e-95

   Extreme value distribution: Dispersion (scale) = 0.9135831
   Observations: 173 Total; 22 Censored
   -2*Log-Likelihood: 1209 


The regression model rhoi(t) = rho0(t) * exp(xi * beta) is fit
by omitting the 'subset' option and specifying the 'trt' covariate
in the model.  

> wfit <- censorReg(censor(stop, status) ~ trt, data=rats,
                    truncation=censor(start,rtrunc,tstatus),
                    distribution="weibull")
> summary(wfit)

  Call:
  censorReg(formula = censor(stop, status) ~ trt, data = rats, 
            truncation = censor(start, rtrunc, tstatus), 
            distribution = "weibull")

  Distribution:  Weibull 

  Standardized Residuals:
                Min    Max 
  Uncensored 0.0519 6.0405
    Censored 2.6522 6.0400

  Coefficients:
                Est. Std.Err. 95% LCL 95% UCL z-value    p-value 
  (Intercept) 3.1097   0.1394  2.8365   3.383  22.314 2.673e-110
          trt 0.7754   0.1525  0.4764   1.074   5.084  3.704e-07

  Extreme value distribution: Dispersion (scale) = 0.94214648
  Observations: 254 Total; 42 Censored
  -2*Log-Likelihood: 1798 


Note that because of the way the model is parameterized in censorReg,
the parameters alpha1, alpha2, beta in Section 3.8.1 are related to 
those here by

> alpha1 <- as.vector( exp(-wfit$coef[1]) )
  [1] 0.0446

> alpha2 <- as.vector( 1/wfit$scale )
  [1] 1.0614

> beta <- as.vector( (-1)*wfit$coef[2]/wfit$scale )
  [1] -0.8230


C.1.2 Poisson Analysis with Piecewise Constant Rate

In Section 3.8.1, a piecewise constant rate function (Section 3.3) with
cutpoints at 30, 60, 90 days is considered, resulting in a four parameters
in the rate funciton.  Th entries in the data frame 'rats.pw' are given
below for the first five rats in the treated group.

gd.pw.f <- function(indata) {     
  pid <- sort(unique(indata$id))
  
  data <- matrix(0, nrow=(length(pid)*4), ncol=5)
  for (i in 1:length(pid)) {
    tmp <- indata[indata$id == pid[i], ]
    etime <- floor( tmp$stop[tmp$status == 1] )
      
    startpos <- 4*(i-1) + 1 
    stoppos  <- 4*i
    data[startpos:stoppos,1] <- rep(pid[i],4)
    data[startpos:stoppos,2] <- c(1,2,3,4)
    data[startpos:stoppos,3] <- c(sum((etime >  0) & (etime <= 30)), 
                                  sum((etime > 30) & (etime <= 60)),    
                                  sum((etime > 60) & (etime <= 90)),
                                  sum((etime > 90) & (etime <= 122)))
    data[startpos:stoppos,4] <- c(30,30,30,32)
    data[startpos:stoppos,5] <- rep(unique(tmp$trt),4)
  }
  data <- data.frame(data)
  dimnames(data)[[2]] <- c("id","interval","count","len","trt")
  return(data)
}

> rats.pw <- gd.pw.f(rats)
> rats.pw[1:20,]

     id interval count len trt 
   1  1        1     0  30   1
   2  1        2     0  30   1
   3  1        3     0  30   1
   4  1        4     1  32   1
   5  2        1     0  30   1
   6  2        2     0  30   1
   7  2        3     0  30   1
   8  2        4     0  32   1
   9  3        1     1  30   1
  10  3        2     0  30   1
  11  3        3     1  30   1
  12  3        4     0  32   1
  13  4        1     0  30   1
  14  4        2     0  30   1
  15  4        3     0  30   1
  16  4        4     1  32   1
  17  5        1     0  30   1
  18  5        2     0  30   1
  19  5        3     3  30   1
  20  5        4     1  32   1


Fit the piecewise constant model for control rats.

> options(contrasts = c("contr.treatment", "contr.poly"))

> pfitC <- glm(count ~ offset(log(len)) + factor(interval),
               family=poisson(link=log), data=rats.pw,
               subset=(trt==0), epsilon = 1e-004)
> summary(pfitC, corr=F)

  Call: glm(formula = count ~ offset(log(len)) + factor(interval), 
            family = poisson(link = log), data = rats.pw, 
            subset = (trt == 0), epsilon = 0.0001)

  Deviance Residuals:
      Min     1Q  Median     3Q   Max 
   -1.918 -1.575 -0.2736 0.6262 2.896

  Coefficients:
                      Value Std. Error  t value 
        (Intercept) -3.0937     0.1714 -18.0502
  factor(interval)2  0.1625     0.2330   0.6977
  factor(interval)3  0.3023     0.2260   1.3377
  factor(interval)4 -0.1569     0.2481  -0.6325

  (Dispersion Parameter for Poisson family taken to be 1 )

      Null Deviance: 167.8 on 99 degrees of freedom
  Residual Deviance: 163.3 on 96 degrees of freedom
  Number of Fisher Scoring Iterations: 4 


An estimate of the treatment effect based on a multiplicative model is obtained
by introducing the 'trt' covariate into the regression model.

> pfit <- glm(count ~ offset(log(len)) + factor(interval) + trt,
              family=poisson(link=log), data=rats.pw, epsilon = 1e-004)
> summary(pfit, corr=F)

  Call: glm(formula = count ~ offset(log(len)) + factor(interval) + trt, 
            family = poisson(link = log), data = rats.pw, epsilon = 0.0001)

  Deviance Residuals:
      Min     1Q  Median     3Q   Max 
   -1.834 -1.199 -0.3302 0.4701 3.055

  Coefficients:
                       Value Std. Error  t value 
        (Intercept) -3.08818     0.1506 -20.5007
  factor(interval)2  0.17185     0.1956   0.8785
  factor(interval)3  0.20634     0.1941   1.0631
  factor(interval)4 -0.06454     0.2039  -0.3166
                trt -0.82302     0.1514  -5.4354

  (Dispersion Parameter for Poisson family taken to be 1 )

      Null Deviance: 301.4 on 191 degrees of freedom
  Residual Deviance: 266.3 on 187 degrees of freedom
  Number of Fisher Scoring Iterations: 4 


C.1.3 Nonparametric and Semiparametric Poisson Analysis

The nonparametric Nelson-Aalen estimate of the mean function for control
individuals given in Figures 3.2 and 3.3, is obtained by fitting a null
Cox model using the 'rats' data frame.

> NPfit <- coxph(Surv(start,stop,status) ~ 1,
                 data=rats, subset=(trt==0), method="breslow")
> KM <- survfit(NPfit, type="aalen")
> NA.MF <- data.frame(time=c(0,KM$time),
                      na=-log(c(1,KM$surv)))
> plot(NA.MF$time, NA.MF$na, type="s", 
       xlab="TIME (DAYS)", ylab="EXPECTED NUMBER OF TUMORS", main="")


The semiparametric multiplicative Poisson model with a treatment covariate can
be fit with the 'coxph' function by including all subjects and adding the 
covariate 'trt'.

> fit <- coxph(Surv(start,stop,status) ~ trt, data=rats, method="breslow")
> summary(fit)

  Call: coxph(formula = Surv(start, stop, status) ~ trt, data = rats, 
              method = "breslow")

  n= 254 

        coef exp(coef) se(coef)     z       p 
  trt -0.816     0.442    0.152 -5.37 7.8e-08

      exp(coef) exp(-coef) lower .95 upper .95 
  trt     0.442       2.26     0.328     0.596

  Rsquare= 0.117   (max possible= 0.998 )
  Likelihood ratio test= 31.7  on 1 df,   p=1.81e-08
  Wald test            = 28.9  on 1 df,   p=7.76e-08
  Score (logrank) test = 30.5  on 1 df,   p=3.27e-08

> cox.zph(fit, transform="identity")

          rho  chisq     p 
  trt -0.0217 0.0995 0.752

> cox.zph(cfit, transform="log")

          rho chisq     p 
  trt -0.0505 0.541 0.462


C.1.4 Semiparametric Mixed Poisson Analysis

The coxph function can be used to fit semiparametric mixed Poisson models with the
frailty(id) option.

> fitf <- coxph(Surv(start,stop,status) ~ trt + frailty(id), data=rats, method="breslow")
> summary(fitf)

  n= 254 
                coef se(coef)   se2 Chisq   DF       p 
          trt -0.816 0.211    0.153 15.0   1.0 0.00011
  frailty(id)                       49.5  24.3 0.00190

      exp(coef) exp(-coef) lower .95 upper .95 
  trt     0.442       2.26     0.292     0.668

  Iterations: 6 outer, 18 Newton-Raphson
     Variance of random effect= 0.27   I-likelihood = -791.9 
  Degrees of freedom for terms=  0.5 24.3 
  Rsquare= 0.347   (max possible= 0.998 )
  Likelihood ratio test= 108  on 24.87 df,   p=2.38e-12
  Wald test            = 15  on 24.87 df,   p=0.939


C.1.5 Robust Semiparametric Analysis

Robust variance estimates and standard errors as given in (3.38) are obtained by 
using the cluster(id) option in coxph.

> fitr <- coxph(Surv(start,stop,status) ~ trt + cluster(id), data=rats, method="breslow")

  Call: coxph(formula = Surv(start, stop, status) ~ trt + cluster(id), 
              data = rats, method = "breslow")

  n= 254 

           coef exp(coef) se(coef) robust se        z          p 
  trt -0.815774  0.442297 0.151836   0.19809 -4.11819 3.8186e-05

      exp(coef) exp(-coef) lower .95 upper .95 
  trt  0.442297    2.26092  0.299985  0.652122

  Rsquare= 0.117   (max possible= 0.998 )
  Likelihood ratio test= 31.69 on 1 df, p=1.81146e-08
  Wald test            = 16.96 on 1 df, p=3.8186e-05
  Score (logrank) test = 30.54 on 1 df, p=3.26554e-08, Robust = 11.2 p=0.000816617

  (Note: the likelihood ratio and score tests assume independence of
   observations within a cluster, the Wald and robust score tests do not).