xcipoibin examples

Andrea Discacciati, Karolinska Institutet

30 November 2017

xcipoibin calculates exact confidence intervals for the means of Poisson-distributed random variables or for the proportions of Binomial-distributed random variables, and stores them in new variables.

xcipoibin is useful for the analysis of aggregate data, where each observation usually refers to one level of one or more categorical variables (e.g. calendar year, country).

xcipoibin can be used to calculate exact CIs for Incidence Rates (# events / total person-time), Standardized Incidence Ratios (# observed events / # expected events), or Cumulative Incidences (# events / total population) under Poisson or Binomial distributional assumptions.

xcipoibin can also be used to calculate exact CIs after commands that do not provide them. See, for example, help strate, which calculates normal-based CIs for IRs/SIRs on the log scale.

Note: the term “exact confidence interval” means that the interval is derived from the Poisson or the Binomial distribution, i.e. the distribution assumed to generate the data, not that it attains exactly the nominal coverage. The actual coverage probability is guaranteed to be greater than or equal to the nominal confidence level (see help ci).
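As a rough illustration of what "derived from the Poisson distribution" means, the limits for a single observed count can be obtained by inverting the Poisson tail probabilities. The sketch below uses Stata's built-in invpoissontail() and invpoisson() functions. The count k = 8 and the local macro names are arbitrary choices for illustration, and this is the textbook construction rather than a description of how xcipoibin is implemented internally.

```stata
* Sketch: exact 95% limits for the mean of a Poisson count k (assumes k >= 1)
* k = 8 is an illustrative value, alpha = 1 - confidence level
local k     = 8
local alpha = 0.05

* Lower limit: the mean m such that Pr(X >= k | m) = alpha/2
display "lower = " %9.4f invpoissontail(`k', `alpha'/2)

* Upper limit: the mean m such that Pr(X <= k | m) = alpha/2
display "upper = " %9.4f invpoisson(`k', `alpha'/2)
```

Dividing both limits by the person-time (or by the expected number of events) puts them on the rate (or ratio) scale.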

Incidence Rates (IRs)

Load data on prostate cancer cases by 5-year categories of attained age in 1998 [5].

. use https://raw.githubusercontent.com/anddis/xcipoibin/master/ex_ir.dta, clear

List the data.

. list, noobs sep(0) abbrev(15)

  ┌─────────────────────────────────────────────────────────────┐
  │ calendar_year   age_category   obs_pca_cases   person_years │
  ├─────────────────────────────────────────────────────────────┤
  │          1998             45               1           6449 │
  │          1998             50               1           8631 │
  │          1998             55               8           7435 │
  │          1998             60              26           6025 │
  │          1998             65              36           6436 │
  │          1998             70              48           5694 │
  │          1998             75              49           4637 │
  │          1998             80               4            346 │
  └─────────────────────────────────────────────────────────────┘

Calculate IRs per 100,000 person-years and exact 95% CIs, assuming that the number of events per category of attained age follows a Poisson distribution.

. xcipoibin obs_pca_cases person_years, per(100000) gen(rate lowerCI upperCI) poisson

List the results.

. format rate lowerCI upperCI %9.2f

. list, noobs sep(0) abbrev(15)

  ┌───────────────────────────────────────────────────────────────────────────────────────────┐
  │ calendar_year   age_category   obs_pca_cases   person_years      rate   lowerCI   upperCI │
  ├───────────────────────────────────────────────────────────────────────────────────────────┤
  │          1998             45               1           6449     15.51      0.39     86.40 │
  │          1998             50               1           8631     11.59      0.29     64.55 │
  │          1998             55               8           7435    107.60     46.45    212.01 │
  │          1998             60              26           6025    431.54    281.89    632.30 │
  │          1998             65              36           6436    559.35    391.76    774.38 │
  │          1998             70              48           5694    842.99    621.56   1117.69 │
  │          1998             75              49           4637   1056.72    781.77   1397.04 │
  │          1998             80               4            346   1156.07    314.99   2960.00 │
  └───────────────────────────────────────────────────────────────────────────────────────────┘
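One way to sanity-check a single row is Stata's immediate command cii, which also reports exact Poisson intervals. The sketch below assumes Stata 14 or later (earlier releases use cii #exposure #events, poisson, without the means keyword) and takes the first row of the table, with 6,449 person-years and 1 event. cii reports the rate per person-year, so its limits have to be multiplied by 100,000 before they can be compared with the listing above.

```stata
* Exact Poisson CI for 1 event in 6449 person-years (rate per person-year)
cii means 6449 1, poisson

* Rescale the stored limits to the per-100,000 scale used above
display %9.2f 100000*r(lb) "  " %9.2f 100000*r(ub)
```

The rescaled limits should agree with lowerCI and upperCI for age category 45 up to rounding.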

Standardized Incidence Ratios (SIRs)

Load data on observed and expected prostate cancer cases by calendar year (1998-2012) [5].

. use https://raw.githubusercontent.com/anddis/xcipoibin/master/ex_sir.dta, clear

List the data.

. list, noobs sep(0) abbrev(15)

  ┌───────────────────────────────────────────────┐
  │ calendar_year   obs_pca_cases   exp_pca_cases │
  ├───────────────────────────────────────────────┤
  │          1998             173             168 │
  │          1999             223             197 │
  │          2000             226             212 │
  │          2001             256             220 │
  │          2002             258             232 │
  │          2003             363             269 │
  │          2004             329             293 │
  │          2005             356             288 │
  │          2006             275             269 │
  │          2007             309             256 │
  │          2008             303             246 │
  │          2009             343             284 │
  │          2010             281             257 │
  │          2011             275             247 │
  │          2012             243             221 │
  └───────────────────────────────────────────────┘

Calculate SIRs and exact 95% CIs, assuming that the number of events per calendar year follows a Poisson distribution.

. xcipoibin obs_pca_cases exp_pca_cases, gen(sir lowerCI upperCI) poisson

Plot the results.

. tw (rcap upperCI lowerCI calendar_year, lc(black)) ///
> (scatter sir calendar_year, m(Oh) mc(black)) , ///
> legend(off) scheme(s1mono) xlabel(1998/2012, labsize(small)) ///
> ylabel(1(0.2)1.6, angle(horiz) format(%3.2f)) ytitle(SIR) ///
> yscale(log) xtitle(Calendar year)

. graph export sir.png, replace
(file sir.png written in PNG format)

Standardized Mortality Ratios (SMRs) following strate

Replicate the example from help strate.

. webuse diet, clear
(Diet data with dates)

. stset dox, origin(time doe) id(id) scale(365.25) fail(fail==1 3 13)

                id:  id
     failure event:  fail == 1 3 13
obs. time interval:  (dox[_n-1], dox]
 exit on or before:  failure
    t for analysis:  (time-origin)/365.25
            origin:  time doe

──────────────────────────────────────────────────────────────────────────────
        337  total observations
          0  exclusions
──────────────────────────────────────────────────────────────────────────────
        337  observations remaining, representing
        337  subjects
         46  failures in single-failure-per-subject data
   4603.669  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =  20.04107

. stsplit ageband, at(40(10)70) after(time=dob) trim
(26 + 0 obs. trimmed due to lower and upper bounds)
(418 observations (episodes) created)

. merge m:1 ageband using http://www.stata-press.com/data/r15/smrchd
(note: variable ageband was byte, now float to accommodate using data's values)

    Result                           # of obs.
    ─────────────────────────────────────────
    not matched                            26
        from master                        26  (_merge==1)
        from using                          0  (_merge==2)

    matched                               729  (_merge==3)
    ─────────────────────────────────────────

. strate ageband, per(1000) smr(rate) output(smr, replace)

         failure _d:  fail == 1 3 13
   analysis time _t:  (dox-origin)/365.25
             origin:  time doe
                 id:  id
               note:  ageband<=40 trimmed

Estimated SMRs and lower/upper bounds of 95% confidence intervals
(729 records included in the analysis)

  ┌─────────────────────────────────────────────────┐
  │ ageband    D       E      SMR    Lower    Upper │
  ├─────────────────────────────────────────────────┤
  │      40    6    5.62   1.0670   0.4793   2.3749 │
  │      50   18   18.75   0.9599   0.6048   1.5235 │
  │      60   22   22.85   0.9629   0.6340   1.4624 │
  └─────────────────────────────────────────────────┘

Calculate exact 95% CIs from the observed (_D) and expected (_E) numbers of events saved by strate in smr.dta.

. use smr, clear
(Diet data with dates)

. xcipoibin _D _E, poisson gen(_SMR2 _Lower_XCT _Upper_XCT)

List the results.

. format _SMR2 _Lower_XCT _Upper_XCT %8.4f

. list, noobs sep(0) abbreviate(10)

  ┌────────────────────────────────────────────────────────────────────────────────────┐
  │ ageband   _D      _E     _SMR   _Lower   _Upper    _SMR2   _Lower_XCT   _Upper_XCT │
  ├────────────────────────────────────────────────────────────────────────────────────┤
  │      40    6    5.62   1.0670   0.4793   2.3749   1.0670       0.3916       2.3223 │
  │      50   18   18.75   0.9599   0.6048   1.5235   0.9599       0.5689       1.5170 │
  │      60   22   22.85   0.9629   0.6340   1.4624   0.9629       0.6035       1.4579 │
  └────────────────────────────────────────────────────────────────────────────────────┘
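The exact limits can also be reproduced by hand from the well-known relationship between Poisson tail probabilities and chi-squared quantiles. The following is a minimal sketch, assuming the dataset saved by strate is still in memory and that every _D is at least 1. chk_lower and chk_upper are hypothetical variable names introduced only for this check, and the formula is the standard textbook one rather than a statement about xcipoibin's internals.

```stata
* Exact 95% limits for _D/_E via chi-squared quantiles (requires _D >= 1)
generate double chk_lower = invchi2(2*_D, 0.025) / (2*_E)
generate double chk_upper = invchi2(2*(_D + 1), 0.975) / (2*_E)
format chk_lower chk_upper %8.4f

* Compare with the limits generated by xcipoibin
list ageband _Lower_XCT chk_lower _Upper_XCT chk_upper, noobs sep(0) abbreviate(10)
```

The normal-based limits from strate are symmetric around the SMR on the log scale, whereas the exact limits are not. In the listing above the two sets differ most in the age band with only 6 deaths.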

References

[1] Breslow N, Day NE. 1987. Statistical Methods in Cancer Research: Volume II, The Design and Analysis of Cohort Studies. Lyon: International Agency for Research on Cancer.

[2] StataCorp. 2015. Stata 14 Base Reference Manual. College Station, TX: Stata Press.

[3] Confidence intervals for the mean of a Poisson distribution. https://ms.mcmaster.ca/peter/s743/poissonalpha.html

[4] Brown LD, Cai TT, DasGupta A. 2001. Interval estimation for a binomial proportion. Statistical Science 16: 101–133.

[5] Discacciati A. 2015. Risk factors for prostate cancer: analysis of primary data, pooling, and related methodological aspects. Karolinska Institutet. http://hdl.handle.net/10616/44872

       

Generated with markstat.