xcipoibin examples
30 November 2017
xcipoibin calculates exact confidence intervals (CIs) for means of Poisson-distributed or proportions of Binomial-distributed random variables and stores them in new variables.
xcipoibin is useful for the analysis of aggregate data, where each observation usually refers to one level of one or more categorical variables (e.g., calendar year, country).
xcipoibin can be used to calculate exact CIs for Incidence Rates (IRs; # events / total person-time), Standardized Incidence Ratios (SIRs; # observed events / # expected events), or Cumulative Incidences (# events / total population) under Poisson or Binomial distributional assumptions.
xcipoibin can also be used to calculate exact CIs after commands that do not provide them. See, for example, help strate, which calculates normal-based CIs for IRs/SIRs on the log scale.
Note: the term “exact confidence interval” means that the interval is derived from the Poisson or the Binomial distribution itself, i.e. the distribution assumed to generate the data, not that its coverage equals the nominal level exactly. The actual coverage probability is guaranteed to be greater than or equal to the nominal confidence level (see help ci).
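For orientation, the textbook exact limits for a Poisson count can be written as follows. This is a sketch under the assumption that xcipoibin uses this standard construction; the page does not spell out the formula. The symbols are introduced here for illustration: d is the observed count, α is one minus the confidence level, and χ²(ν, p) denotes the p-quantile of a chi-square distribution with ν degrees of freedom.

```latex
\begin{align*}
\Pr(X \ge d \mid \mu_L) &= \alpha/2
  &&\Longrightarrow&
  \mu_L &= \tfrac{1}{2}\,\chi^2_{2d,\;\alpha/2} \quad (d > 0;\ \mu_L = 0 \text{ if } d = 0),\\
\Pr(X \le d \mid \mu_U) &= \alpha/2
  &&\Longrightarrow&
  \mu_U &= \tfrac{1}{2}\,\chi^2_{2(d+1),\;1-\alpha/2}.
\end{align*}
```

Dividing both limits by the denominator (person-time or expected events) gives limits for a rate or an SIR. For example, d = 1 and α = 0.05 give limits of roughly 0.0253 and 5.572 for the Poisson mean.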
Load data on prostate cancer cases by 5-year categories of attained age in 1998 [5].
. use https://raw.githubusercontent.com/anddis/xcipoibin/master/ex_ir.dta, clear
List the data.
. list, noobs sep(0) abbrev(15)

  ┌─────────────────────────────────────────────────────────────┐
  │ calendar_year   age_category   obs_pca_cases   person_years │
  ├─────────────────────────────────────────────────────────────┤
  │          1998             45               1           6449 │
  │          1998             50               1           8631 │
  │          1998             55               8           7435 │
  │          1998             60              26           6025 │
  │          1998             65              36           6436 │
  │          1998             70              48           5694 │
  │          1998             75              49           4637 │
  │          1998             80               4            346 │
  └─────────────────────────────────────────────────────────────┘
Calculate IRs per 100,000 person-years and exact 95% CIs, assuming that the number of events per category of attained age follows a Poisson distribution.
. xcipoibin obs_pca_cases person_years, per(100000) gen(rate lowerCI upperCI) poisson
List the results.
. format rate lowerCI upperCI %9.2f

. list, noobs sep(0) abbrev(15)

  ┌───────────────────────────────────────────────────────────────────────────────────────────┐
  │ calendar_year   age_category   obs_pca_cases   person_years      rate   lowerCI   upperCI │
  ├───────────────────────────────────────────────────────────────────────────────────────────┤
  │          1998             45               1           6449     15.51      0.39     86.40 │
  │          1998             50               1           8631     11.59      0.29     64.55 │
  │          1998             55               8           7435    107.60     46.45    212.01 │
  │          1998             60              26           6025    431.54    281.89    632.30 │
  │          1998             65              36           6436    559.35    391.76    774.38 │
  │          1998             70              48           5694    842.99    621.56   1117.69 │
  │          1998             75              49           4637   1056.72    781.77   1397.04 │
  │          1998             80               4            346   1156.07    314.99   2960.00 │
  └───────────────────────────────────────────────────────────────────────────────────────────┘
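As a cross-check, the limits listed above can be recomputed with Stata's built-in inverse Poisson functions. This is a minimal sketch, assuming xcipoibin uses the standard exact Poisson construction (the page does not state the formula); the variable names lo_check and hi_check are made up for illustration.

```stata
* Sketch: recompute exact 95% Poisson limits per 100,000 person-years by hand.
use https://raw.githubusercontent.com/anddis/xcipoibin/master/ex_ir.dta, clear

* invpoissontail(k, q): Poisson mean m such that Pr(X >= k | m) = q (lower limit).
* It returns missing when k = 0; all counts in this dataset are positive.
generate double lo_check = 100000 * invpoissontail(obs_pca_cases, 0.025) / person_years

* invpoisson(k, p): Poisson mean m such that Pr(X <= k | m) = p (upper limit).
generate double hi_check = 100000 * invpoisson(obs_pca_cases, 0.025) / person_years

format lo_check hi_check %9.2f
list age_category obs_pca_cases person_years lo_check hi_check, noobs sep(0) abbrev(15)
```

If the assumption holds, lo_check and hi_check should agree with the lowerCI and upperCI columns after rounding to two decimals.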
Load data on observed and expected prostate cancer cases by calendar year (1998-2012) [5].
. use https://raw.githubusercontent.com/anddis/xcipoibin/master/ex_sir.dta, clear
List the data.
. list, noobs sep(0) abbrev(15)

  ┌───────────────────────────────────────────────┐
  │ calendar_year   obs_pca_cases   exp_pca_cases │
  ├───────────────────────────────────────────────┤
  │          1998             173             168 │
  │          1999             223             197 │
  │          2000             226             212 │
  │          2001             256             220 │
  │          2002             258             232 │
  │          2003             363             269 │
  │          2004             329             293 │
  │          2005             356             288 │
  │          2006             275             269 │
  │          2007             309             256 │
  │          2008             303             246 │
  │          2009             343             284 │
  │          2010             281             257 │
  │          2011             275             247 │
  │          2012             243             221 │
  └───────────────────────────────────────────────┘
Calculate SIRs and exact 95% CIs, assuming that the number of events per calendar year follows a Poisson distribution.
. xcipoibin obs_pca_cases exp_pca_cases, gen(sir lowerCI upperCI) poisson
Plot the results.
. tw (rcap upperCI lowerCI calendar_year, lc(black)) ///
>     (scatter sir calendar_year, m(Oh) mc(black)) , ///
>     legend(off) scheme(s1mono) xlabel(1998/2012, labsize(small)) ///
>     ylabel(1(0.2)1.6, angle(horiz) format(%3.2f)) ytitle(SIR) ///
>     yscale(log) xtitle(Calendar year)

. graph export sir.png, replace
(file sir.png written in PNG format)
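Because the SIR results are only plotted, it can help to list them numerically as well. The sketch below computes the SIRs and exact 95% limits by hand using the chi-square form of the exact Poisson interval. It assumes that xcipoibin uses this standard construction, which the page does not confirm; the variable names sir_chk, lo_chk, and hi_chk are hypothetical.

```stata
* Sketch: SIRs with exact 95% Poisson limits via chi-square quantiles.
use https://raw.githubusercontent.com/anddis/xcipoibin/master/ex_sir.dta, clear

* Point estimate: observed / expected events.
generate double sir_chk = obs_pca_cases / exp_pca_cases

* Lower limit: half the 2.5% quantile of a chi-square with 2*d degrees of freedom,
* divided by the expected count (valid for d > 0, as in this dataset).
generate double lo_chk = invchi2(2 * obs_pca_cases, 0.025) / (2 * exp_pca_cases)

* Upper limit: half the 97.5% quantile of a chi-square with 2*(d+1) degrees of freedom.
generate double hi_chk = invchi2(2 * (obs_pca_cases + 1), 0.975) / (2 * exp_pca_cases)

format sir_chk lo_chk hi_chk %5.3f
list calendar_year sir_chk lo_chk hi_chk, noobs sep(0) abbrev(15)
```

The chi-square form is algebraically equivalent to inverting the Poisson tail probabilities directly, so either route should give the same limits up to numerical precision.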
strate
Replicate the example from help strate.
. webuse diet, clear
(Diet data with dates)

. stset dox, origin(time doe) id(id) scale(365.25) fail(fail==1 3 13)

                id:  id
     failure event:  fail == 1 3 13
obs. time interval:  (dox[_n-1], dox]
 exit on or before:  failure
    t for analysis:  (time-origin)/365.25
            origin:  time doe

──────────────────────────────────────────────────────────────────────────────
        337  total observations
          0  exclusions
──────────────────────────────────────────────────────────────────────────────
        337  observations remaining, representing
        337  subjects
         46  failures in single-failure-per-subject data
   4603.669  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =  20.04107

. stsplit ageband, at(40(10)70) after(time=dob) trim
(26 + 0 obs. trimmed due to lower and upper bounds)
(418 observations (episodes) created)

. merge m:1 ageband using http://www.stata-press.com/data/r15/smrchd
(note: variable ageband was byte, now float to accommodate using data's values)

    Result                           # of obs.
    ─────────────────────────────────────────
    not matched                            26
        from master                        26  (_merge==1)
        from using                          0  (_merge==2)

    matched                               729  (_merge==3)
    ─────────────────────────────────────────

. strate ageband, per(1000) smr(rate) output(smr, replace)

         failure _d:  fail == 1 3 13
   analysis time _t:  (dox-origin)/365.25
             origin:  time doe
                 id:  id
               note:  ageband<=40 trimmed

Estimated SMRs and lower/upper bounds of 95% confidence intervals
(729 records included in the analysis)

  ┌─────────────────────────────────────────────────┐
  │ ageband    D       E      SMR    Lower    Upper │
  ├─────────────────────────────────────────────────┤
  │      40    6    5.62   1.0670   0.4793   2.3749 │
  │      50   18   18.75   0.9599   0.6048   1.5235 │
  │      60   22   22.85   0.9629   0.6340   1.4624 │
  └─────────────────────────────────────────────────┘
Calculate exact 95% CIs.
. use smr, clear
(Diet data with dates)

. xcipoibin _D _E, poisson gen(_SMR2 _Lower_XCT _Upper_XCT)
List the results.
. format _SMR2 _Lower_XCT _Upper_XCT %8.4f

. list, noobs sep(0) abbreviate(10)

  ┌────────────────────────────────────────────────────────────────────────────────────┐
  │ ageband   _D      _E     _SMR   _Lower   _Upper    _SMR2   _Lower_XCT   _Upper_XCT │
  ├────────────────────────────────────────────────────────────────────────────────────┤
  │      40    6    5.62   1.0670   0.4793   2.3749   1.0670       0.3916       2.3223 │
  │      50   18   18.75   0.9599   0.6048   1.5235   0.9599       0.5689       1.5170 │
  │      60   22   22.85   0.9629   0.6340   1.4624   0.9629       0.6035       1.4579 │
  └────────────────────────────────────────────────────────────────────────────────────┘
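A single row of this table can be spot-checked with Stata's immediate command cii, which reports exact Poisson CIs for one count and one exposure at a time. This is a sketch assuming Stata 14 or later (earlier releases use the syntax cii #exposure #events, poisson) and assuming that cii accepts a non-integer exposure. The value 5.62 is the rounded _E shown in the listing, so the result can differ from the _Lower_XCT and _Upper_XCT columns in the third decimal.

```stata
* Sketch: exact Poisson CI for the first age band (6 observed, about 5.62 expected).
* Syntax: cii means #exposure #events, poisson
cii means 5.62 6, poisson

* The limits are also returned in r(lb) and r(ub).
display %6.4f r(lb) "  " %6.4f r(ub)
```

Unlike cii, xcipoibin works on variables, so it handles all rows of an aggregate dataset in one call and stores the results.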
[1] Breslow N, Day NE. 1987. Statistical Methods in Cancer Research: Volume II, The Design and Analysis of Cohort Studies. Lyon: International Agency for Research on Cancer.
[2] StataCorp. 2015. Stata 14 Base Reference Manual. College Station, TX: Stata Press.
[3] Confidence intervals for the mean of a Poisson distribution. https://ms.mcmaster.ca/peter/s743/poissonalpha.html
[4] Brown LD, Cai TT, and DasGupta A. 2001. Interval estimation for a binomial proportion. Statistical Science 16: 101–133.
[5] Discacciati A. 2015. Risk factors for prostate cancer: analysis of primary data, pooling, and related methodological aspects. Karolinska Institutet. http://hdl.handle.net/10616/44872