Journal of Clinical Oncology  

Journal of Clinical Oncology, Vol 26, No 9 (March 20), 2008: pp. 1400-1401
© 2008 American Society of Clinical Oncology.
DOI: 10.1200/JCO.2007.14.7306


EDITORIAL

Alternate Statistical Tools and Limitations in Genetic Marker Association Studies in Single-Arm Drug Cancer Trials

Kouros Owzar

Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, NC

In this issue of the Journal, Graziano et al1 investigate the associations of a number of genotype and haplotype markers with clinical outcome in a sample of patients with metastatic colorectal cancer receiving salvage cetuximab plus irinotecan therapy. This editorial reviews some of the statistical issues raised by their results and discussion. Genotype and haplotype markers will be referred to collectively as genetic markers.

In genetic marker association studies, the first set of analyses commonly consists of investigating potential pairwise associations between each genetic marker and the clinical outcome of interest. These univariate analyses, as they are commonly called in the medical literature, are then followed by an investigation of the effect of adjusting for other potentially important molecular, clinical, or demographic variables. These multivariate analyses are commonly carried out using parametric logistic models2 (linear on the log-odds scale) for binary outcomes and semiparametric Cox proportional hazards models3 (log-linear in the hazard) for time-to-event outcomes subject to censoring. These models are attractive because they provide a unified framework for statistical inference on the variables and for point and interval estimation of the effects (odds ratios for binary outcomes and hazard ratios for time-to-event outcomes). They are also attractive from a practical point of view, as they have been widely implemented in commercial software distributions that allow the user, from the novice to the expert, to generate massive amounts of output through graphical user interfaces.
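The sketch below makes the effect measures concrete. It computes an odds ratio and its Wald 95% confidence interval from a 2 × 2 table of a binary marker against a binary response. For a logistic model with a single binary covariate, the maximum likelihood estimate of the exponentiated coefficient equals this sample odds ratio. Its Wald standard error on the log scale is the square root of the sum of the reciprocal cell counts. The function name, the cell layout, and the counts are hypothetical. This is an illustration of the univariate step only, not a substitute for fitting the models described above.

```python
import math
from statistics import NormalDist


def odds_ratio_wald(a, b, c, d, level=0.95):
    """Odds ratio and Wald confidence interval for a 2x2 table (hypothetical layout):
    a = marker-positive responders,  b = marker-positive non-responders,
    c = marker-negative responders,  d = marker-negative non-responders.
    """
    if min(a, b, c, d) == 0:
        # Deliberately refuse rather than silently apply a continuity correction.
        raise ValueError("zero cell: Wald interval undefined")
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    z = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for 95%
    # The interval is symmetric on the log scale, then back-transformed.
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)
```

With the hypothetical counts `odds_ratio_wald(20, 10, 10, 20)` the odds ratio is 4.0. The interval is asymmetric on the ratio scale because it is constructed on the log scale.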

As with any model, one needs to be concerned about the robustness of the findings to deviations from the model assumptions. Considerable deviations may not only invalidate the inferences but also make the results difficult to interpret. For example, in the case of overdispersion (when the observed variance exceeds the variance implied by the theoretical model),2 inferences from the standard logistic model may not be valid. Likewise, if the proportional hazards assumption does not hold, the hazard ratio estimates may not be meaningful. One important qualifier omitted from the description of these models above is additivity. In an additive model, the relative change in the outcome variable due to a change in one explanatory variable does not depend on the values of the remaining explanatory variables. Because the number of variables is typically large, multivariate analyses are commonly limited to additive models, ignoring potential multiplicative effects (interactions) among the variables. If these interactions are strong, the interpretation of the results may be difficult, if not impossible.
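Overdispersion in grouped binary data can be screened for with the Pearson dispersion statistic discussed by McCullagh and Nelder.2 This is the Pearson chi-square statistic divided by its residual degrees of freedom, and it should be close to 1 under the binomial model. The sketch below assumes the simplest possible fitted model: a common response probability across groups, such as hypothetical treating centers. With covariates, the fitted probabilities and the degrees of freedom would instead come from the fitted logistic model. The function name and data are illustrative.

```python
def pearson_dispersion(successes, trials):
    """Pearson dispersion estimate for grouped binomial data.

    Assumes an intercept-only (common probability) model, so one parameter
    is estimated and the residual degrees of freedom are k - 1.
    """
    if len(successes) != len(trials) or len(trials) < 2:
        raise ValueError("need at least two groups of matching length")
    p_hat = sum(successes) / sum(trials)
    if not 0.0 < p_hat < 1.0:
        raise ValueError("fitted probability must lie strictly between 0 and 1")
    # Sum of squared Pearson residuals: (observed - expected)^2 / binomial variance.
    x2 = sum(
        (y - m * p_hat) ** 2 / (m * p_hat * (1.0 - p_hat))
        for y, m in zip(successes, trials)
    )
    return x2 / (len(trials) - 1)
```

A value well above 1 suggests overdispersion. One common quasi-likelihood remedy is to inflate the model-based standard errors by the square root of the estimated dispersion.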

Following a number of univariate analyses, Graziano et al1 carry out a multivariate analysis within the framework of an additive Cox model that includes genetic markers together with clinical and demographic variables. The authors exercise due diligence in investigating the appropriateness of the assumed model by assessing the validity of the proportional hazards and additivity assumptions. Such diagnostic tests are, however, of limited use when a test is "significant." One then has to be seriously concerned that something is awry or, at the very least, regard the results as inconclusive. Alternatively, one could consider a nonparametric approach, which at least implicitly accounts for potential interactions. Two such approaches are single-tree methods, such as classification and regression trees4 and conditional inference trees,5 and, more generally, ensemble methods such as random forests.6,7 A single tree renders predictions from one fitted tree, whereas a random forest aggregates the predictions of many, often thousands of, trees. Both approaches can accommodate binary as well as time-to-event outcomes. The conditional inference tree method allows for formal inference on the variables, while the random forests method provides estimates of variable importance. Both could also be used for classification. Another attractive feature is that these methods are better suited than the classical models to handling missing data. The variables selected by the authors for the multivariate analysis were chosen on the basis of their nominal significance at a fixed threshold in the univariate analyses. It should be noted that the P values presented in the multivariate analysis have not been adjusted for the testing carried out during this univariate selection process. This may not be a critical issue given that the results, as the authors clearly state, are hypothesis generating.
However, it should be pointed out that supervised learning methods, such as conditional inference trees and random forests, were developed specifically for model building when a large number of potential variables is present. Unless the number of candidate variables is exorbitantly large, as in a genome-wide study, the initial variable selection step may not be needed. The point of this discussion is not to discount in any way the importance of the logistic and Cox models, which remain integral tools in the arsenal of the applied statistician. Rather, it is intended to indicate that alternative approaches are available.

A statistical method in clinical genomics research is not practical unless it is implemented in an accessible computing environment. The open-source software community has made great strides in providing software tools for the management, statistical analysis, visualization, and annotation of genomic data from clinical studies. For example, R8 is a comprehensive open-source environment for statistical analysis, computing, graphics, and data management. In addition to offering facilities for classical univariate and multivariate analyses, R8 provides classification and regression trees, conditional inference trees, and random forests for both binary and time-to-event outcomes through several add-on packages.5,7,9-11 For clinical genomics research, these facilities are complemented by those offered by the Bioconductor project.12 The major focus of this project is the development of software infrastructure and of preprocessing and analysis tools for high-dimensional molecular data. It also, however, provides facilities specifically tailored to candidate marker studies. The R/Bioconductor platform for the most part does not adhere to the point-and-click graphical user interface paradigm. However, that should not be taken to mean that it is user-unfriendly, given that both projects offer a comprehensive array of educational material and support. Both projects administer several online forums that enable direct communication between developers and users. The forum discussions are not limited to queries about syntax, bug reports, or feature requests. They often lead to lively and stimulating debates on technical and philosophical issues in statistics and related disciplines. Many packages offer vignettes, short documents that walk through representative features and serve as tutorials for new users.
Several monographs illustrating the use of these tools through case studies are available.13,14 Additional educational opportunities are provided by slides from lectures and seminars at the annual workshops sponsored by the projects, which are made available for download. For a preliminary evaluation of the facilities offered for classification trees and random forests, the interested reader may want to explore the party5 package.

Of far greater importance than understanding the technical details of the various statistical methodologies and tools employed is the proper interpretation of the statistical results. As Graziano et al1 have aptly pointed out, the scope of their work is limited to the investigation of possible associations between genetic markers and clinical outcomes in patients with metastatic colorectal cancer receiving salvage cetuximab plus irinotecan therapy. This key observation highlights a subtle but profound limitation of genetic marker association studies carried out within single-arm drug studies in cancer. Suppose a bona fide pharmacogenetic question is defined as whether the relative benefit (or harm) attributable to the target drug depends on a genetic marker. In that case, the results from these types of studies may be of limited usefulness. Statistically, the question may be formulated as an investigation of a drug-by-genetic-marker interaction with respect to the clinical outcome of interest. To address the interaction question, a true control arm is needed. For some studies, depending on the patient population, a control arm is not practical or even ethically possible. Even in randomized phase II studies with a control arm, the generally small sample sizes may make it difficult to detect an interaction unless the interaction is large.
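The interaction question can be written down explicitly. Consider a randomized two-arm study with a binary marker and a binary response. On the log-odds scale, the treatment-by-marker interaction is the difference between the marker odds ratios in the two arms, that is, a ratio of odds ratios. Because the arms contain different patients, its variance is the sum of the two arms' variances. A single-arm study supplies only one of the two odds ratios, so this contrast cannot be formed. The sketch below uses a Wald test, hypothetical counts, and illustrative function names.

```python
import math
from statistics import NormalDist


def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its large-sample variance for one arm's 2x2 table:
    a/b = marker-positive responders/non-responders,
    c/d = marker-negative responders/non-responders."""
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Wald variance undefined")
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def interaction_test(treated, control):
    """Treatment-by-marker interaction as a ratio of odds ratios.

    Equivalent to the interaction coefficient of a saturated logistic model
    with treatment, marker, and treatment x marker terms. Returns
    (ratio of odds ratios, SE of its log, two-sided Wald p-value).
    """
    log_or_t, var_t = log_odds_ratio(*treated)
    log_or_c, var_c = log_odds_ratio(*control)
    diff = log_or_t - log_or_c
    se = math.sqrt(var_t + var_c)  # independent arms: variances add
    z = diff / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return math.exp(diff), se, p
```

Take the hypothetical counts `(20, 10, 10, 20)` in the treated arm and `(15, 15, 15, 15)` in the control arm, 60 patients per arm. A marker odds ratio of 4 against 1 then yields an interaction P value of about 0.07. This illustrates how a sizeable interaction can go undetected at sample sizes typical of randomized phase II trials.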

AUTHORS’ DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST

The author(s) indicated no potential conflicts of interest.

REFERENCES

1. Graziano F, Ruzzo A, Loupakis F, et al: Pharmacogenetic profiling for cetuximab/irinotecan therapy in patients with refractory advanced colorectal cancer. J Clin Oncol 26:1427-1434, 2008

2. McCullagh P, Nelder JA: Generalized Linear Models (ed 2). London, United Kingdom, Chapman & Hall Ltd, 1989

3. Cox DR: Regression models and life-tables (with discussion). J Roy Statist Soc Ser B 34:187-220, 1972

4. Breiman L, Friedman J, Olshen R, Stone C: Classification and Regression Trees. Boca Raton, FL, Chapman & Hall/CRC, 1984

5. Hothorn T, Hornik K, Zeileis A: Unbiased recursive partitioning: A conditional inference framework. J Comp Graph Statist 15:651-674, 2006

6. Breiman L: Random forests. Machine Learning 45:5-32, 2001

7. Hothorn T, Lausen B, Benner A, Radespiel-Troeger M: Bagging survival trees. Stat Med 23:77-91, 2004

8. R Development Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria, R Foundation for Statistical Computing, 2007

9. Therneau TM, Atkinson B (R port by Ripley B): rpart: Recursive Partitioning, 2007. R package version 3.1-38. S-PLUS 6.x original at http://mayoresearch.mayo.edu/mayo/research/biostat/splusfunctions.cfm

10. Ishwaran H, Kogalur UB: randomSurvivalForest: Ishwaran and Kogalur's Random Survival Forest, 2007. R package version 3.0.1

11. Liaw A, Wiener M: Classification and regression by randomForest. R News 2:18-22, 2002

12. Gentleman RC, Carey VJ, Bates DM, et al: Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology 5:R80, 2004

13. Venables WN, Ripley BD: Modern Applied Statistics with S (ed 4). New York, NY, Springer, 2002

14. Gentleman R, Carey V, Huber W, et al: Bioinformatics and Computational Biology Solutions using R and Bioconductor. New York, NY, Springer, 2005


Related Article

  • Pharmacogenetic Profiling for Cetuximab Plus Irinotecan Therapy in Patients With Refractory Advanced Colorectal Cancer
    Francesco Graziano, Annamaria Ruzzo, Fotios Loupakis, Emanuele Canestrari, Daniele Santini, Vincenzo Catalano, Renato Bisonni, Umberto Torresi, Irene Floriani, Gaia Schiavon, Francesca Andreoni, Paolo Maltese, Eliana Rulli, Bostjan Humar, Alfredo Falcone, Lucio Giustini, Giuseppe Tonini, Andrea Fontana, Gianluca Masi, and Mauro Magnani
    JCO 2008 26: 1427-1434




Copyright © 2008 by the American Society of Clinical Oncology, Online ISSN: 1527-7755. Print ISSN: 0732-183X