Resources for Vaske et al. (2019) "Comparative Tumor RNA Sequencing Analysis..."

"Comparative Tumor RNA Sequencing Analysis for Difficult-to-Treat Pediatric and Young Adult Patients With Cancer", published in JAMA Network Open October 25, 2019, presents a framework for comparative RNA-Seq analysis of pediatric tumors across multiple precision medicine studies. Our framework uses public genomic datasets of over 11,000 tumor RNA-Seq samples that we consolidated and released to the community. We describe an application of our framework and the data compendium to the analysis of 144 tumors from children and young adults with a relapsed, refractory, or rare cancer, studied on four separate precision medicine trials in the U.S. and Canada.

This page contains links to the resources we used in the publication. We are committed to reproducible science and open data sharing. Please contact us with any questions about our data or code. The resources on this page are those used for this publication; for subsequent releases, please see our Public Data page and our Treehouse CARE GitHub repository.

Download

Clinical Fields

Clinical fields including diagnosis, sex, age, histology, and others are available for the participant and background cohorts. Visit the Clinical Fields page for a short definition of each field.

Background Cohort*

Gene expression and clinical data for the reference compendium (n=11,340 samples) are available for download or visualization on UCSC Xena. Values in this dataset use HUGO gene names and are expressed as log2(x+1), where x is the TPM (transcripts per million) value. These data were generated by library preparation methods including polyA selection and ribosomal depletion. 36 fields of clinical data are also available. Samples are derived from clinical sites, publicly available repositories, TARGET, and TCGA.

*This publication-specific cohort was created in February 2018 and reflects the state of the Treehouse compendium at that time. Since then we have continued to accumulate data and improve our compendia; you can visit our Public Data hub for the latest versions.

Participants Cohort

Gene expression and clinical data for participants (n=144 samples) are available for download or visualization on UCSC Xena. Values in this dataset use HUGO gene names and are expressed as log2(x+1), where x is the TPM (transcripts per million) value. These data were generated by library preparation methods including polyA selection and ribosomal depletion. 37 fields of clinical data are also available. Samples are derived from clinical sites.
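For readers working with the downloaded expression values, the log2(x+1) transform of TPM described for both cohorts, and its inverse, can be sketched in Python as follows. This is a minimal illustration only: the function names and the example TPM values are hypothetical and are not taken from the Treehouse code or data.

```python
import math


def tpm_to_log2(tpm: float) -> float:
    """Forward transform: log2(TPM + 1). The +1 keeps a TPM of 0 at 0."""
    if tpm < 0:
        raise ValueError("TPM must be non-negative")
    return math.log2(tpm + 1.0)


def log2_to_tpm(value: float) -> float:
    """Inverse transform: recover TPM from a log2(TPM + 1) value."""
    return 2.0 ** value - 1.0


# Hypothetical TPM values for illustration (not from the cohorts).
example_tpm = [0.0, 1.0, 3.0, 1023.0]
transformed = [tpm_to_log2(t) for t in example_tpm]
recovered = [log2_to_tpm(v) for v in transformed]
```

Applying `log2_to_tpm` to a value from the dataset gives back a TPM estimate, which can be useful when comparing against tools that expect untransformed expression.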

Code

Analysis of the gene expression of a single sample relative to a background cohort was performed with the following code.

Treehouse CARE

Treehouse Comparative Analysis of RNAseq Expression (CARE) is our tertiary processing protocol; it performs outlier analysis on RNA gene expression data and retrieves enriched pathways. The Docker image used to generate outlier results for this publication is available for download.

Treehouse Pipelines

Treehouse Pipelines is our secondary processing framework; it runs UCSC Computational Genomics Platform's Toil RNA-Seq Pipeline on RNA sequence data to produce gene expression levels.

Visualizations

TumorMap

Relationships between the samples present in the background cohort were visualized using UCSC TumorMap.

Xena

An overview of the participant cohort was visualized using the Xena data browser.

Support

We are grateful to all our supporters and clinical partners and all of our repository data providers, and in particular we thank the patients and their families who have shared their data. Without all of these, we would not be able to accomplish this important work.

Thank you to all who are sharing data. A special shout-out to the St. Baldrick's Foundation and the California Initiative to Advance Precision Medicine, not only for supporting Treehouse but also for their commitment to data sharing and their efforts to advance responsible data sharing.