Copy Number Variation
In the Copy Number Variation (CNV) module, the statistics of heterozygous and homozygous CNV in each cancer type are displayed as pie charts for the gene set, and Pearson correlation is computed between gene expression and CNV for each gene in each cancer to help identify genes whose expression is significantly affected by CNV.
- Overall description:
- In this CNV module, we calculate the percentage of CNV and the correlation of CNV with mRNA expression for each gene in each cancer type. CNV is divided into two subtypes, heterozygous CNV and homozygous CNV, which represent the occurrence of CNV on only one chromosome or on both chromosomes, respectively. The percentage statistics for the CNV subtypes use GISTIC-processed CNV data, and the correlation is calculated from raw CNV data and mRNA RPKM data.
- We collected CNV data for 11495 samples from the NCI Genomic Data Commons and processed them with GISTIC2.0 (C. H. Mermel et al., 2011). The CNV statistics are based on the GISTIC-processed data, and the correlation between CNV and mRNA expression is based on the raw CNV data.
- CNV Pie distribution:
- The CNV pie plot gives a global profile of the composition of heterozygous/homozygous CNV for each gene in each cancer. Each pie represents the proportions of the different CNV types for one gene in one cancer, and different colors represent different CNV types.
- Hete CNV profile:
- The heterozygous CNV profile shows the percentage of heterozygous CNV, including the percentages of heterozygous amplification and deletion, for each gene in each cancer. Only genes with > 5% CNV in a cancer are shown as points on the figure. Heterozygous CNV is common, so the plot will be full of points, but most current resources, such as cBioPortal, focus mainly on homozygous CNV.
- Homo CNV profile:
- The homozygous CNV profile shows the percentage of homozygous CNV, including the percentages of homozygous amplification and deletion, for each gene in each cancer. Only genes with > 5% CNV in a cancer are shown as points on the figure. Most current resources, such as cBioPortal, focus mainly on homozygous CNV.
- CNV correlate to gene expression:
- The mRNA expression and CNV data were merged by TCGA barcode. We tested the association between paired mRNA expression and CNV percentage using Pearson's product-moment correlation coefficient, whose test statistic follows a t distribution. P-values were adjusted by FDR. This method was employed by A. Schlattl et al., 2011 to relate copy number to transcriptome sequencing data.
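The correlation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the module's actual code: the function names are hypothetical, and the Benjamini-Hochberg procedure is assumed for the FDR adjustment because the text only says "FDR". The p-value itself would come from a t distribution with n - 2 degrees of freedom, which is left out here to keep the sketch free of external libraries.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two paired vectors."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length vectors with at least 3 points")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch, one gene in one cancer yields one `r` and one p-value; the p-values for all genes would then be passed together to `bh_adjust`.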
- CNV Pie distribution:
- Hete Amp: heterozygous amplification; Hete Del: heterozygous deletion; Homo Amp: homozygous amplification; Homo Del: homozygous deletion; None: no CNV.
- Hete/Homo CNV profile:
- The heterozygous/homozygous CNV profile shows the percentage of heterozygous/homozygous CNV, including the percentages of amplification and deletion, for each gene in each cancer. Only genes with > 5% CNV in a cancer are shown as points on the figure.
- CNV correlate to gene expression:
- Genes whose mRNA expression is significantly (FDR <= 0.05) correlated with CNV percentage are shown on the figure. From this, we can identify genes whose expression is significantly regulated by CNV. Blue bubbles represent a negative correlation (when a gene has a high frequency of CNV, its expression is downregulated, so the two have opposite trends), and red bubbles represent a positive correlation (when a gene has a high frequency of CNV, its expression is upregulated, so the two have consistent trends). The deeper the color, the higher the correlation. The size of the point represents statistical significance: the bigger the point, the more significant the correlation.
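As an illustration of the CNV subtype percentages used in the pie and profile plots, here is a hedged Python sketch. It assumes that the GISTIC-processed calls are coded as -2, -1, 0, 1, 2 (homozygous deletion through homozygous amplification), which the text above does not state, and it assumes the 5% filter applies to the summed amplification and deletion percentage of one subtype. The function names are hypothetical.

```python
from collections import Counter

# Assumed coding of GISTIC thresholded calls (not stated in the text above).
LABELS = {-2: "Homo Del", -1: "Hete Del", 0: "None", 1: "Hete Amp", 2: "Homo Amp"}

def cnv_percentages(calls):
    """Percentage of samples in each CNV category for one gene in one cancer."""
    if not calls:
        raise ValueError("no samples")
    counts = Counter(LABELS[c] for c in calls)
    n = len(calls)
    return {label: 100.0 * counts.get(label, 0) / n for label in LABELS.values()}

def passes_threshold(pct, kind, cutoff=5.0):
    """True when amp + del percentage of one kind ('Hete' or 'Homo') exceeds cutoff."""
    return pct[kind + " Amp"] + pct[kind + " Del"] > cutoff

# Example: 20 samples of one gene in one cancer.
calls = [0] * 16 + [1] * 2 + [-1] + [2]
pct = cnv_percentages(calls)
show_hete = passes_threshold(pct, "Hete")
show_homo = passes_threshold(pct, "Homo")
```

The five values in `pct` correspond to the five slices of one pie; `show_hete` and `show_homo` decide whether a point would appear on the respective profile plot under the stated assumptions.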
Single Nucleotide Mutation
The Single Nucleotide Variation (SNV) module presents the SNV frequency and variant types of the gene set in the selected cancer types. The effect of mutations on overall survival (OS) is assessed with the log-rank test, which helps evaluate the relationship between gene set mutations and clinical outcomes.
- We collected SNV data for 8663 samples from the NCI Genomic Data Commons, covering 33 cancer types.
- SNV percentage
- SNV percentage was calculated as: number of mutated samples / number of cancer samples.
- The SNV summary and oncoplot (waterfall plot) were generated by maftools (Mayakonda and Koeffler, 2016).
- SNV survival
- SNV data and clinical overall survival data were combined, and the R package survival was used to estimate the survival difference between the mutated and non-mutated groups for each gene. Cox regression (Andersen, P. and Gill, R., 1982) was performed to estimate the hazard of the mutated group, and a log-rank test (Harrington, D. P. and Fleming, T. R.) was performed to compare the survival distributions of the two groups; a p-value <= 0.05 was considered significant.
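To make the log-rank comparison concrete, the following Python sketch implements the standard two-group log-rank test from scratch. It is only an illustration of the statistic: the module itself uses the R survival package, and the function name and input layout here are hypothetical.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 for death and 0 for censoring."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Numbers at risk just before time t in each group.
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        # Deaths at time t in each group.
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function, 1 df
    return chi2, p_value
```

Here group A could hold the samples carrying a mutation in the gene and group B the samples without one; a gene would be reported when `p_value <= 0.05`.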
- SNV percentage
- Gives the SNV frequency of genes in each cancer. The deeper the color, the higher the mutation frequency. The number in each cell is the number of samples with the corresponding mutated gene in the corresponding cancer.
- SNV summary
- The summary plot displays the number of variants in each sample as a stacked barplot and summarizes the variant types as a boxplot.
- SNV oncoplot
- An oncoplot, also known as a waterfall plot, gives the mutation distribution of the top 10 mutated genes in your gene set (when the gene set has fewer than 10 genes, all are shown) and a classification of SNV types (including missense mutation, frame shift deletion, nonsense mutation, etc.). Samples from all selected cancers are shown together. The side and top barplots show the number of variants in each sample or each gene.
- SNV survival
- The survival plot gives the survival difference between the mutated and non-mutated groups for each gene; only genes with a significant p-value (<= 0.05) are displayed. Blue points indicate that patients with the mutated gene have worse overall survival, and red points indicate that patients with the mutated gene have better overall survival.
The Methylation module explores differential methylation between tumor and paired normal samples, the correlation between methylation and expression, and the effect of methylation level on OS for the selected cancer types.
- We collected methylation data for 10129 samples from the NCI Genomic Data Commons, covering 33 cancer types, but only 14 cancer types have paired tumor vs. normal data, so differential methylation analysis was based on these 14 cancer types.
- Cancers with more than 10 tumor-normal pairs are included in the tumor versus normal comparison, and the comparison is not restricted to the paired samples. A Student's t-test was performed to define the methylation difference between tumor and normal samples; p-values were adjusted by FDR, and FDR <= 0.05 was considered significant.
- Methylation data and clinical overall survival data were combined, and the methylation level of each gene was divided into 2 groups by the median methylation. Cox regression was performed to estimate the hazard (risk of death): if the Cox coefficient is > 0, the high-methylation group shows worse survival and Hyper_worse is defined as High; otherwise it is defined as Low. A log-rank test was also performed to compare the survival distributions of the two groups, and a p-value < 0.05 was considered significant.
correlate to mRNA RPKM
- In theory, methylation can influence gene expression. The mRNA expression and methylation data were merged by TCGA barcode. We tested the association between paired mRNA expression and methylation using Pearson's product-moment correlation coefficient, whose test statistic follows a t distribution. P-values were adjusted by FDR, and genes with FDR <= 0.05 were retained. From this, we may identify genes whose expression is significantly influenced by genome methylation.
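The median split and the Hyper_worse label used in the methylation survival analysis can be sketched as follows. This is a minimal Python illustration with hypothetical function and sample names; it assumes that samples exactly at the median go to the low group, which the text does not specify, and it takes the Cox coefficient as an input rather than fitting the regression.

```python
import statistics

def median_split(values_by_sample):
    """Assign each sample to 'high' or 'low' relative to the median value."""
    cutoff = statistics.median(values_by_sample.values())
    groups = {"high": [], "low": []}
    for sample, value in values_by_sample.items():
        # Assumption: values equal to the median fall in the low group.
        groups["high" if value > cutoff else "low"].append(sample)
    return cutoff, groups

def hyper_worse(cox_coef):
    """'High' when the high-methylation group has the higher hazard (coef > 0)."""
    return "High" if cox_coef > 0 else "Low"

# Hypothetical methylation levels of one gene in four samples.
levels = {"s1": 0.1, "s2": 0.4, "s3": 0.7, "s4": 0.9}
cutoff, groups = median_split(levels)
```

The two lists in `groups` are the inputs a survival comparison such as Cox regression or a log-rank test would then receive.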
- The differential methylation bubble plot shows the methylation change of genes between tumor and normal samples in each cancer. Blue points represent methylation upregulation in tumors, and red points represent methylation downregulation in tumors. The deeper the color, the greater the difference. The size of the point represents statistical significance: the bigger the point, the more significant the difference.
- Gives the survival difference between samples with high and low methylation of a specific gene; only genes with a significant log-rank p-value (<= 0.05) are displayed on the figure. A red point indicates worse survival in the high-methylation group, and a blue point indicates the opposite. The size of the point represents statistical significance: the bigger the point, the more significant the difference.
- Gives the Pearson correlation between methylation and mRNA expression. Blue points represent a negative correlation (when the methylation level of a gene goes up, its expression goes down, so the two have opposite trends), and red points represent a positive correlation (when the methylation level of a gene goes up, its expression goes up too, so the two have consistent trends). The deeper the color, the higher the correlation. The size of the point represents statistical significance: the bigger the point, the more significant the correlation.
The Pathway Activity module presents the difference in gene expression between pathway activity groups (activation and inhibition) defined by pathway scores.
- RPPA data from TCPA are used to calculate scores for 7876 samples, 10 cancer-related pathways and 32 cancer types. Reverse phase protein array (RPPA) is a high-throughput antibody-based technique with procedures similar to those of Western blots. Proteins are extracted from tumor tissue or cultured cells, denatured by SDS, and printed on nitrocellulose-coated slides, followed by antibody probing (TCPA database). The TCPA RPPA data are all from TCGA samples.
- The pathways included are: TSC/mTOR, RTK, RAS/MAPK, PI3K/AKT, Hormone ER, Hormone AR, EMT, DNA Damage Response, Cell Cycle, and Apoptosis. All are well-known cancer-related pathways.
- Pathway score
- RBN RPPA data were median-centered and normalized by standard deviation across all samples for each component to obtain the relative protein level. The pathway score is then the sum of the relative protein level of all positive regulatory components minus that of negative regulatory components in a particular pathway (R. Akbani et al.).
- Gene expression was divided into 2 groups (groupHigh and groupLow) by median expression. The difference in pathway activity score (PAS) between the groups was assessed by Student's t-test; p-values were adjusted by FDR, and FDR <= 0.05 is considered significant. When PAS(Gene A groupHigh) > PAS(Gene A groupLow), we consider that gene A may have an activating effect on the pathway; otherwise, it may have an inhibitory effect. A similar method has been applied by Y. Ye et al.
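The pathway score definition above (median-center, scale by standard deviation, then sum positive components minus negative components) can be sketched in Python as follows. This is an illustrative sketch only: the function names and component names are hypothetical, and the sample standard deviation is assumed because the text does not say which estimator is used.

```python
import statistics

def relative_levels(levels):
    """Median-center one component across samples and scale by its standard deviation."""
    center = statistics.median(levels)
    spread = statistics.stdev(levels)  # assumption: sample SD (n - 1 denominator)
    return [(v - center) / spread for v in levels]

def pathway_scores(rppa, positive, negative):
    """Per-sample score: sum of positive components minus sum of negative components."""
    rel = {name: relative_levels(vals) for name, vals in rppa.items()}
    n_samples = len(next(iter(rppa.values())))
    scores = []
    for i in range(n_samples):
        pos = sum(rel[name][i] for name in positive)
        neg = sum(rel[name][i] for name in negative)
        scores.append(pos - neg)
    return scores

# Hypothetical protein levels for two components across three samples.
rppa = {"P1": [1.0, 2.0, 3.0], "N1": [3.0, 2.0, 1.0]}
scores = pathway_scores(rppa, positive=["P1"], negative=["N1"])
```

The resulting per-sample scores are what would be compared between the groupHigh and groupLow samples of a gene.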
- Global percentage
- The global percentage across all 32 cancers shows, for each pathway, the percentage of cancer types (number of activated or inhibited cancer types / 32) in which a gene has an effect (activation or inhibition).
- Heatmap percentage
- The heatmap shows genes that have an effect (inhibition or activation) in at least 5 cancer types. Pathway_a represents activation of the pathway, and inhibition is shown similarly as pathway_i.
- Relation network
- This network shows the relationships between genes and pathways as connecting lines. A solid line means activation, and a dashed line means inhibition. Line colors represent different cancer types.
The miRNA Regulation module provides a miRNA regulation network for visualizing the potential regulation of your genes by miRNAs.
miRNA regulation data were collected from databases, including experimentally verified records (papers, TarBase, miRTarBase, mir2disease) and TargetScan and miRanda predictions. Only miRNA-gene pairs recorded in these sources are used to calculate the expression correlation here.
miRNA transcript expression data were collected from TCGA, covering 9105 samples and 33 cancer types.
- Regulation confirm
1. miRNA expression and gene expression were merged by TCGA barcode. We tested the association between paired mRNA and miRNA expression using Pearson's product-moment correlation coefficient, whose test statistic follows a t distribution. P-values were adjusted by FDR, and pairs with FDR <= 0.05 and R < 0 were retained. Correlation was calculated across all paired samples (33 cancers), so changing the cancer type selection does not change the result as long as the gene set is unchanged.
2. Given the presence of positive regulators such as transcription factors, a miRNA-gene pair with a negative correlation is considered a potential negative regulation pair.
3. Only the miRNA-gene pairs recorded in the referenced databases are included in the calculation at step 1.
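The selection rule in the steps above can be sketched as a simple filter in Python. This is a hypothetical illustration: the function name, the input layout and the identifiers in the example are invented for the sketch, and the correlation and FDR values are assumed to have been computed beforehand. Unlike step 3, which restricts the calculation itself to recorded pairs, the sketch applies the database check at filter time for brevity.

```python
def select_regulation_pairs(correlations, recorded_pairs, fdr_cutoff=0.05):
    """Keep recorded miRNA-gene pairs with FDR <= cutoff and negative correlation.

    correlations maps (mirna, gene) to (r, fdr); recorded_pairs is a set of
    (mirna, gene) tuples found in the regulation databases.
    """
    kept = []
    for (mirna, gene), (r, fdr) in correlations.items():
        if (mirna, gene) in recorded_pairs and fdr <= fdr_cutoff and r < 0:
            kept.append((mirna, gene, r))
    # Strongest negative correlation first.
    return sorted(kept, key=lambda item: item[2])

# Hypothetical input: names and numbers are placeholders, not real results.
correlations = {
    ("miR-A", "GENE1"): (-0.45, 0.001),
    ("miR-A", "GENE2"): (0.30, 0.010),
    ("miR-B", "GENE1"): (-0.20, 0.200),
    ("miR-C", "GENE3"): (-0.60, 0.001),
}
recorded = {("miR-A", "GENE1"), ("miR-A", "GENE2"), ("miR-B", "GENE1")}
edges = select_regulation_pairs(correlations, recorded)
```

Each tuple in `edges` corresponds to one edge of the regulation network, with the absolute value of `r` available as an edge weight.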
- The networkD3 R package is used to generate this network; it is a good choice if you need a miRNA regulation network figure for your paper. In this network, node size is positively correlated with the node's degree. Subgroups are generated by detecting communities in the graph via random walks with the igraph R package and are shown in different colors.
- The visNetwork R package is used to generate this interactive network; you can click, drag, delete and add nodes or edges on the web page as you wish. A node represents a miRNA or a gene, and an edge represents the regulation of a gene by a miRNA. Nodes are clustered by color, node size is positively correlated with the node's degree as in the networkD3 network, and edge width is defined by the absolute value of the correlation coefficient.
An-Yuan Guo, Ph.D. Professor of Bioinformatics
Chun-Jie Liu, Ph.D. Candidate
Fei-Fei Hu, Ph.D. Candidate
Qiong Zhang, Postdoc Fellow