A portal to investigate the many transcriptional footprints of metabolic processes, identified in complex patient tissue biopsies and cell lines
Start browsing the metabolic landscape right away with the buttons above, or read the introduction below first.
Don't know where to start? To help you formulate new research questions from this large dataset, or simply use the site as a reference, let us first introduce the publication and the different ways of investigating metabolic Transcriptional Components.
An abstract of the paper that accompanies this website
Gene expression profiles obtained from patient-derived complex biopsies can provide insight into the transcriptional changes that underlie the reprogrammed metabolism of cancer cells and tissues. These gene expression profiles, however, represent the average expression pattern of all heterogeneous tumor and non-tumor cells present in biopsies of tumor lesions. This means that subtle transcriptional footprints of metabolic processes can be concealed by the transcriptional footprints of other biological processes and experimental artifacts. In our paper, therefore, we performed consensus Independent Component Analyses (c-ICA) with 34,494 bulk expression profiles of patient-derived tumor biopsies, non-cancer tissues, and cell lines. c-ICA enabled us to create a transcriptional metabolic landscape that consists of a set of statistically independent, but cross-platform robust, 'metabolic Transcriptional Components' (mTCs), and define their activity in every sample. In the manuscript we demonstrate how this landscape can be used to explore associations between the metabolic transcriptome, drug sensitivity and the composition of the immune tumor microenvironment. Every single identified metabolic Transcriptional Component (mTC) and its associations can be investigated through this web portal.
Check out the full publication via the button on the left - it's on bioRxiv!
About the browser
This site provides the mTCs that were identified in four curated gene expression datasets (the patient datasets GEO and TCGA, and the cell line datasets CCLE and GDSC), which together include over 34,000 samples obtained from patient tumor material, normal tissue, and about 2,000 cell lines. As described in the paper, we applied c-ICA to discern the subtle gene expression patterns contained in this large amount of data. c-ICA decomposes the complex gene expression signals into statistically independent estimated sources of gene expression, which we call Transcriptional Components (TCs). Each TC captures part of the variance observed in mRNA expression profiles: the transcriptional footprint of some regulatory factor, e.g. a transcription factor, a copy number alteration, or some other biological process. A TC might be active in a specific cancer subtype, or broadly active in several tissue types. For this metabolic browser specifically, we analyzed all TCs to determine which of them capture metabolic processes. This set of TCs is called 'metabolic TCs' (mTCs). mTCs can capture metabolic processes that are prominently present, or that are subtly coregulated with other biological processes. All mTCs, together with their activity in samples, form a metabolic gene expression landscape. Each gene expression dataset has its own set of mTCs. To determine the robustness of the mTCs across datasets from different platforms, they have been correlated in a pairwise manner.
An introduction on how to use this website to get new ideas
Using the gene search mode
The gene search mode lets you search for a single gene in all of the mTCs. In every mTC, every gene has a certain weight. This gene weight indicates the importance of that gene in the relevant component. The search result gives a sorted list of these gene weights per mTC. Click "Investigate" to view data on a specific mTC, such as the weights of other (coregulated) genes within that component. This could help assign new metabolic functions to genes that are part of the same transcriptional footprint, but are currently not members of known gene sets describing metabolic processes. Gene set enrichment scores and information on the activity score of the component in several tumor types are provided too. NB: Even though genes can be compared with each other within a single mTC, they cannot necessarily be compared across mTCs, as every mTC is active to a different extent in every sample. An mTC that contains a very high weight for a certain gene might not be very important in a sample where the mTC has an activity score near zero!
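The gene search described above can be pictured with a small Python sketch. It is an illustration only, not the portal's implementation: the mTC names, genes and weights are made up, the function search_gene is hypothetical, and sorting by absolute weight is an assumption.

```python
# Hypothetical toy data: gene weights per mTC (names and values are made up).
mtc_gene_weights = {
    "mTC_1": {"HK2": 4.2, "LDHA": -1.1, "PKM": 0.3},
    "mTC_2": {"HK2": -0.5, "LDHA": 5.0, "PKM": 2.2},
    "mTC_3": {"HK2": -3.7, "LDHA": 0.1},
}


def search_gene(gene, weights_by_mtc):
    """Return (mTC, weight) pairs for `gene`, largest absolute weight first.

    Sorting on the absolute value is an assumption: strongly negative
    weights are treated as being just as informative as strongly positive ones.
    """
    hits = [
        (mtc, weights[gene])
        for mtc, weights in weights_by_mtc.items()
        if gene in weights  # skip mTCs in which the gene was not measured
    ]
    return sorted(hits, key=lambda hit: abs(hit[1]), reverse=True)


result = search_gene("HK2", mtc_gene_weights)
for mtc, weight in result:
    print(f"{mtc}\t{weight:+.2f}")
```

Note that the ranking says nothing about how active each mTC is in a given sample, which is why weights should not be compared across mTCs without also looking at the activity scores.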
Using the tissue type search mode
It is possible to search for a tumor and/or tissue type, or cell line, within the metabolic landscape. Investigating the (average) activity score of samples of a certain tissue type gives a measure of the activity of the mTC in that tissue. Use this browser to find out if an mTC has a high activity score in the samples of your tissue of interest. That mTC might be important or contain very tissue-specific biology! Click "Investigate" to view data on the important genes within that mTC, as well as gene set enrichment scores and information on the activity score of the component in this and several other tumor types.
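As a rough illustration of the (average) activity score per tissue type, the sketch below groups the activity scores of one mTC by tissue and takes the mean. The sample identifiers, tissue labels and scores are hypothetical, and the plain arithmetic mean is an assumption about how such a summary could be computed.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical activity scores of ONE mTC: (sample id, tissue type, activity score).
samples = [
    ("s1", "breast", 2.0),
    ("s2", "breast", 4.0),
    ("s3", "liver", -1.0),
    ("s4", "liver", -3.0),
    ("s5", "lung", 0.5),
]


def mean_activity_per_tissue(rows):
    """Group activity scores by tissue type and return the mean per tissue."""
    grouped = defaultdict(list)
    for _sample_id, tissue, score in rows:
        grouped[tissue].append(score)
    return {tissue: mean(scores) for tissue, scores in grouped.items()}


tissue_means = mean_activity_per_tissue(samples)

# Rank tissues by how far the mean activity lies from zero, in either direction.
ranked = sorted(tissue_means.items(), key=lambda item: abs(item[1]), reverse=True)
for tissue, score in ranked:
    print(f"{tissue}\t{score:+.2f}")
```

A mean hides heterogeneity within a tissue type, so it is worth looking at the per-sample scores as well.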
Using the gene set search mode
One might also be interested in a metabolic process as a whole, instead of just a single gene. Therefore, for every mTC, gene set enrichment scores have been calculated for several gene set collections. The (absolute) enrichment score of a gene set indicates whether the genes that are important in an mTC function in the same biological process. Use this to find out if a biological process of interest might be captured by one of the mTCs. Click "Investigate" to view data on the genes within that component, as well as the enrichment scores of other gene sets and information on the activity score of the component in several tumor types.
The mTC and its data
In the end, the investigation always comes down to understanding the biology behind a (set of) mTC(s).
As mentioned, every mTC comes with its own set of data: its gene weights, gene set enrichment scores, tissue activity information (the "activity score" of the mTC in samples), and its correlation to mTCs from other datasets. Here, we introduce each of these. First off, the gene weight table contains the transcriptional footprint as captured by the mTC, i.e. the weights of every gene, ordered by their absolute values. A gene weight might be negative or positive. Whether this indicates over- or under-expression of the gene in a sample, however, also depends on the sign of the activity score of the mTC in that sample (a very negative gene weight in a sample with a negative activity score means that the gene's effective contribution in that sample is positive: after all, negative times negative equals positive). To investigate the biology that lies behind the genes of an mTC, one can take a look at the gene set enrichment scores. A high enrichment score for a gene set means that the genes of that gene set are relatively near the top (or bottom) of the complete list of genes. The top and bottom of the gene list contain the genes with the highest absolute weights, so a gene set with a high enrichment score might give an indication of the biological process that underpins the mTC. Included are the gene sets as defined by Biocarta, KEGG, GO molecular functions, GO biological processes, Reactome and Transcription Factor targets. To investigate the weights of all genes that are members of a gene set, click "show member genes", and a window will pop up with the data on only the genes of that gene set. To view the metabolic landscape across cancer types, the "Activity in tissue types" plot shows the activity score of the mTC for every sample within the selected dataset: every dot in the plot corresponds to a single sample or cell line.
These activity scores are plotted per tissue (tumor) type, indicating how the mTC is activated in a tissue type as a whole. The further an activity score lies from zero, in either the positive or the negative direction, the more prominent the gene expression pattern of the mTC is in that sample. Still, the metabolic transcriptome is heterogeneous. To investigate this, we defined 'metabolic subtypes' based on the activity of every mTC in a sample. This means that every metabolic subtype consists of samples that have a similar activity score profile for the set of mTCs. To investigate these metabolic subtypes, use the tab "Activity in metabolic subtypes".
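The sign rule for gene weights and activity scores ("negative times negative equals positive") can be made concrete with a few lines of Python. This is a minimal sketch that assumes the usual linear ICA model, in which the contribution of an mTC to a gene's expression in a sample is the gene weight multiplied by the activity score. The function name and all numbers are hypothetical.

```python
def contribution(gene_weight, activity_score):
    """Signed contribution of one mTC to one gene's expression in one sample.

    Assumes a linear mixing model: contribution = gene weight x activity score.
    """
    return gene_weight * activity_score


# (gene weight, activity score) pairs, all made up for illustration.
cases = [(-4.0, -2.0), (-4.0, 2.0), (4.0, 2.0), (4.0, 0.0)]

for weight, activity in cases:
    value = contribution(weight, activity)
    if value > 0:
        direction = "higher expression"
    elif value < 0:
        direction = "lower expression"
    else:
        direction = "no contribution"
    print(f"weight={weight:+.1f} activity={activity:+.1f} -> {value:+.1f} ({direction})")
```

The last case shows why a high gene weight alone is not informative: when the mTC has an activity score of zero in a sample, it contributes nothing to that gene's expression there.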
Associations with immune fractions and drug sensitivities
For every GEO and TCGA sample, we estimated the fractions of different immune cell types with CIBERSORT. For every sample, this gives an indication of the relative abundance of immune cells in that biopsy. The fraction of immune cells in a sample can be correlated with the activity of an mTC in that sample. A high correlation might indicate that the respective mTC captures a transcriptional footprint that originates from immune cells in the tumor microenvironment, or captures a footprint that reflects (haematopoietic) tumor-cell characteristics. For every GEO and TCGA mTC, these correlations are given.
Similarly, for every CCLE and GDSC cell line, the IC50 values are available for a large set of drugs. For every cell line sample, this IC50 value gives an indication of the sensitivity (or resistance) of those cells to a certain drug. The IC50 value for a drug in samples can be correlated with the activity of an mTC in those samples. A high correlation might therefore indicate that the respective mTC captures a transcriptional footprint that is important for the sensitivity of a cell to that drug. For every CCLE and GDSC mTC, these correlations are given.
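The correlations described above, between mTC activity and either immune cell fractions or IC50 values, can be sketched as follows. Which correlation measure the portal reports is not stated here, so the Pearson coefficient below is an assumption, and the activity scores and IC50 values are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical data for five cell lines: activity of one mTC and the IC50 of one drug.
activity = [-2.0, -1.0, 0.0, 1.0, 2.0]
ic50 = [9.0, 7.5, 6.0, 4.0, 3.5]

r = pearson(activity, ic50)
print(f"correlation between mTC activity and IC50: {r:.3f}")
```

In this toy example the coefficient is strongly negative: cell lines with higher mTC activity have lower IC50 values, i.e. they are more sensitive to the drug. A correlation alone does not show that the mTC causes the sensitivity.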
Correlated mTCs from other datasets
Finally, the gene weight pattern of an mTC can be compared to that of mTCs in other datasets. In this way, one can assess the robustness of an identified mTC. Highly correlated components are interesting, as they extend findings from one gene expression dataset to others, meaning that the patterns found are independent of the measurement platform. Furthermore, by comparing components from patient tissue datasets to the cell line compendia (CCLE or GDSC), one might be able to find a cell line that contains a certain gene expression pattern also found in a tumor subtype, or vice versa. Two correlation coefficients are available, calculated through different methods. In addition, the overlap in genes with an absolute gene weight > 3 between two mTCs of different datasets is given.
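The gene overlap mentioned above can be sketched in Python as a set intersection. The threshold of 3 on the absolute gene weight comes from the text; the gene names, weights and function names are hypothetical.

```python
def top_genes(weights, threshold=3.0):
    """Genes whose absolute weight exceeds the threshold."""
    return {gene for gene, weight in weights.items() if abs(weight) > threshold}


def overlap(weights_a, weights_b, threshold=3.0):
    """Genes that pass the threshold in both mTCs (the sign is ignored)."""
    return top_genes(weights_a, threshold) & top_genes(weights_b, threshold)


# Hypothetical gene weights of two mTCs from different datasets.
patient_mtc = {"HK2": 4.2, "LDHA": -3.5, "PKM": 0.3, "ENO1": 3.0}
cell_line_mtc = {"HK2": 3.8, "LDHA": 3.1, "PKM": 5.0, "ENO1": 6.0}

shared = overlap(patient_mtc, cell_line_mtc)
print(sorted(shared))
```

ENO1 is excluded because its weight of exactly 3.0 in the first mTC is not greater than the threshold. LDHA is counted even though its sign differs between the two mTCs, because only the absolute weight is considered; the sign of an independent component is arbitrary, so this alone does not indicate disagreement.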
Paper related to this resource, now up on bioRxiv
Robust metabolic transcriptional components in 34,494 patient-derived samples and cell lines