triostories.blogg.se -

#HOW TO USE MMASS TO ANALYZE DATA#

The Hypergeometric p-value for a gene set \(S\) is given by

\[
p = \sum_{i=x}^{\min(M,\,n)} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}
\]

In this equation, \(N\) is the number of background genes, \(n\) is the number of “interesting” genes, \(M\) is the number of genes that are annotated to a particular gene set \(S\), and \(x\) is the number of “interesting” genes that are annotated to \(S\). The numerator of the sum is the number of samples of \(n\) genes that can be taken from a population of \(N\) genes where exactly \(i\) of the genes are annotated to \(S\) and \(n-i\) are not annotated to \(S\). The denominator of the sum is the total number of samples of size \(n\) that can be taken from a population of size \(N\).

For example, suppose we have a list of 8000 genes, of which 400 are members of the same cluster \(C\). Also suppose that 100 of the 8000 genes are annotated to a particular gene set \(S\). Of these 100 genes, 20 are members of \(C\). The probability that 20 or more (up to 100) genes annotated to \(S\) are in cluster \(C\) by chance is given by

\[
p = \sum_{i=20}^{100} \frac{\binom{100}{i}\binom{8000-100}{400-i}}{\binom{8000}{400}}
\]

ORA is not recommended as a follow-up to differential-expression analysis (DEA) for the reasons below.

The choice of the threshold for statistical significance and the multiple comparison adjustment method can greatly impact the analysis (Huang et al., 2009).

ORA fails to incorporate direction of change: are the genes in a given set mainly up- or down-regulated in one condition relative to another? It is NOT a good idea to split DEA results by the direction of change and apply ORA to the resulting subsets, unless you are specifically asking “which gene sets are over-represented when we only consider genes that are up- or down-regulated?”

If few genes are in the “interesting” group, ORA may not yield useful or reliable results. For example, suppose 30 out of 8000 genes are “interesting”, and 100 of the genes are annotated to a particular gene set, of which 3 are “interesting”. The associated Hypergeometric p-value is 0.006, and this set would be considered significantly over-represented at the 0.01 level (at least, prior to p-value adjustment). However, if only 2 of the genes in this set are “interesting”, this p-value increases roughly 9-fold to 0.0536 and is no longer significant even at the 0.05 level.

ORA cannot be used if the input contains duplicates. For example, a single feature cannot be a member of two or more groups or be present multiple times in the same group. This usually happens when attempting to perform gene-level ORA on protein-level differential analysis results, and it can lead to artificial over-representation if genes are counted multiple times. Instead, use GSEA and summarize the ranking metric at the gene level (take the average).

# Cluster GO-BP ORA with clusterProfiler package
cp_ora <- …
… %>% arrange(Cluster, pvalue) %>% head()

Unlike the fgsea::fora results, these include the description of each term. As for the other columns:

- GeneRatio is the same as overlap (from the fora results) divided by the cluster size.
- BgRatio is the set size divided by the universe size.
- pvalue is the raw p-value.
- p.adjust is the BH-adjusted p-value.
- qvalue is the q-value.
- geneID is the same as overlapGenes from fora.
- Count is the overlap size.

Reactome ORA can be performed with ReactomePA::enrichPathway.

In the previous ORA examples, the Hypergeometric test is performed independently for each gene set. However, this does not capture the relationship between GO terms (described in Section 8.1.1). Since “each GO term inherits all annotations from its more specific descendants,” results tend to be redundant (except when using MSigDB), as they include directly related GO terms with a high degree of overlap (S. …).
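The sketch below checks the arithmetic in the Hypergeometric examples. It writes the upper-tail sum out term by term and compares it with base R's `phyper()`.

- It uses only base R and is a minimal illustration under that assumption, not the implementation used by fgsea or clusterProfiler.
- The helper names `ora_pvalue()` and `ora_pvalue_sum()` are hypothetical.
- The explicit sum is computed on the log scale with `lchoose()`, because `choose(8000, 400)` is far too large to store in a double.

```r
# Minimal sketch (base R only): upper-tail Hypergeometric p-value used in ORA.
#   N: number of background genes
#   n: number of "interesting" genes (e.g., members of a cluster)
#   M: number of genes annotated to gene set S
#   x: number of "interesting" genes annotated to S
# ora_pvalue_sum() and ora_pvalue() are hypothetical helper names.

# Explicit sum: P(X >= x) = sum_{i = x}^{min(M, n)} C(M, i) C(N - M, n - i) / C(N, n)
ora_pvalue_sum <- function(x, N, n, M) {
  upper <- min(M, n)
  if (x > upper) return(0)  # overlap larger than possible: the sum is empty
  i <- x:upper
  # Log scale avoids overflow: choose(8000, 400) exceeds the range of a double.
  log_terms <- lchoose(M, i) + lchoose(N - M, n - i) - lchoose(N, n)
  sum(exp(log_terms))
}

# Same probability from the Hypergeometric distribution function:
# phyper(q, m, n, k) with m = M annotated, n = N - M not annotated, k = n drawn.
# P(X >= x) = P(X > x - 1), hence q = x - 1 and lower.tail = FALSE.
ora_pvalue <- function(x, N, n, M) {
  phyper(x - 1, M, N - M, n, lower.tail = FALSE)
}

# Cluster example: 8000 genes, 400 in cluster C, 100 annotated to S, 20 in both.
p_cluster <- ora_pvalue(20, N = 8000, n = 400, M = 100)

# Few "interesting" genes: 30 of 8000 are interesting, gene set of 100 genes.
p_three <- ora_pvalue(3, N = 8000, n = 30, M = 100)  # about 0.006
p_two   <- ora_pvalue(2, N = 8000, n = 30, M = 100)  # about 0.054

print(signif(c(cluster = p_cluster, three = p_three, two = p_two), 3))

# The explicit sum and phyper() should agree up to floating-point error.
print(all.equal(ora_pvalue_sum(20, N = 8000, n = 400, M = 100), p_cluster))
```

The two "few genes" calls differ only in `x`. They show how a single gene moves the p-value from about 0.006 to about 0.054. The cluster example, with an expected overlap of only 5 genes (400 × 100 / 8000) but an observed overlap of 20, gives a p-value far below any usual threshold.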

Proteomics Data Analysis in R/Bioconductor.