Tuesday, July 20, 2010

Running Gene Set Enrichment Analysis (GSEA) on the command line

java -cp /stor1/shah/Ruben_peptide/gsea_analysis/gsea2-2.06.jar \
  -Xmx2000m xtools.gsea.Gsea \
  -res foo_expression_values.gct \
  -cls foo_expression_values.cls#Normal_versus_Treatment \
  -gmx msigdb.v2.5.symbols.gmt -chip HG_U133_Plus_2.chip \
  -collapse true -mode Max_probe -norm meandiv -nperm 1000 \
  -permute phenotype -rnd_type no_balance -scoring_scheme weighted \
  -rpt_label my_analysis -metric Signal2Noise -sort real -order descending \
  -include_only_symbols true -make_sets true -median false -num 100 \
  -plot_top_x 20 -rnd_seed timestamp -save_rnd_lists false \
  -set_max 500 -set_min 15 -zip_report false \
  -out /stor1/shah/Ruben_peptide/gsea_analysis -gui false

This is the full command line for a GSEA run; the paths and file names are specific to this analysis.

Monday, July 19, 2010

Significance of overlapping gene lists

Wen Fury and Wentian Li
http://www.nslij-genetics.org/wli/pub/ieee-embs06.pdf

To assess the significance of the overlap between two differentially expressed gene lists of sizes n1 and n2 (e.g. d1-n1 and d2-n1), drawn from n genes in total and sharing m genes, use either the hypergeometric or the Fisher's exact test p-value.

Given integers n, n1, n2, m (max(n1,n2) <= n and m <= min(n1,n2)), the hypergeometric distribution is defined as

P(m) = [C(n1, m) * C(n - n1, n2 - m)] / C(n, n2)

where C(n,m) is the number of ways of choosing m objects out of n objects: C(n,m) = n! / [m! (n - m)!]
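
As a quick check of the formula above, here is a minimal R sketch that evaluates P(m) directly with choose() and compares it with base R's dhyper(). The function name hyper_point and the example numbers (n = 1000, n1 = 50, n2 = 40, m = 5) are made up for illustration and are not from the paper.

```r
# Point probability of observing exactly m shared genes,
# written term by term from the formula P(m) above.
# hyper_point is a hypothetical helper name, not a base R function.
hyper_point <- function(m, n1, n2, n) {
  choose(n1, m) * choose(n - n1, n2 - m) / choose(n, n2)
}

# Illustrative (made-up) numbers: 1000 genes, lists of 50 and 40, 5 shared.
n <- 1000; n1 <- 50; n2 <- 40; m <- 5

p_formula <- hyper_point(m, n1, n2, n)
p_builtin <- dhyper(m, n1, n - n1, n2)  # same quantity from base R

# TRUE when the two agree up to floating-point tolerance
agree <- isTRUE(all.equal(p_formula, p_builtin))
print(agree)
```

For very large n, choose() can overflow, so dhyper() is probably the safer choice in practice.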

It is usually more interesting to calculate the sum of P(k) over all k equal to or larger than the observed value m (i.e. the p-value):


p-value = Sigma [k = m to min(n1,n2)] P(k)
        = Sigma [k = 0 to min(n1,n2)] P(k) - Sigma [k = 0 to m - 1] P(k)

To calculate it in R, use phyper(q, n1, n - n1, n2), which returns the cumulative probability Sigma [k = 0 to q] P(k):

if m = 0, p-value = 1
if m > 0, p-value = phyper(min(n1,n2), n1, n-n1, n2) - phyper(m-1, n1, n-n1, n2)
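
The rule above can be wrapped in a small function. This is only a sketch: the name overlap_pvalue and the example numbers are my own, not from the paper. It also compares the result with phyper(..., lower.tail = FALSE), which gives the same upper tail in one call.

```r
# Overlap p-value: P(X >= m) under the hypergeometric distribution.
# overlap_pvalue is a hypothetical helper name.
overlap_pvalue <- function(m, n1, n2, n) {
  stopifnot(max(n1, n2) <= n, m >= 0, m <= min(n1, n2))
  if (m == 0) return(1)
  # full sum (k = 0 .. min(n1,n2)) minus the lower part (k = 0 .. m-1)
  phyper(min(n1, n2), n1, n - n1, n2) - phyper(m - 1, n1, n - n1, n2)
}

# Illustrative (made-up) numbers: 1000 genes, lists of 50 and 40, 5 shared.
p_diff <- overlap_pvalue(5, 50, 40, 1000)

# Same tail probability in one call: P(X > m - 1) = P(X >= m)
p_tail <- phyper(5 - 1, 50, 1000 - 50, 40, lower.tail = FALSE)

same <- isTRUE(all.equal(p_diff, p_tail))
print(same)
```

When the p-value is extremely small, the subtraction form can lose precision, so the lower.tail = FALSE form is likely preferable there.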

One can also use Fisher's exact test on the following 2-by-2 table:

         col1     col2          total
row1     m        n1-m          n1
row2     n2-m     n-n1-n2+m     n-n1
total    n2       n-n2          n

The one-sided Fisher's exact test (alternative "greater") and the hypergeometric p-value give identical results.
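
A minimal R sketch of that comparison, using the 2-by-2 table layout above. The numbers are made up for illustration. Note that fisher.test() is two-sided by default, so alternative = "greater" is set explicitly to match the upper-tail hypergeometric p-value.

```r
# Illustrative (made-up) numbers: 1000 genes, lists of 50 and 40, 5 shared.
n <- 1000; n1 <- 50; n2 <- 40; m <- 5

# matrix() fills column by column:
#   col1 = (m, n2 - m), col2 = (n1 - m, n - n1 - n2 + m)
tab <- matrix(c(m, n2 - m, n1 - m, n - n1 - n2 + m), nrow = 2,
              dimnames = list(c("row1", "row2"), c("col1", "col2")))

# One-sided Fisher's exact test: is the overlap larger than expected?
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# Hypergeometric upper tail P(X >= m)
p_hyper <- phyper(m - 1, n1, n - n1, n2, lower.tail = FALSE)

identical_result <- isTRUE(all.equal(p_fisher, p_hyper))
print(identical_result)
```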

Thursday, July 08, 2010

From a logical matrix to numerical matrix

Que : How to get a matrix of 0 and 1 from a matrix of TRUE/FALSE?
Ans : Multiply the logical matrix by itself element-wise (m * m); multiplying by 1 (m * 1) or adding 0 (m + 0) also works.
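
A small R sketch of the trick, on a made-up 2-by-3 logical matrix (the variable names are only for illustration). It uses `*`, which is element-wise; `%*%` would be matrix multiplication and give something different.

```r
# A small made-up logical matrix
lg <- matrix(c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE), nrow = 2)

# Element-wise product with itself: TRUE*TRUE = 1, FALSE*FALSE = 0
num_self <- lg * lg

# Alternatives that should give the same 0/1 values
num_one  <- lg * 1   # result is stored as double
num_zero <- lg + 0   # result is stored as double

# Changing the storage mode keeps the dimensions and yields integers
num_mode <- lg
storage.mode(num_mode) <- "integer"

print(num_self)
```

Dimensions (and dimnames, if any) are preserved by all of these, whereas as.numeric(lg) would drop them and return a plain vector.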