java -cp /stor1/shah/Ruben_peptide/gsea_analysis/gsea2-2.06.jar \
  -Xmx2000m xtools.gsea.Gsea \
  -res foo_expression_values.gct \
  -cls foo_expression_values.cls#Normal_versus_Treatment \
  -gmx msigdb.v2.5.symbols.gmt -chip HG_U133_Plus_2.chip \
  -collapse true -mode Max_probe -norm meandiv -nperm 1000 \
  -permute phenotype -rnd_type no_balance -scoring_scheme weighted \
  -rpt_label my_analysis -metric Signal2Noise -sort real -order descending \
  -include_only_symbols true -make_sets true -median false -num 100 \
  -plot_top_x 20 -rnd_seed timestamp -save_rnd_lists false \
  -set_max 500 -set_min 15 -zip_report false \
  -out /stor1/shah/Ruben_peptide/gsea_analysis -gui false
Command line for GSEA.
Tuesday, July 20, 2010
Monday, July 19, 2010
Significance of overlapping gene lists
Wen Fury and Wentian Li
http://www.nslij-genetics.org/wli/pub/ieee-embs06.pdf
To assess the significance of the overlap between two lists of differentially expressed genes of sizes n1 and n2 (e.g. d1-n1 and d2-n1), drawn from n genes in total and sharing m genes, use the p-value from either the hypergeometric test or Fisher's exact test.
Given integers n, n1, n2 and m, with max(n1, n2) <= n and m <= min(n1, n2), the hypergeometric distribution is defined as
P(m) = [C(n1, m) * C(n - n1, n2 - m)] / C(n, n2)
where C(n, m) is the number of ways of choosing m objects out of n objects: C(n, m) = n! / [m! (n - m)!]
It is usually more interesting to sum P(k) over all k equal to or larger than the observed overlap m; this sum is the p-value:
p-value = Sigma [k = m to min(n1, n2)] P(k)
        = Sigma [k = 0 to min(n1, n2)] P(k) - Sigma [k = 0 to m - 1] P(k)
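As a quick check of the formula, here is a minimal R sketch that sums P(k) straight from the definition. The function names (hyper_prob, overlap_pvalue_direct) and the numbers are hypothetical, chosen only for illustration. Computing choose() directly can overflow for realistic gene counts, so treat this as a sanity check, not something to run on real data.

```r
# P(k) from the definition: [C(n1, k) * C(n - n1, n2 - k)] / C(n, n2)
hyper_prob <- function(k, n, n1, n2) {
  choose(n1, k) * choose(n - n1, n2 - k) / choose(n, n2)
}

# p-value: sum of P(k) for k = m, ..., min(n1, n2)
overlap_pvalue_direct <- function(m, n, n1, n2) {
  k <- m:min(n1, n2)
  sum(hyper_prob(k, n, n1, n2))
}

# Hypothetical example: 20 genes in total, lists of 8 and 6 genes, 4 shared
n  <- 20
n1 <- 8
n2 <- 6
m  <- 4
p_direct <- overlap_pvalue_direct(m, n, n1, n2)
print(p_direct)
```

With these numbers the three terms are 4620/38760, 672/38760 and 28/38760, which add up to 7/51, about 0.137.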
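To calculate it in R, use phyper(q, n1, n - n1, n2), which returns the cumulative probability Sigma [k = 0 to q] P(k):
p-value = 1 if m = 0
p-value = phyper(min(n1, n2), n1, n - n1, n2) - phyper(m - 1, n1, n - n1, n2) if m > 0
The first term always equals 1, so for m > 0 this is the same as 1 - phyper(m - 1, n1, n - n1, n2).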
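The two cases can be folded into one call, since phyper accepts lower.tail = FALSE and returns 0 for q = -1. A minimal sketch follows; the wrapper name overlap_pvalue and the example numbers are hypothetical.

```r
# Upper-tail hypergeometric p-value for an overlap of m genes
overlap_pvalue <- function(m, n, n1, n2) {
  stopifnot(m >= 0, max(n1, n2) <= n, m <= min(n1, n2))
  # P(X >= m) = P(X > m - 1); for m = 0 this is P(X > -1) = 1
  phyper(m - 1, n1, n - n1, n2, lower.tail = FALSE)
}

# Hypothetical example: 20 genes in total, lists of 8 and 6 genes
p_overlap <- overlap_pvalue(4, 20, 8, 6)  # 4 genes shared
p_none    <- overlap_pvalue(0, 20, 8, 6)  # no genes shared
print(c(p_overlap, p_none))
```

Using lower.tail = FALSE avoids the subtraction from 1, which matters when the p-value is very small.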
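One can also use Fisher's exact test on the following 2-by-2 table:
         col1      col2                total
row1     m         n1 - m              n1
row2     n2 - m    n - n1 - n2 + m     n - n1
total    n2        n - n2              n
The one-sided test (alternative = "greater" in R's fisher.test) produces the same p-value as the hypergeometric calculation.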
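The agreement can be checked in R with a small sketch. The numbers are hypothetical, and the comparison assumes the one-sided alternative = "greater"; the default two-sided test generally gives a different p-value.

```r
# Hypothetical example: 20 genes in total, lists of 8 and 6 genes, 4 shared
n  <- 20
n1 <- 8
n2 <- 6
m  <- 4

# matrix() fills column by column: col1 = (m, n2 - m), col2 = (n1 - m, n - n1 - n2 + m)
tab <- matrix(c(m, n2 - m, n1 - m, n - n1 - n2 + m), nrow = 2,
              dimnames = list(c("row1", "row2"), c("col1", "col2")))

p_fisher <- fisher.test(tab, alternative = "greater")$p.value
p_hyper  <- phyper(m - 1, n1, n - n1, n2, lower.tail = FALSE)

print(c(fisher = p_fisher, hypergeometric = p_hyper))
```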
Thursday, July 08, 2010
From a logical matrix to numerical matrix
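Que: Given a matrix of TRUE/FALSE values, how do you get a matrix of 0s and 1s?
Ans: Multiply the logical matrix by itself, element by element (x * x). R coerces TRUE to 1 and FALSE to 0 in arithmetic.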
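A minimal R sketch of the trick, using a made-up 2-by-2 matrix named flags. Adding 0 is shown as an alternative that relies on the same coercion.

```r
# Hypothetical logical matrix
flags <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2)

# Elementwise product: TRUE * TRUE = 1, FALSE * FALSE = 0
num_self <- flags * flags

# Same coercion triggered by adding 0
num_plus <- flags + 0

print(num_self)
print(num_plus)
```

Both results keep the dimensions of the original matrix. Note that * is the elementwise product; the matrix product %*% would give a different result in general.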