有一篇文章“Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data”,提到了“Our results demonstrate that the original Affymetrix probe set definitions are inaccurate, and many conclusions derived from past GeneChip analyses 40 may be significantly flawed. It will be beneficial to re-analyze existing GeneChip data with updated probe set definitions”,意思是说原始的Affymetrix探针组定义是不准确、过时的,要重定义探针组(Custom CDF)。

重定义的格式包括:

名称

含义

ENTREZG

REFSEQ

ENSG

ENSE

ENST

VEGAG

VEGAE

VEGAT

TAIRG

TAIRT

UG

MIRBASEF

MIRBASEG

物种包括:

名称

含义

简写

Anopheles gambiae

冈比亚疟蚊

Ag

Arabidopsis thaliana

拟南芥

At

Bos taurus

Bt

Caenorhabditis elegans

秀丽隐杆线虫

Ce

Canis familiaris

犬类

Cf

Danio rerio

鲐鱼类

Dr

Drosophila melanogaster

黑腹果蝇

Dm

Gallus gallus

原鸡

Gg

Homo sapiens

人类

Hs

Macaca mulatta

猕猴

MAmu

Mus musculus

小家鼠

Mm

Oryza sativa

Os

Rattus norvegicus

鼠类

Rn

Saccharomyces cerevisiae

酿酒酵母

Sc

Schizosaccharomyces pombe

粟酒裂殖酵母

Sp

Sus scrofa

野猪

Ss

在“02、CDF文件”中提到过,每种型号的芯片都有着对应CDF包,那么重定义后CDF的名称命名如下(加号代表连接,不要写进去):

原CDF包名+物种简写小写+格式名小写

如HG-U133_Plus_2阵列原本对应hgu133plus2包,如果选择了Homo sapiens(人类物种),ENSG格式,那么就对应hgu133plus2hsensgcdf包了。

获取hgu133plus2的探针组名称,可以:

library(affy)     ## 导入affy包
cdfname <- "hgu133plus2"
cdfname <-"hgu133plus2hsensgcdf"
how = getOption("BioC")$affy$probesloc
verbose = FALSE
badOut <- list()
for (i in :length(how))
{
cur <- how[[i]]
envir <- switch(cur$what,
environment = cdfFromEnvironment(cdfname, cur$where, verbose),
libPath = cdfFromLibPath(cdfname, cur$where, verbose = verbose),
bioC = cdfFromBioC(cdfname, cur$where, verbose))
}
genenames <- ls(envir) ## 探针组名称
> length(genenames) ## 探针组个数
[]
> genenames[:] ## 输出前100个探针组名称
[] "1007_s_at" "1053_at" "117_at" "121_at"
[] "1255_g_at" "1294_at" "1316_at" "1320_at"
[] "1405_i_at" "1431_at" "1438_at" "1487_at"
[] "1494_f_at" "1552256_a_at" "1552257_a_at" "1552258_at"
[] "1552261_at" "1552263_at" "1552264_a_at" "1552266_at"
[] "1552269_at" "1552271_at" "1552272_a_at" "1552274_at"
[] "1552275_s_at" "1552276_a_at" "1552277_a_at" "1552278_a_at"
[] "1552279_a_at" "1552280_at" "1552281_at" "1552283_s_at"
[] "1552286_at" "1552287_s_at" "1552288_at" "1552289_a_at"
[] "1552291_at" "1552293_at" "1552295_a_at" "1552296_at"
[] "1552299_at" "1552301_a_at" "1552302_at" "1552303_a_at"
[] "1552304_at" "1552306_at" "1552307_a_at" "1552309_a_at"
[] "1552310_at" "1552311_a_at" "1552312_a_at" "1552314_a_at"
[] "1552315_at" "1552316_a_at" "1552318_at" "1552319_a_at"
[] "1552320_a_at" "1552321_a_at" "1552322_at" "1552323_s_at"
[] "1552325_at" "1552326_a_at" "1552327_at" "1552329_at"
[] "1552330_at" "1552332_at" "1552334_at" "1552335_at"
[] "1552337_s_at" "1552338_at" "1552340_at" "1552343_s_at"
[] "1552344_s_at" "1552347_at" "1552348_at" "1552349_a_at"
[] "1552354_at" "1552355_s_at" "1552359_at" "1552360_a_at"
[] "1552362_a_at" "1552364_s_at" "1552365_at" "1552367_a_at"
[] "1552368_at" "1552370_at" "1552372_at" "1552373_s_at"
[] "1552375_at" "1552377_s_at" "1552378_s_at" "1552379_at"
[] "1552381_at" "1552383_at" "1552384_a_at" "1552386_at"
[] "1552388_at" "1552389_at" "1552390_a_at" "1552391_at"

获取hgu133aplus2_Hs_ENSG的探针组名称,把上面的cdfname换成"hgu133plus2hsensgcdf"即可,最后得到以下结果:

> length(genenames)    ## 重定义后的探针组数目
[]
> genenames[:] ## 重定义后的探针组名称的前100个
[] "AFFX-BioB-3_at" "AFFX-BioB-5_at"
[] "AFFX-BioB-M_at" "AFFX-BioC-3_at"
[] "AFFX-BioC-5_at" "AFFX-BioDn-3_at"
[] "AFFX-BioDn-5_at" "AFFX-CreX-3_at"
[] "AFFX-CreX-5_at" "AFFX-DapX-3_at"
[] "AFFX-DapX-5_at" "AFFX-DapX-M_at"
[] "AFFX-HSAC07/X00351_3_at" "AFFX-HSAC07/X00351_5_at"
[] "AFFX-HSAC07/X00351_M_at" "AFFX-hum_alu_at"
[] "AFFX-HUMGAPDH/M33197_3_at" "AFFX-HUMGAPDH/M33197_5_at"
[] "AFFX-HUMGAPDH/M33197_M_at" "AFFX-HUMISGF3A/M97935_3_at"
[] "AFFX-HUMISGF3A/M97935_5_at" "AFFX-HUMISGF3A/M97935_MA_at"
[] "AFFX-HUMISGF3A/M97935_MB_at" "AFFX-HUMRGE/M10098_3_at"
[] "AFFX-HUMRGE/M10098_5_at" "AFFX-HUMRGE/M10098_M_at"
[] "AFFX-LysX-3_at" "AFFX-LysX-5_at"
[] "AFFX-LysX-M_at" "AFFX-M27830_3_at"
[] "AFFX-M27830_5_at" "AFFX-M27830_M_at"
[] "AFFX-PheX-3_at" "AFFX-PheX-5_at"
[] "AFFX-PheX-M_at" "AFFX-r2-Bs-dap-3_at"
[] "AFFX-r2-Bs-dap-5_at" "AFFX-r2-Bs-dap-M_at"
[] "AFFX-r2-Bs-lys-3_at" "AFFX-r2-Bs-lys-5_at"
[] "AFFX-r2-Bs-lys-M_at" "AFFX-r2-Bs-phe-3_at"
[] "AFFX-r2-Bs-phe-5_at" "AFFX-r2-Bs-phe-M_at"
[] "AFFX-r2-Bs-thr-3_s_at" "AFFX-r2-Bs-thr-5_s_at"
[] "AFFX-r2-Bs-thr-M_s_at" "AFFX-r2-Ec-bioB-3_at"
[] "AFFX-r2-Ec-bioB-5_at" "AFFX-r2-Ec-bioB-M_at"
[] "AFFX-r2-Ec-bioC-3_at" "AFFX-r2-Ec-bioC-5_at"
[] "AFFX-r2-Ec-bioD-3_at" "AFFX-r2-Ec-bioD-5_at"
[] "AFFX-r2-P1-cre-3_at" "AFFX-r2-P1-cre-5_at"
[] "AFFX-ThrX-3_at" "AFFX-ThrX-5_at"
[] "AFFX-ThrX-M_at" "AFFX-TrpnX-3_at"
[] "AFFX-TrpnX-5_at" "AFFX-TrpnX-M_at"
[] "ENSG00000000003_at" "ENSG00000000005_at"
[] "ENSG00000000419_at" "ENSG00000000457_at"
[] "ENSG00000000460_at" "ENSG00000000938_at"
[] "ENSG00000000971_at" "ENSG00000001036_at"
[] "ENSG00000001084_at" "ENSG00000001167_at"
[] "ENSG00000001460_at" "ENSG00000001461_at"
[] "ENSG00000001497_at" "ENSG00000001561_at"
[] "ENSG00000001617_at" "ENSG00000001626_at"
[] "ENSG00000001629_at" "ENSG00000001631_at"
[] "ENSG00000002016_at" "ENSG00000002079_at"
[] "ENSG00000002330_at" "ENSG00000002549_at"
[] "ENSG00000002586_at" "ENSG00000002587_at"
[] "ENSG00000002726_at" "ENSG00000002745_at"
[] "ENSG00000002746_at" "ENSG00000002822_at"
[] "ENSG00000002834_at" "ENSG00000002919_at"
[] "ENSG00000002933_at" "ENSG00000003056_at"
[] "ENSG00000003096_at" "ENSG00000003137_at"
[] "ENSG00000003147_at" "ENSG00000003249_at"
[] "ENSG00000003393_at" "ENSG00000003400_at"

从结果可以看出,hgu133plus2有54675个探针组,而hgu133plus2hsensgcdf只有20009个探针组,这是因为有些探针组被合并起来了,可能还有一些被舍弃掉。

最新文章

  1. myeclipse2014新建maven项目
  2. Leetcode5:Longest Palindromic Substring@Python
  3. throw er; Unhandled &#39;error&#39; event Error: listen EADDRINUSE的解决方法
  4. json+servlet+ajax
  5. Jmeter报告优化之New XSL stylesheet
  6. 分享一些WinForm数据库连接界面UI
  7. Arcengine10下载地址
  8. angularjs 1.3 综合学习 (one way bind , ng-if , ng-switch , ng-messages, ng-form ,ng-model )
  9. CDOJ UESTC 1220 The Battle of Guandu
  10. 使用Dom解析器,操作XML里面的信息
  11. 熟悉java语言的基本使用:简单存款取款机制java实现
  12. 使用VirtualBox调试项目踩过的坑
  13. SQl 执行效率总结
  14. python骚操作之...
  15. outlook2016用Exchange轻松绑定腾讯企业邮箱
  16. 揭示牌面使之升序 Reveal Cards In Increasing Order
  17. VueJs学习参考的例子
  18. 给web项目整合富文本编辑器
  19. 使SourceInsight支持Python语言的方法
  20. ElasticSearch(二) 关于DSL

热门文章

  1. python虚环境
  2. 【转】一款开源免费跨浏览器的视频播放器--videojs使用介绍
  3. python3笔记十五:python函数
  4. Redis数据类型,面试相关
  5. Android各种键盘挡住输入框解决办法
  6. 请描述一下 BroadcastReceiver?
  7. 通过Precision/Recall判断分类结果偏差极大时算法的性能
  8. [Flask]使用sqlite数据库
  9. 如何利用Nginx的缓冲、缓存优化提升性能
  10. Python学习之==&gt;面向对象编程(二)