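HTSeq: a Python package for processing high-throughput sequencing (HTS) data.
HTSeq provides many classes, so anyone comfortable with Python scripting can write their own data-processing scripts.
HTSeq also ships two scripts that process data directly: htseq-qa (data quality assessment) and htseq-count (read counting).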

Usage: htseq-count [options] <alignment_file> <gff_file>

<alignment_file> :

contains the aligned reads in the SAM format.

Make sure to use a splicing-aware aligner such as TopHat.

To read from standard input, use - as <alignment_file>.

[options] :

-f <format>, --format=<format>

Format of the input data. Possible values are sam (for text SAM files) and bam (for binary BAM files). Default is sam.

-r <order>, --order=<order>

For paired-end data, the alignments have to be sorted either by read name or by alignment position. If your data is not sorted, use samtools sort to sort it. Use this option, with name or pos for <order>, to indicate how the input data has been sorted. The default is name.

If name is indicated, htseq-count expects all the alignments for the reads of a given read pair to appear in adjacent records in the input data. For pos, this is not expected; rather, read alignments whose mate alignment has not yet been seen are kept in a buffer in memory until the mate is found. While, strictly speaking, the latter will also work with unsorted data, sorting ensures that most alignment mates appear close to each other in the data and hence the buffer is much less likely to overflow.
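The buffering described for pos can be illustrated with a minimal sketch. This is not HTSeq's implementation: the Aln record and the pair_with_buffer function are hypothetical names, and an alignment is reduced to a read name and a position. It only shows why the buffer stays small when mates sit close together in the input.

```python
from collections import namedtuple

# Hypothetical, simplified alignment record (not an HTSeq class).
Aln = namedtuple("Aln", ["read_name", "pos"])


def pair_with_buffer(alignments):
    """Pair up mates by read name, buffering alignments whose mate is unseen.

    Returns (pairs, peak_buffer_size, unpaired_alignments).
    """
    pending = {}   # read name -> alignment still waiting for its mate
    pairs = []
    peak = 0       # largest number of alignments held in memory at once
    for aln in alignments:
        mate = pending.pop(aln.read_name, None)
        if mate is None:
            # Mate not seen yet: keep this alignment in the buffer.
            pending[aln.read_name] = aln
            peak = max(peak, len(pending))
        else:
            pairs.append((mate, aln))
    return pairs, peak, list(pending.values())
```

With name-sorted input the peak never exceeds 1, because each mate follows its partner directly. With position-sorted input it grows with the number of pairs whose mates are still further along in the file.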

-s <yes/no/reverse>, --stranded=<yes/no/reverse>

whether the data is from a strand-specific assay (default: yes)

For stranded=no, a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. For stranded=yes and single-end reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. For stranded=reverse, these rules are reversed.

If your RNA-Seq data was not produced with a strand-specific protocol, the default setting (yes) causes about half of the reads to be lost. Hence, make sure to set the option --stranded=no unless you have strand-specific data!
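The strand rules for the three settings can be written down as a small predicate. This is a sketch of the rules as stated in this section, not code taken from HTSeq; the function name strand_matches and its parameters are assumptions made for illustration.

```python
def _flip(strand):
    """Return the opposite strand symbol."""
    return "-" if strand == "+" else "+"


def strand_matches(feature_strand, read_strand, stranded="yes", second_mate=False):
    """Decide whether a read may be considered as overlapping a feature, by strand.

    stranded    -- "yes", "no" or "reverse", as for the -s option
    second_mate -- True for the second read of a pair (always False for single-end)
    """
    if stranded == "no":
        return True  # strand is ignored entirely
    if stranded not in ("yes", "reverse"):
        raise ValueError("stranded must be 'yes', 'no' or 'reverse'")
    effective = read_strand
    if second_mate:
        # The second read of a pair is expected on the opposite strand.
        effective = _flip(effective)
    if stranded == "reverse":
        # 'reverse' inverts the rules that apply for 'yes'.
        effective = _flip(effective)
    return effective == feature_strand
```

For example, strand_matches("+", "-", stranded="yes") is False, which is the case that discards roughly half of the reads from an unstranded library.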

-a <minaqual>, --minaqual=<minaqual>

skip all reads with alignment quality lower than the given minimum value (default: 10; note that the default was 0 until version 0.5.4)

-t <feature type>, --type=<feature type>

Feature type (3rd column in the GFF file) to be used; features of any other type are ignored. The default, suitable for RNA-Seq analysis using an Ensembl GTF file, is exon.

-i <id attribute>, --idattr=<id attribute>

GFF attribute to be used as feature ID. Several GFF lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. The default, suitable for RNA-Seq analysis using an Ensembl GTF file, is gene_id.
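How the feature type and the ID attribute work together can be sketched in plain Python. The function group_features below is a hypothetical helper, not part of HTSeq, and it makes a simplifying assumption: attribute values in the 9th column contain no semicolons.

```python
from collections import defaultdict


def group_features(gtf_lines, feature_type="exon", id_attr="gene_id"):
    """Group GTF lines of one feature type by the chosen ID attribute.

    Returns {feature_id: [(chrom, start, end, strand), ...]} with the
    1-based, inclusive coordinates exactly as written in the file.
    """
    features = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue  # features of any other type are ignored
        attrs = {}
        for field in cols[8].split(";"):
            field = field.strip()
            if not field:
                continue
            key, _, value = field.partition(" ")
            attrs[key] = value.strip().strip('"')
        if id_attr not in attrs:
            raise ValueError("attribute %r missing in line: %r" % (id_attr, line))
        # Lines sharing a feature ID become parts of the same feature.
        features[attrs[id_attr]].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return dict(features)
```

With the defaults, all exon lines carrying the same gene_id end up in one entry, which is why the counts in the output table are per gene.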

-m <mode>, --mode=<mode>

Mode to handle reads overlapping more than one feature. Possible values for <mode> are union, intersection-strict and intersection-nonempty (default: union)
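The three modes differ in how they combine the sets of features found at each position of a read. The sketch below follows the description in the HTSeq documentation as I understand it. It is not HTSeq's code: assign_read is a hypothetical function, the caller is assumed to supply one set of feature IDs per read position, and the labels "__no_feature" and "__ambiguous" are chosen for illustration.

```python
def assign_read(position_sets, mode="union"):
    """Assign a read to a feature given the feature sets at each read position.

    position_sets -- list of sets; element i holds the IDs of all features
                     overlapping position i of the read (may be empty sets)
    """
    if mode == "union":
        # Every feature touched anywhere along the read.
        result = set().union(*position_sets)
    elif mode == "intersection-strict":
        # Only features covering every position of the read.
        result = set.intersection(*position_sets) if position_sets else set()
    elif mode == "intersection-nonempty":
        # Like strict, but positions covered by no feature are ignored.
        nonempty = [s for s in position_sets if s]
        result = set.intersection(*nonempty) if nonempty else set()
    else:
        raise ValueError("unknown mode: %r" % mode)

    if not result:
        return "__no_feature"
    if len(result) > 1:
        return "__ambiguous"
    return next(iter(result))
```

A read that lies partly in gene A and partly in a region shared by A and B, i.e. [{"A"}, {"A", "B"}], is ambiguous under union but assigned to A under both intersection modes.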

-o <samout>, --samout=<samout>

write out all SAM alignment records into an output SAM file called <samout>, annotating each line with its assignment to a feature or a special counter (as an optional field with tag ‘XF’)
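To inspect such an annotated file, the XF field can be read back with a few lines of Python. This is a minimal sketch that relies only on the general SAM layout (11 mandatory tab-separated columns followed by optional TAG:TYPE:VALUE fields); the function name tally_xf is hypothetical.

```python
from collections import Counter


def tally_xf(sam_lines):
    """Count SAM records per value of the optional XF field."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        # Optional fields start after the 11 mandatory SAM columns.
        for opt in fields[11:]:
            if opt.startswith("XF:"):
                counts[opt.split(":", 2)[2]] += 1
                break
    return counts
```

Note that this tallies alignment records, so for paired-end data each read pair contributes two records and the numbers will not match the count table directly.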

-q, --quiet

suppress progress report and warnings
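When htseq-count is driven from a Python script, the options above can be assembled into an argument list. The helper below is a hypothetical convenience function, not part of HTSeq; it uses only the short flags listed in this section, and its keyword defaults mirror the defaults stated there.

```python
def build_htseq_count_command(alignment_file, gff_file, fmt="sam", order="name",
                              stranded="yes", minaqual=10, feature_type="exon",
                              id_attr="gene_id", mode="union"):
    """Return an argument list for htseq-count, suitable for subprocess.run."""
    return [
        "htseq-count",
        "-f", fmt,            # sam or bam
        "-r", order,          # name or pos
        "-s", stranded,       # yes, no or reverse
        "-a", str(minaqual),  # minimum alignment quality
        "-t", feature_type,   # GFF feature type (3rd column)
        "-i", id_attr,        # GFF attribute used as feature ID
        "-m", mode,           # union, intersection-strict, intersection-nonempty
        alignment_file,
        gff_file,
    ]
```

The resulting list can be passed to subprocess.run, assuming htseq-count is installed and on the PATH. For an unstranded library, pass stranded="no" explicitly, since the default is yes.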
