Footsteps on my way !
perl/linux/sequencing analysis

Sorting input files for Cufflinks

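Cufflinks takes SAM/BAM files as input, and the file must be sorted (the official documentation says by reference genome position). SAM/BAM output from TopHat is already sorted. If you used a different mapping program, you can sort the file with one of the following methods: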

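Method 1: sort -k 3,3 -k 4,4n hits.sam > hits.sam.sorted  # officially recommended
Method 2: sort -nk 4 hits.sam > hits.sam.sorted  # I have used this; the Cufflinks results were the same as with the officially recommended sort (note that it orders by position only and ignores the reference name)
Method 3: samtools sort -n hits.bam -o hits.bam.sorted  # the SAM file must be converted to BAM before sorting (this sorts by read name; others have reportedly used it and it seems to work)
Method 4: samtools sort hits.bam -o hits.bam.sorted  # again, convert SAM to BAM before sorting (this sorts by reference genome position, so it should work)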
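One practical detail with the plain `sort` approach: if the SAM file has header lines (those starting with `@`), a bare `sort` moves them in among the alignment records. The following is a minimal sketch, not part of Cufflinks or samtools, that keeps the header on top and applies the officially recommended sort keys to the records only. The function name `sort_sam_by_position` is hypothetical, and it assumes tab-separated SAM text.

```shell
#!/bin/sh
# sort_sam_by_position IN.sam OUT.sam
# Hypothetical helper (an illustration, not a Cufflinks or samtools command).
# Copies header lines unchanged, then sorts alignment records by
# column 3 (reference name) and column 4 (position, numerically).
sort_sam_by_position() {
    in_sam=$1
    out_sam=$2
    tab=$(printf '\t')
    {
        # Header lines start with '@'; keep them in their original order.
        # '|| true' covers files that have no header at all.
        grep '^@' "$in_sam" || true
        # Everything else is an alignment record.
        grep -v '^@' "$in_sam" | LC_ALL=C sort -t "$tab" -k 3,3 -k 4,4n
    } > "$out_sam"
}
```

It would be called as `sort_sam_by_position hits.sam hits.sam.sorted`. `LC_ALL=C` is there only to make the ordering of reference names independent of the locale.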

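### So although the official documentation says the input must be sorted by reference genome position, sorting by either read name or genome position seems to work (my personal understanding)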
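To see which order a given file is actually in before passing it to Cufflinks, a small check can help. This is a rough sketch under my own assumptions: the function name `is_position_sorted` is hypothetical, the input is tab-separated SAM text, and "sorted by position" is taken to mean that each reference name forms one contiguous block with non-decreasing positions inside it. The order of the blocks is not checked.

```shell
#!/bin/sh
# is_position_sorted FILE.sam
# Hypothetical helper: exits 0 if every reference name appears as one
# contiguous block with non-decreasing positions, and 1 otherwise.
is_position_sorted() {
    awk -F '\t' '
        /^@/ { next }                      # skip header lines
        {
            if ($3 != ref) {
                if ($3 in seen) exit 1     # reference name came back after another one
                seen[$3] = 1
                ref = $3
                pos = 0
            }
            if ($4 + 0 < pos) exit 1       # position went backwards within a reference
            pos = $4 + 0
        }
    ' "$1"
}
```

For example, `is_position_sorted hits.sam.sorted && echo "position sorted"`. A file sorted by read name will normally fail this check, because reads from different references are interleaved.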

Official documentation:

Cufflinks takes a text file of SAM alignments, or a binary SAM (BAM) file as input. For more details on the SAM format, see the specification. The RNA-Seq read mapper TopHat produces output in this format, and is recommended for use with Cufflinks. However Cufflinks will accept SAM alignments generated by any read mapper. Here’s an example of an alignment Cufflinks will accept:

s6.25mer.txt-913508	16	chr1 4482736 255 14M431N11M * 0 0 \
   CAAGATGCTAGGCAAGTCTTGGAAG IIIIIIIIIIIIIIIIIIIIIIIII NM:i:0 XS:A:-

Note the use of the custom tag XS. This attribute, which must have a value of “+” or “-”, indicates which strand the RNA that produced this read came from. While this tag can be applied to any alignment, including unspliced ones, it must be present for all spliced alignment records (those with a ‘N’ operation in the CIGAR string). The SAM file supplied to Cufflinks must be sorted by reference position. If you aligned your reads with TopHat, your alignments will be properly sorted already. If you used another tool, you may want to make sure they are properly sorted as follows:

sort -k 3,3 -k 4,4n hits.sam > hits.sam.sorted
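The XS requirement in the quoted passage matters mostly for output from mappers other than TopHat. As a rough sketch (the function name `count_spliced_without_xs` is my own and hypothetical, and the input is assumed to be tab-separated SAM text), the following counts spliced records, meaning those with an `N` in the CIGAR column, that carry no `XS:A:+` or `XS:A:-` tag:

```shell
#!/bin/sh
# count_spliced_without_xs FILE.sam
# Hypothetical helper: prints the number of spliced alignment records
# (CIGAR in column 6 contains 'N') that lack an XS:A:+ or XS:A:- tag.
count_spliced_without_xs() {
    awk -F '\t' '
        /^@/ { next }                          # skip header lines
        $6 ~ /N/ {
            has_xs = 0
            # Optional tags start at column 12 in SAM.
            for (i = 12; i <= NF; i++)
                if ($i ~ /^XS:A:[+-]$/) has_xs = 1
            if (!has_xs) missing++
        }
        END { print missing + 0 }
    ' "$1"
}
```

Running `count_spliced_without_xs hits.sam` should print 0 for a file that meets the requirement described above.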

 

Please respect other people's work and credit the source when reposting: Bluesky's blog » Sorting input files for Cufflinks
