Footsteps on my way !

# 从SAM文件中判断unique reads

optional fields的格式是TAG:TYPE:VALUE，在SAM文件的第12列。多个optional fields由空格分开。

BWA的结果比Tophat的结果在mapping信息上更为详细。在BWA结果中，”X0″代表打开gap的个数(BWA不支持splice junction，但支持gap)，数据类型为i。”XT”代表mapping类型，数据类型为”A”，即可打印字符，其值可以是Unique/Repeat/N/Mate-sw，有”XT:A:U”就代表这一条是unique read。由于BWA还提供了”AS”(alignment score)和”XS”(suboptimal alignment score)这两项，所以即使有些reads没有被Uniquely mapped，只要AS的值和XS的值相差足够大，也有一定价值。

#############################

We prefer to say an alignment is ‘reliable’ rather than ‘unique’ as ‘uniqueness’ is not well defined in general cases. You can get reliable alignments by setting a threshold on mapping quality:

samtools view -bq 1 aln.bam > aln-reliable.bam


You may want to set a more stringent threshold to get more reliable alignments.

I’d suggest you follow bwa’s FAQ advice and rely more in the MAPQ. Still, if you want to

filter uniquely mapped:

Code:
\$ samtools view bwa.bam | grep "XT:A:U"
===
Unique alignments in bowtie2 have MAPQ>=2, so you can just filter the results by that.
===

10-07-2011, 08:26 AM

If you use BWA, the output bam or sam will have an optional flag = XT:A:U set in case of unique reads. So, you could just write a script to pull those lines containing XT:A:U. Similarly, for tophat there’s an optional NH:i:1 (1 for unique hit).True, but you also want the M reads. Those are the ones where the read wasn’t unique, but the mate was, so that made the mapping of the repetative bit sure. And I’ve unfortunately seen R reads where the mate mapped uniquely. I think those are cases where it should have been changed to M, but it wasn’t.

So I don’t think that pulling from the flags alone will do it. You’ll have to pull down pairs, and check to see if either end maps uniquely.

In my case, if both ends don’t map uniquely, I throw them. If one of the pair maps uniquely (that would have XT:A:U and its mate with XT:A:M), I consider the unique hit as a single-end read and remove the other one alone.

couple questions, 1) what do you mean here? “so that made the mapping of the repetative bit sure”. 2) why do you want the other pair when its not uniquely mapped? (assuming you remove non-unique reads otherwise).

http://sourceforge.net/apps/mediawiki/samtools/index.php?title=SAM_FAQ
http://seqanswers.com/forums/showthread.php?t=8653
http://seqanswers.com/forums/archive/index.php/t-14596.html