Footsteps on my way!
perl/linux/sequencing analysis

[Repost] Naming Rules for DDBJ/EMBL/GenBank Accession Numbers

The format of GenBank accession numbers is:

Nucleotide: 1 letter + 5 numerals OR 2 letters + 6 numerals
Protein: 3 letters + 5 numerals
WGS (whole genome shotgun): 4 letters + 2 numerals for the WGS assembly version + 6-8 numerals
MGA (mass sequence for genome annotation): 5 letters + 7 numerals
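These length rules are regular enough to check mechanically. The sketch below writes each rule as a regular expression over the bare accession string. The function name `classify_accession` and the pattern list are illustrative only, not part of any NCBI tool, and the sketch assumes upper-case accessions in exactly the formats listed here.

```python
import re
from typing import Optional

# Hypothetical helper: the format rules above transcribed as regexes.
# The four classes differ in their number of leading letters, so at most one matches.
ACCESSION_PATTERNS = [
    ("Nucleotide", re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}")),
    ("Protein", re.compile(r"[A-Z]{3}\d{5}")),
    # 4 letters, a 2-digit WGS assembly version, then 6-8 digits
    ("WGS", re.compile(r"[A-Z]{4}\d{2}\d{6,8}")),
    ("MGA", re.compile(r"[A-Z]{5}\d{7}")),
]


def classify_accession(accession: str) -> Optional[str]:
    """Return the record class implied by the accession's shape, or None."""
    # Assumption: tolerate stray whitespace and lower-case input.
    acc = accession.strip().upper()
    for label, pattern in ACCESSION_PATTERNS:
        # fullmatch: the whole string must fit the rule, not just a prefix
        if pattern.fullmatch(acc):
            return label
    return None
```

For example, `classify_accession("AB123456")` falls under the 2-letter + 6-numeral nucleotide rule, while a string of any other shape returns `None`. Accessions issued under schemes newer than this list would not be recognized.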

How accession prefixes are distributed among the databases:

Nucleotide Accession Prefixes

PREFIX | DATABASE | TYPE
BA, DF, DG | DDBJ | CON division
AN | EMBL | CON division
CH, CM, DS, EM, EN, EP, EQ, FA, GG, GL | NCBI | CON division
C, AT, AU, AV, BB, BJ, BP, BW, BY, CI, CJ, DA, DB, DC, DK, FS | DDBJ | EST
F | EMBL | EST
H, N, T, R, W, AA, AI, AW, BE, BF, BG, BI, BM, BQ, BU, CA, CB, CD, CF, CK, CN, CO, CV, CX, DN, DR, DT, DV, DY, EB, EC, EE, EG, EH, EL, ES, EV, EW, EX, EY, FC, FD, FE, FF, FG, FK, FL, GD, GE, GH, GO | GenBank | EST
D, AB | DDBJ | Direct submissions
V, X, Y, Z, AJ, AM, FM | EMBL | Direct submissions
U, AF, AY, DQ, EF, EU, FJ, GQ | GenBank | Direct submissions
AP | DDBJ | Genome project data
BS | DDBJ | Chimpanzee genome data
AL, BX, CR, CT, CU | EMBL | Genome project data
AE, CP, CY | GenBank | Genome project data
AG, DE, DH, FT | DDBJ | GSS
B, AQ, AZ, BH, BZ, CC, CE, CG, CL, CW, CZ, DU, DX, ED, EI, EJ, EK, ER, ET, FH, FI | GenBank | GSS
AK | DDBJ | cDNA projects
AC, DP | GenBank | HTGS
E, BD, DD, DI, DJ, DL, DM, FU | DDBJ | Patents
A, AX, CQ, CS, FB, GM, GN | EMBL | Patents (nucleotide only)
I, AR, DZ, EA, GC, GP | GenBank | Patents (nucleotide)
G, BV, GF | GenBank | STS
BR | DDBJ | TPA
BN | EMBL | TPA
EZ | GenBank | TSA
S | GenBank | From journal scanning
AD | GenBank | From GSDB
AH | GenBank | Segmented set header
AS | GenBank | Other (not currently being used)
BC | GenBank | MGC project
BK | GenBank | TPA
BL, GJ, GK | GenBank | TPA CON division
BT | GenBank | FLI-cDNA projects
J, K, L, M | GenBank | From GSDB direct submissions
N | GenBank and DDBJ | N0-N2 were used initially by both groups but have been removed from circulation; N2-N9 are ESTs
AAAA-AZZZ | GenBank | WGS
BAAA-BZZZ | DDBJ | WGS
CAAA-CZZZ | EMBL | WGS
DAAA-DZZZ | GenBank | WGS TPA
AAAAA-AZZZZ | DDBJ | MGA
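To find which database issued a nucleotide accession, take its leading letters and look them up in the table. A minimal sketch follows, assuming only a hand-copied subset of the rows; the dictionaries and the `lookup_nucleotide_prefix` name are hypothetical. Four-letter WGS prefixes and five-letter MGA prefixes are resolved by their first letter, following the range rows at the bottom of the table.

```python
import re
from typing import Optional, Tuple

# Hypothetical subset of the nucleotide prefix table: prefix -> (database, type).
# Extend with the remaining rows as needed.
NUCLEOTIDE_PREFIXES = {
    "D": ("DDBJ", "Direct submissions"),
    "AB": ("DDBJ", "Direct submissions"),
    "X": ("EMBL", "Direct submissions"),
    "AJ": ("EMBL", "Direct submissions"),
    "U": ("GenBank", "Direct submissions"),
    "AF": ("GenBank", "Direct submissions"),
    "AY": ("GenBank", "Direct submissions"),
    "AC": ("GenBank", "HTGS"),
    "AP": ("DDBJ", "Genome project data"),
    "CP": ("GenBank", "Genome project data"),
    "EZ": ("GenBank", "TSA"),
}

# Four-letter WGS prefixes (AAAA-AZZZ, BAAA-BZZZ, ...) keyed by first letter.
WGS_BY_FIRST_LETTER = {
    "A": ("GenBank", "WGS"),
    "B": ("DDBJ", "WGS"),
    "C": ("EMBL", "WGS"),
    "D": ("GenBank", "WGS TPA"),
}


def lookup_nucleotide_prefix(accession: str) -> Optional[Tuple[str, str]]:
    """Return (database, type) for a nucleotide accession, or None if unknown."""
    match = re.match(r"[A-Z]+", accession.strip().upper())
    if match is None:
        return None  # no leading letters at all
    prefix = match.group(0)
    if len(prefix) == 4:
        return WGS_BY_FIRST_LETTER.get(prefix[0])
    if len(prefix) == 5 and prefix[0] == "A":
        return ("DDBJ", "MGA")  # AAAAA-AZZZZ
    return NUCLEOTIDE_PREFIXES.get(prefix)
```

Prefixes are matched as whole strings, so a row like N, which is split by numeric range, is not resolved here, and any prefix missing from the copied subset returns `None`.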


Protein Accession Prefixes

PREFIX | DATABASE | TYPE
BAA-BZZ | DDBJ | Protein ID
CAA-CZZ | EMBL | Protein ID
AAA-AZZ | GenBank | Protein ID
AAE | GenBank | Protein ID for Patents (note that some patent proteins also use AAA and AAC)
FAA-FZZ | DDBJ | TPA Protein ID
DAA-DZZ | GenBank | TPA Protein ID
GAA-GZZ | DDBJ | WGS Protein ID
EAA-EZZ | GenBank | WGS Protein ID
HAA-HZZ | GenBank | TPA WGS Protein ID
O | Swiss-Prot | Protein
P | Swiss-Prot (Geneva) | Protein
Q | Swiss-Prot (Hinxton) | Protein
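The protein ranges follow a simple pattern: for a three-letter prefix, the first letter alone decides the database and type (BAA-BZZ is DDBJ, CAA-CZZ is EMBL, and so on), with AAE singled out for patents. The sketch below encodes that observation; the names `PROTEIN_RANGES`, `SWISS_PROT` and `lookup_protein_prefix` are hypothetical, and the single-letter Swiss-Prot rows are matched on the first letter only.

```python
import re
from typing import Optional, Tuple

# First letter of a three-letter protein prefix -> (database, type),
# following the ranges in the table above (e.g. BAA-BZZ -> DDBJ).
PROTEIN_RANGES = {
    "A": ("GenBank", "Protein ID"),
    "B": ("DDBJ", "Protein ID"),
    "C": ("EMBL", "Protein ID"),
    "D": ("GenBank", "TPA Protein ID"),
    "E": ("GenBank", "WGS Protein ID"),
    "F": ("DDBJ", "TPA Protein ID"),
    "G": ("DDBJ", "WGS Protein ID"),
    "H": ("GenBank", "TPA WGS Protein ID"),
}

# Single-letter Swiss-Prot prefixes.
SWISS_PROT = {
    "O": ("Swiss-Prot", "Protein"),
    "P": ("Swiss-Prot (Geneva)", "Protein"),
    "Q": ("Swiss-Prot (Hinxton)", "Protein"),
}


def lookup_protein_prefix(accession: str) -> Optional[Tuple[str, str]]:
    """Return (database, type) for a protein accession, or None if unknown."""
    acc = accession.strip().upper()
    prefix = re.match(r"[A-Z]*", acc).group(0)  # always matches, possibly empty
    if len(prefix) == 3:
        if prefix == "AAE":
            return ("GenBank", "Protein ID for Patents")
        return PROTEIN_RANGES.get(prefix[0])
    if len(prefix) == 1:
        return SWISS_PROT.get(prefix)
    return None
```

As the AAE row notes, some patent proteins also carry AAA or AAC prefixes; this sketch reports those as ordinary GenBank protein IDs, since the prefix alone cannot tell them apart.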

For the RefSeq accession naming rules, see: http://yangl.net/2015/10/08/ncbi-refseq-name-format/

Original article: http://www.ncbi.nlm.nih.gov/Sequin/acc.html

Please respect the author's work and credit the source when reposting: Bluesky's blog » [Repost] Naming Rules for DDBJ/EMBL/GenBank Accession Numbers
