The formats of GenBank accession numbers are:
TYPE | FORMAT
---|---
Nucleotide | 1 letter + 5 numerals, or 2 letters + 6 numerals
Protein | 3 letters + 5 numerals
WGS | 4 letters + 2 numerals for the WGS assembly version + 6-8 numerals
MGA | 5 letters + 7 numerals
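The four format rules translate directly into regular expressions. The following is a minimal Python sketch, assuming accessions are compared without their version suffix (the `.1` in `AF123456.1`) and with letters uppercased; the names `ACCESSION_FORMATS` and `classify_accession` are hypothetical, chosen for illustration.

```python
import re
from typing import Optional

# Patterns transcribed from the format table; the letter count alone
# separates the four types (1-2 nucleotide, 3 protein, 4 WGS, 5 MGA).
ACCESSION_FORMATS = {
    "nucleotide": re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}"),
    "protein": re.compile(r"[A-Z]{3}\d{5}"),
    "wgs": re.compile(r"[A-Z]{4}\d{2}\d{6,8}"),  # 2-digit assembly version + 6-8 digits
    "mga": re.compile(r"[A-Z]{5}\d{7}"),
}


def classify_accession(accession: str) -> Optional[str]:
    """Return the format name an accession matches, or None (hypothetical helper)."""
    # Assumption: drop a trailing version such as ".1" before matching.
    base = accession.strip().upper().split(".", 1)[0]
    for kind, pattern in ACCESSION_FORMATS.items():
        if pattern.fullmatch(base):
            return kind
    return None


print(classify_accession("AF123456.1"))
print(classify_accession("AAAA01000001"))
```

Format alone is not conclusive: a Swiss-Prot accession such as `P12345` also fits "1 letter + 5 numerals", so the prefix tables are still needed to tell the databases apart.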
Distribution of accession prefixes across the databases:
Nucleotide Accession Prefixes
PREFIX | DATABASE | TYPE
---|---|---
BA, DF, DG | DDBJ | CON division
AN | EMBL | CON division
CH, CM, DS, EM, EN, EP, EQ, FA, GG, GL | NCBI | CON division
C, AT, AU, AV, BB, BJ, BP, BW, BY, CI, CJ, DA, DB, DC, DK, FS | DDBJ | EST
F | EMBL | EST
H, N, T, R, W, AA, AI, AW, BE, BF, BG, BI, BM, BQ, BU, CA, CB, CD, CF, CK, CN, CO, CV, CX, DN, DR, DT, DV, DY, EB, EC, EE, EG, EH, EL, ES, EV, EW, EX, EY, FC, FD, FE, FF, FG, FK, FL, GD, GE, GH, GO | GenBank | EST
D, AB | DDBJ | Direct submissions
V, X, Y, Z, AJ, AM, FM | EMBL | Direct submissions
U, AF, AY, DQ, EF, EU, FJ, GQ | GenBank | Direct submissions
AP | DDBJ | Genome project data
BS | DDBJ | Chimpanzee genome data
AL, BX, CR, CT, CU | EMBL | Genome project data
AE, CP, CY | GenBank | Genome project data
AG, DE, DH, FT | DDBJ | GSS
B, AQ, AZ, BH, BZ, CC, CE, CG, CL, CW, CZ, DU, DX, ED, EI, EJ, EK, ER, ET, FH, FI | GenBank | GSS
AK | DDBJ | cDNA projects
AC, DP | GenBank | HTGS
E, BD, DD, DI, DJ, DL, DM, FU | DDBJ | Patents
A, AX, CQ, CS, FB, GM, GN | EMBL | Patents (nucleotide only)
I, AR, DZ, EA, GC, GP | GenBank | Patents (nucleotide)
G, BV, GF | GenBank | STS
BR | DDBJ | TPA
BN | EMBL | TPA
EZ | GenBank | TSA
S | GenBank | From journal scanning
AD | GenBank | From GSDB
AH | GenBank | Segmented set header
AS | GenBank | Other (not currently in use)
BC | GenBank | MGC project
BK | GenBank | TPA
BL, GJ, GK | GenBank | TPA CON division
BT | GenBank | FLI-cDNA projects
J, K, L, M | GenBank | From GSDB direct submissions
N | GenBank and DDBJ | N0-N2 were used initially by both groups but have been removed from circulation; N2-N9 are ESTs
AAAA-AZZZ | GenBank | WGS
BAAA-BZZZ | DDBJ | WGS
CAAA-CZZZ | EMBL | WGS
DAAA-DZZZ | GenBank | WGS TPA
AAAAA-AZZZZ | DDBJ | MGA
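A prefix lookup follows from this table: take the leading letters of the accession, then use their count to choose between the 1-2 letter prefixes, the 4-letter WGS ranges (where the first letter identifies the database), and the 5-letter MGA range. The Python sketch below copies only a small excerpt of the table into `NUCLEOTIDE_PREFIXES`; that dictionary, `WGS_RANGES`, and `lookup_nucleotide` are hypothetical names, and a real lookup would need every row.

```python
import re
from typing import Optional, Tuple

# Hand-copied excerpt of the nucleotide prefix table (NOT the full list).
NUCLEOTIDE_PREFIXES = {
    "D": ("DDBJ", "Direct submissions"),
    "AB": ("DDBJ", "Direct submissions"),
    "X": ("EMBL", "Direct submissions"),
    "AJ": ("EMBL", "Direct submissions"),
    "U": ("GenBank", "Direct submissions"),
    "AF": ("GenBank", "Direct submissions"),
    "AC": ("GenBank", "HTGS"),
    "CP": ("GenBank", "Genome project data"),
}

# 4-letter WGS ranges (AAAA-AZZZ, BAAA-BZZZ, ...) keyed by first letter.
WGS_RANGES = {
    "A": ("GenBank", "WGS"),
    "B": ("DDBJ", "WGS"),
    "C": ("EMBL", "WGS"),
    "D": ("GenBank", "WGS TPA"),
}


def lookup_nucleotide(accession: str) -> Optional[Tuple[str, str]]:
    """Map a nucleotide accession to (database, type) via its letter prefix."""
    match = re.match(r"[A-Z]+", accession.strip().upper())
    if match is None:
        return None
    prefix = match.group()
    if len(prefix) == 4:  # WGS: the first letter selects the range
        return WGS_RANGES.get(prefix[0])
    if len(prefix) == 5 and prefix[0] == "A":  # MGA: AAAAA-AZZZZ
        return ("DDBJ", "MGA")
    return NUCLEOTIDE_PREFIXES.get(prefix)


print(lookup_nucleotide("AF123456.1"))
print(lookup_nucleotide("BAAA01000001"))
```

Checking the letter count first keeps a single-letter prefix such as `D` (DDBJ direct submissions) from colliding with the `DAAA-DZZZ` WGS TPA range.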
Protein Accession Prefixes
PREFIX | DATABASE | TYPE
---|---|---
BAA-BZZ | DDBJ | Protein ID
CAA-CZZ | EMBL | Protein ID
AAA-AZZ | GenBank | Protein ID
AAE | GenBank | Protein ID for Patents (note that some patent proteins also use AAA and AAC)
FAA-FZZ | DDBJ | TPA Protein ID
DAA-DZZ | GenBank | TPA Protein ID
GAA-GZZ | DDBJ | WGS Protein ID
EAA-EZZ | GenBank | WGS Protein ID
HAA-HZZ | GenBank | TPA WGS Protein ID
O | Swiss-Prot | Protein
P | Swiss-Prot (Geneva) | Protein
Q | Swiss-Prot (Hinxton) | Protein
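Because every three-letter protein range spans a whole first letter (BAA-BZZ, CAA-CZZ, ...), the first letter is enough to identify the database, with AAE as the one special case. The Python sketch below assumes the "3 letters + 5 numerals" protein ID format from the top of this post; for Swiss-Prot it assumes the UniProtKB pattern `[OPQ][0-9][A-Z0-9]{3}[0-9]`, which is not stated in the table. `lookup_protein` and the dictionary names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Three-letter ranges (e.g. BAA-BZZ) keyed by their first letter.
PROTEIN_RANGES = {
    "A": ("GenBank", "Protein ID"),
    "B": ("DDBJ", "Protein ID"),
    "C": ("EMBL", "Protein ID"),
    "D": ("GenBank", "TPA Protein ID"),
    "E": ("GenBank", "WGS Protein ID"),
    "F": ("DDBJ", "TPA Protein ID"),
    "G": ("DDBJ", "WGS Protein ID"),
    "H": ("GenBank", "TPA WGS Protein ID"),
}

SWISS_PROT = {
    "O": "Swiss-Prot",
    "P": "Swiss-Prot (Geneva)",
    "Q": "Swiss-Prot (Hinxton)",
}


def lookup_protein(accession: str) -> Optional[Tuple[str, str]]:
    """Map a protein accession to (database, type), or None if unrecognized."""
    base = accession.strip().upper().split(".", 1)[0]  # drop a ".1"-style version
    if re.fullmatch(r"[A-Z]{3}\d{5}", base):
        if base.startswith("AAE"):
            return ("GenBank", "Protein ID for Patents")
        return PROTEIN_RANGES.get(base[0])
    # Assumption: Swiss-Prot accessions follow the UniProtKB 6-character pattern.
    if base[:1] in SWISS_PROT and re.fullmatch(r"[A-Z]\d[A-Z0-9]{3}\d", base):
        return (SWISS_PROT[base[0]], "Protein")
    return None


print(lookup_protein("CAA12345.1"))
print(lookup_protein("P12345"))
```

Patent proteins filed under AAA or AAC cannot be detected this way; they are reported as ordinary GenBank protein IDs.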
For the RefSeq accession naming rules, see: http://yangl.net/2015/10/08/ncbi-refseq-name-format/
Original article: http://www.ncbi.nlm.nih.gov/Sequin/acc.html
Please respect the author's work and credit the source when reposting: Bluesky's blog » [Repost] DDBJ/EMBL/GenBank Accession Naming Rules