
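Inparanoid: software for finding orthologous genes between species

Introduction:

Inparanoid is a program for finding orthologous genes between species. Its companion website currently covers 100 organisms and 1687023 sequences.

The standalone Inparanoid program is an excellent tool for identifying orthologs. It is currently at version 4.1 and can be requested online (http://software.sbc.su.se/cgi-bin/request.cgi?project=inparanoid).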

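Installation:

inparanoid can be run directly with perl inparanoid, but it requires blastall (the newer BLAST+ has not been tested) and the XML::Parser Perl module.

    1. Install blastall

        On Ubuntu: sudo apt-get install blast2   (installs blast-2.2.26 by default, the latest release that ships blastall)

        Note: inparanoid needs a blastall build that supports the -C option. As far as I know blast-2.2.26 has it; other versions, such as blast-2.2.9, may not.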
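
One way to confirm that an installed blastall lists the -C option is to search its usage text. The sketch below is only that: it assumes blastall prints its argument list when run without arguments, which you should verify on your own build, and the helper name usage_has_flag is made up for illustration.

```shell
#!/bin/sh
# usage_has_flag FLAG: read a program's usage text on stdin and succeed
# if some line starts (after optional blanks) with FLAG, e.g. "-C".
usage_has_flag() {
    grep -Eq -e "^[[:space:]]*$1([[:space:]]|\$)"
}

# Assumption: legacy blastall prints its usage when given no arguments.
if command -v blastall >/dev/null 2>&1; then
    if blastall 2>&1 | usage_has_flag -C; then
        echo "blastall lists -C"
    else
        echo "blastall does not list -C"
    fi
fi
```

If blastall is not on the PATH the script prints nothing, which is itself a hint that step 1 has not been completed.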

    2. Install XML::Parser

       a. Download the tarball XML-Parser-2.41.tar.gz

       b. Unpack it: tar -zxvf XML-Parser-2.41.tar.gz, which produces XML-Parser-2.41

       c. Build the Expat component that XML::Parser depends on (bundled in XML-Parser-2.41): [ run the following commands in order ]

               cd ./XML-Parser-2.41/Expat

               perl Makefile.PL                                                      ### generate the Makefile

               make                                                                        ### compile

               make install                                                            ### install

Possible error (expat.h: No such file or directory):

Expat.xs:12:19: fatal error: expat.h: 没有那个文件或目录
#include <expat.h>
^
compilation terminated.

Fix: sudo apt-get install libexpat-dev

        d. Install the XML::Parser module itself: [ run the following commands in order ]

cd ../                              ### back to the XML-Parser-2.41 directory
perl Makefile.PL
make
make install

                After installation, check whether the directory XML::Parser was installed into is listed in Perl's @INC (run perl -V to see @INC). If it is not, Perl reports:

Can't locate XML/Parser.pm in @INC (you may need to install the XML::Parser module) (@INC contains: /etc/perl /usr/local/lib/perl/5.18.2 /usr/local/share/perl/5.18.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.18 /usr/share/perl/5.18 /usr/local/lib/site_perl .) at xmlfilter line 7.
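
To see quickly whether any @INC directory actually holds the module, you can test each directory for XML/Parser.pm. This is a rough sketch; find_xml_parser is a helper name invented here, not part of Perl or Inparanoid.

```shell
#!/bin/sh
# find_xml_parser DIR...: print each directory that contains XML/Parser.pm.
find_xml_parser() {
    for dir in "$@"; do
        if [ -f "$dir/XML/Parser.pm" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Ask perl for its @INC entries (one per line) and test each of them.
if command -v perl >/dev/null 2>&1; then
    perl -e 'print "$_\n" for @INC' | while IFS= read -r dir; do
        find_xml_parser "$dir"
    done
fi
```

No output means none of the @INC directories contains the module. You can also pass the install directory to find_xml_parser directly to confirm where make install put it. A shorter check is perl -MXML::Parser -e 1, which exits silently only when the module can be loaded.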

Fix: add the XML::Parser path explicitly in blast_parser.pl, as shown below.

Output when installing the XML::Parser module:

#######################################################################

yangl@yangl:~/下载/XML-Parser-2.41$ make install
make[1]: 正在进入目录 ‘/home/yangl/下载/XML-Parser-2.41/Expat’
make[1]: 正在离开目录 ‘/home/yangl/下载/XML-Parser-2.41/Expat’
Files found in blib/arch: installing files in blib/lib into architecture dependent library tree
Installing /home/yangl/perl5/lib/perl5/x86_64-linux-gnu-thread-multi/XML/Parser.pm
Installing /home/yangl/perl5/lib/perl5/x86_64-linux-gnu-thread-multi/XML/Parser/LWPExternEnt.pl
Installing /home/yangl/perl5/lib/perl5/x86_64-linux-gnu-thread-multi/XML/Parser/Encodings/iso-8859-4.enc
Installing /home/yangl/perl5/lib/perl5/x86_64-linux-gnu-thread-multi/XML/Parser/Encodings/windows-1250.enc

…….

#######################################################################

@INC is:

########################################################################

yangl@yangl:~/下载/XML-Parser-2.41$ perl -V

……..

@INC:
/etc/perl
/usr/local/lib/perl/5.18.2
/usr/local/share/perl/5.18.2
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl/5.18
/usr/share/perl/5.18
/usr/local/lib/site_perl

##################################################################

The install directory is not in @INC, so the XML::Parser path has to be given explicitly in blast_parser.pl:

###################################################################

yangl@yangl:~/下载/inparanoid_4.1$ vi blast_parser.pl

#!/usr/bin/perl
use strict;
use warnings;
#use lib '/afs/pdc.kth.se/home/k/krifo/vol03/domainAligner/Inparanoid_new/lib64';
#use lib '/afs/pdc.kth.se/home/k/krifo/vol03/domainAligner/Inparanoid_new/lib64/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi';
use lib '/home/yangl/perl5/lib/perl5/x86_64-linux-gnu-thread-multi';    ## path where XML::Parser was installed
use XML::Parser;

…….

###################################################################
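
If you would rather not edit blast_parser.pl, Perl also reads the PERL5LIB environment variable and searches its directories before the built-in @INC entries. The sketch below shows that alternative; prepend_perl5lib is a hypothetical helper name, and the directory is the one from the make install output shown earlier.

```shell
#!/bin/sh
# prepend_perl5lib DIR: put DIR at the front of PERL5LIB, keeping any
# existing entries (PERL5LIB is a colon-separated list, like PATH).
prepend_perl5lib() {
    PERL5LIB="$1${PERL5LIB:+:$PERL5LIB}"
    export PERL5LIB
}

# Directory taken from the "make install" output in this post.
prepend_perl5lib "/home/yangl/perl5/lib/perl5/x86_64-linux-gnu-thread-multi"
echo "PERL5LIB=$PERL5LIB"
```

The setting only lasts for the current shell session, so it has to be run again (or added to a shell startup file) before each inparanoid run.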

If the FASTA description lines are too long, the following problem can occur:

###################################################################

fasta:

>OS01T0100200-01 pep:known chromosome:IRGSP-1.0:1:11218:12435:1 gene:OS01G0100200 transcript:OS01T0100200-01 description:"Note\x3dConserved hypothetical protein., Transcript_evidence\x3dAK059894 (DDBJ, Best hit), ORF_evidence\x3dB8ACR2 (UniProt), NIAS_FLcDNA\x3d006-208-E01,"
MEEAGERDADETHAWSGTASPAALWKTVASSAAMLKLALAMISAAFRTTPFSMSMQLCPN
ATMSLHSPSIFDVVSSITPIMSCIINNRLVAEKAGATMQRWRAHSSPSAMTRPLPNMGMR
LSSYDIVCQLAHLHFSHVCCLV

Error:

Starting second BLAST pass for Oryza_sativa.2M.faa – Oryza_sativa.2M.faa on 2014年 09月 22日 星期一 08:55:26 CST
[formatdb] WARNING: Cannot add sequence number 1 (lcl|1_./tmpd) because it has zero-length.

[formatdb] FATAL ERROR: Fatal error when adding sequence to BLAST database.
[blastall] WARNING: Unable to open tmpd.pin
[blastall] WARNING: Unable to open tmpd.pin
[blastall] WARNING: “: Unable to open tmpd.pin
[blastall] WARNING: “: Unable to open tmpd.pin
[blastall] WARNING: “: Unable to open tmpd.pin
[blastall] WARNING: “: Unable to open tmpd.pin
[blastall] FATAL ERROR: “: Database ./tmpd was not found or does not exist

no element found at line 1, column 0, byte -1 at ./blast_parser.pl line 111.
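
Before changing the input, it can help to look at the FASTA file itself. The sketch below reports the longest header line and the number of records with no sequence at all, since formatdb is complaining about a zero-length sequence. Both function names are invented for this example, and the checks only inspect the file; they do not prove what formatdb tripped over.

```shell
#!/bin/sh
# longest_header: read FASTA on stdin, print the length of the longest ">" line.
longest_header() {
    awk '/^>/ { if (length($0) > max) max = length($0) } END { print max + 0 }'
}

# empty_records: read FASTA on stdin, print how many records have no residues.
empty_records() {
    awk '/^>/ { if (seen && len == 0) n++; seen = 1; len = 0; next }
         { len += length($0) }
         END { if (seen && len == 0) n++; print n + 0 }'
}

# Example use (file name taken from the log above):
#   longest_header < Oryza_sativa.2M.faa
#   empty_records  < Oryza_sativa.2M.faa
```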

Fix: strip the description part of the FASTA headers. The program is faTruncate.pl:

#!/usr/bin/perl
use strict;
use warnings;
###############################################################
# Author: yangl
# Date: 2014.9.9
# Description: Truncate the description part of each FASTA header,
#              keeping only the ID (the text before the first whitespace).
###############################################################

my $usage = "\tDescription: This program is to truncate the description part of fasta file.\n\tUsage: $0 <fasta_file>\n";
if(!@ARGV){
	print STDERR $usage;
	exit 1;
}

# The result is written next to the input as <fasta_file>.truncate
open IN,"<","$ARGV[0]" or die "Cannot open $ARGV[0]: $!";
open OUT,">","$ARGV[0].truncate" or die "Cannot write $ARGV[0].truncate: $!";
while(<IN>){
	s/^(\>.+?)\s+.*\n$/$1\n/;	# header lines only: drop everything after the ID
	print OUT "$_";
}
close OUT;
=pod
Debug variant, disabled: print the ID of each matching header line.
while(<IN>){
	if(/^(\>.+?)\s+.*\n$/){
		print "$1\n";
	}else{
		print "dismatch\n";
	}
}
=cut
close IN;
##############################################################
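
For a quick one-off, the same truncation can be done with sed instead of the Perl script. This is a sketch rather than a drop-in replacement: truncate_headers is a name chosen here, and it matches the script's behavior only for headers where the ID follows the > directly.

```shell
#!/bin/sh
# truncate_headers: read FASTA on stdin; on ">" lines drop everything from
# the first whitespace onward, and leave sequence lines untouched.
truncate_headers() {
    sed -E '/^>/ s/[[:space:]].*$//'
}

# Example use (A.pep is the input name used in the run command of this post):
#   truncate_headers < A.pep > A.pep.truncate
```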

 

Run: perl inparanoid  A.pep B.pep

Program output is controlled by the output options in the script:

# Output options: #

$output = 1; # table_stats-format output #

$table = 1; # Print tab-delimited table of orthologs to file "table.txt" #

# Each orthologous group with all inparalogs is on one line #

$mysql_table = 1; # Print out sql tables for the web server #

# Each inparalog is on separate line #

$html = 1; # HTML-format output #

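Note: parts of this post are drawn from http://www.plob.org/2011/12/11/905.html

Please respect the author's work and cite the source when reposting: Bluesky's blog » Inparanoid: software for finding orthologous genes between species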
