开发者

A PWM with gapped alignments in Biopython

开发者 https://www.devze.com 2023-03-26 19:45 出处:网络
I\'m trying to generate a Position-Weighted Matrix (PWM) in Biopython from Clustalw multipl开发者_高级运维e sequence alignments.I get a \"Wrong Alphabet\" error every time I do it with gapped alignmen

I'm trying to generate a Position-Weighted Matrix (PWM) in Biopython from Clustalw multipl开发者_高级运维e sequence alignments. I get a "Wrong Alphabet" error every time I do it with gapped alignments. From reading the documentation, I think I need to utilize the Gapped Alphabet to deal with the '-' character in gapped alignments. But when I do this, it still doesn't resolve the error. Does anyone see the problem with this code, or have a better way to generate a PWM from gapped Clustal alignments?

from Bio.Alphabet import Gapped
alignment = AlignIO.read("filename.clustalw", "clustal", alphabet=Gapped)
m = Motif.Motif()
for a in alignment:
    m.add_instance(a.seq)
m.pwm()


So you want to use clustal to make these gapped alignments? I use Perl, I see you are using Python, but the logic is basically the same. I use a system call to the clustal executable instead of using BioPerl/Biopython. I believe the clustalw2 executable handles gapped alignments without the need to call an alphabet. Not 100 percent sure, but this is a script I use that works for me. Create a directory with all of your aligments files in it (I use .fasta but you can change the flags on the system call to accept others). This is my Perl script, you must modify the executable path in the last line to match clustal's location on your computer. Hope this helps a bit. As a side note, this is good for making many alignments very quickly, which is what I use it for but if you are only looking to align a few files, might want to skip the whole creating a directory and modify the code to accept a filepath and not a dirpath.

#!/usr/bin/perl 


use warnings;

print "Please type the list file name of protein fasta files to align (end the directory    path with a / or this will fail!): ";
$directory = <STDIN>;
chomp $directory;

opendir (DIR,$directory) or die $!;

my @file = readdir DIR;
closedir DIR;

my $add="_align.fasta";

foreach $file (@file) {
     my $infile = "$directory$file";
 (my $fileprefix = $infile) =~ s/\.[^.]+$//;
 my $outfile="$fileprefix$add";
 system "/Users/Wes/Desktop/eggNOG_files/clustalw-2.1-macosx/clustalw2 -INFILE=$infile -OUTFILE=$outfile -OUTPUT=FASTA -tree";
}

Cheers, Wes

0

精彩评论

暂无评论...
验证码 换一张
取 消