I'm trying to generate a Position-Weighted Matrix (PWM) in Biopython from Clustalw multipl开发者_高级运维e sequence alignments. I get a "Wrong Alphabet" error every time I do it with gapped alignments. From reading the documentation, I think I need to utilize the Gapped Alphabet to deal with the '-' character in gapped alignments. But when I do this, it still doesn't resolve the error. Does anyone see the problem with this code, or have a better way to generate a PWM from gapped Clustal alignments?
from Bio.Alphabet import Gapped
alignment = AlignIO.read("filename.clustalw", "clustal", alphabet=Gapped)
m = Motif.Motif()
for a in alignment:
m.add_instance(a.seq)
m.pwm()
So you want to use clustal to make these gapped alignments? I use Perl, I see you are using Python, but the logic is basically the same. I use a system call to the clustal executable instead of using BioPerl/Biopython. I believe the clustalw2 executable handles gapped alignments without the need to call an alphabet. Not 100 percent sure, but this is a script I use that works for me. Create a directory with all of your aligments files in it (I use .fasta but you can change the flags on the system call to accept others). This is my Perl script, you must modify the executable path in the last line to match clustal's location on your computer. Hope this helps a bit. As a side note, this is good for making many alignments very quickly, which is what I use it for but if you are only looking to align a few files, might want to skip the whole creating a directory and modify the code to accept a filepath and not a dirpath.
#!/usr/bin/perl
use warnings;
print "Please type the list file name of protein fasta files to align (end the directory path with a / or this will fail!): ";
$directory = <STDIN>;
chomp $directory;
opendir (DIR,$directory) or die $!;
my @file = readdir DIR;
closedir DIR;
my $add="_align.fasta";
foreach $file (@file) {
my $infile = "$directory$file";
(my $fileprefix = $infile) =~ s/\.[^.]+$//;
my $outfile="$fileprefix$add";
system "/Users/Wes/Desktop/eggNOG_files/clustalw-2.1-macosx/clustalw2 -INFILE=$infile -OUTFILE=$outfile -OUTPUT=FASTA -tree";
}
Cheers, Wes
精彩评论