I am not sure which module I am supposed to use for this. I have >100 files I need to submit to the following webpage and to retrieve the results.
http://bip.weizmann.ac.il/oca-bin/lpccsu
It would be beneficial if I could automate the process somehow, sending each file to the
<input type="file" name="filename" size="30">
tag, and then receiving the returned HTML so that it can be processed with regular expressions.
Thanks
Edit: to see an example output, set the radio button to CSU and enter 1eo8 in the 'PDB entry' textbox.
@Anake Here are three Python packages that provide a solution for retrieving and parsing:
From their websites:
Beautiful Soup: "Beautiful Soup parses anything you give it, and does the tree traversal stuff for you. You can tell it 'Find all the links', or 'Find all the links of class externalLink', or 'Find all the links whose urls match foo.com', or 'Find the table heading that's got bold text, then give me that text.'" [1]
mechanize: "Stateful programmatic web browsing in Python, after Andy Lester's Perl module WWW::Mechanize." [2]
Scrapy: "Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing." [3]
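Any of the three packages above will handle the parsing step; as a minimal stdlib-only sketch of that step (no third-party install needed), Python's built-in html.parser can already do the "find all the links" example from the Beautiful Soup description. The sample HTML here is illustrative, not the site's actual output:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag in a page --
    a stdlib stand-in for Beautiful Soup's "find all the links"."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# Illustrative HTML, standing in for the page the server returns.
sample = '<html><body><a href="result1.html">r1</a><a href="result2.html">r2</a></body></html>'
parser = LinkExtractor()
parser.feed(sample)
print(parser.links)  # ['result1.html', 'result2.html']
```

For anything beyond link-grabbing (malformed markup, nested tables), Beautiful Soup is far more forgiving than html.parser.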
There are a few ways to do this:
1) Perl and LWP
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
my $response = $ua->post(
    'http://bip.weizmann.ac.il/oca-bin/lpccsu?9955',
    {
        param1 => 'value1',
        param2 => 'value2',
    }
);
my $content = $response->content;
# your regular expression code here
2) AutoHotkey, which has regular expressions and a user-contributed library that handles POST requests; see http://www.autohotkey.com/forum/topic33506.html
3) Write a batch file that uses wget --post-data and --post-file, pipe the output to a series of files, and read them with your favorite scripting language. Reference: http://www.gnu.org/software/wget/manual/html_node/HTTP-Options.html
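The wget approach also translates to Python's standard library. One caveat: since the form uses <input type="file" name="filename">, the browser sends a multipart/form-data body, whereas wget --post-file sends the raw file, which many CGI forms won't accept. Below is a hedged sketch that builds the multipart body by hand; only the URL and the field name "filename" come from the question, and the .pdb extension in the loop is an assumption about your input files:

```python
import os
import urllib.request
import uuid

URL = "http://bip.weizmann.ac.il/oca-bin/lpccsu"  # from the question

def build_multipart(field_name, filename, data):
    """Frame one file as a multipart/form-data body -- what a browser
    sends for <input type="file">. Returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        "--{b}\r\n"
        'Content-Disposition: form-data; name="{n}"; filename="{f}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).format(b=boundary, n=field_name, f=filename).encode("ascii")
    tail = "\r\n--{b}--\r\n".format(b=boundary).encode("ascii")
    return head + data + tail, "multipart/form-data; boundary=" + boundary

def upload(path):
    """POST one file and return the result page's HTML (needs network)."""
    with open(path, "rb") as f:
        body, ctype = build_multipart("filename", os.path.basename(path), f.read())
    req = urllib.request.Request(URL, data=body, headers={"Content-Type": ctype})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("latin-1")

if __name__ == "__main__":
    # Loop over your >100 input files, saving each result page for
    # later regex processing. The .pdb extension is an assumption.
    for path in sorted(os.listdir(".")):
        if path.endswith(".pdb"):
            with open(path + ".result.html", "w") as out:
                out.write(upload(path))
```

If you would rather not hand-roll the multipart framing, the third-party requests library (files= parameter) or mechanize's form handling will do it for you.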
Hope that helps