Having trouble scraping an ASP.NET web page

开发者 https://www.devze.com 2022-12-24 04:08 Source: web
I am trying to scrape an ASP.NET website but am having trouble getting the results from a post. I have the following python code and am using httplib2 and BeautifulSoup:

from urllib.parse import urlencode

from httplib2 import Http

url = u"http://somepage.com/Search.aspx"
conn = Http()
# do a GET first to retrieve important values
resp, content = conn.request(url, "GET")

# event_validation and viewstate variables retrieved from GET here...

# NOTE: the "ct100"/"cct100" prefixes below differ from the "ctl00" prefix
# used by the other fields; confirm the exact names against the page source.
body = {"__EVENTARGUMENT" : "",
        "__EVENTTARGET" : "" ,
        "__EVENTVALIDATION": event_validation,
        "__VIEWSTATE" : viewstate,
        "ctl00_ContentPlaceHolder1_GovernmentCheckBox" : "On",
        "ctl00_ContentPlaceHolder1_NonGovernmentCheckBox" : "On",
        "ctl00_ContentPlaceHolder1_SchoolKeyValue" : "",
        "ctl00_ContentPlaceHolder1_SchoolNameTextBox" : "",
        "ctl00_ContentPlaceHolder1_ScriptManager1" : "ctl00_ContentPlaceHolder1_UpdatePanel1|cct100_ContentPlaceHolder1_SearchImageButton",
        "ct100_ContentPlaceHolder1_SearchImageButton.x" : "375",
        "ct100_ContentPlaceHolder1_SearchImageButton.y" : "11",
        "ctl00_ContentPlaceHolder1_SuburbTownTextBox" : "Adelaide,SA,5000",
        "hiddenInputToUpdateATBuffer_CommonToolkitScripts" : 1}

headers = {"Content-type": "application/x-www-form-urlencoded"}
resp, content = conn.request(url, "POST", headers=headers, body=urlencode(body))

When I print content, I still seem to get the same results as from the "GET". Is there a fundamental concept I'm missing about retrieving the result values of an ASP.NET post?


This isn't technically an answer, but you could use Fiddler to examine the difference between what you are sending with your Python code and what would be sent if you used a web browser to do the post.

I find that usually helps in these types of situations.
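
Once both request bodies have been captured, the comparison itself can be done in a few lines of Python. This is only a rough sketch: the function name diff_form_bodies and the sample bodies are made up for illustration, and it assumes both bodies are plain application/x-www-form-urlencoded strings copied out of the capture tool.

```python
from urllib.parse import parse_qsl


def diff_form_bodies(browser_body, script_body):
    """Compare two form-urlencoded request bodies field by field."""
    # keep_blank_values=True so that empty fields such as "__EVENTTARGET="
    # are not silently dropped from the comparison.
    browser = dict(parse_qsl(browser_body, keep_blank_values=True))
    script = dict(parse_qsl(script_body, keep_blank_values=True))
    shared = set(browser) & set(script)
    return {
        "missing_from_script": sorted(set(browser) - set(script)),
        "extra_in_script": sorted(set(script) - set(browser)),
        "different_values": sorted(
            name for name in shared if browser[name] != script[name]
        ),
    }


if __name__ == "__main__":
    # Hypothetical bodies, not taken from the real page.
    browser_body = "__EVENTTARGET=&ctl00%24Box=On&Town=Adelaide"
    script_body = "__EVENTTARGET=&ctl00_Box=On&Town=Sydney"
    for label, names in diff_form_bodies(browser_body, script_body).items():
        print(label, names)
```

Field names that appear only on one side (for example a name that differs by a single character or separator) are usually the first thing worth looking at.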


Well, first you need to check what the page itself does for a GET and for a POST, and make sure the two requests really result in different content.

Here is how you can check that:


if (!IsPostBack)
{
    Response.Write("<h1>Get Request</h1>");
}
else
{
    Response.Write("<h1>POST Request</h1>");
}

This assumes you are using C# for the code-behind.
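
On the client side, one way to make sure the POST really differs from the GET is to build its body from the hidden fields the GET response returned, keyed by each input's name attribute (form posts use name, not id; on ASP.NET Web Forms pages the names typically use "$" separators while the ids use underscores, which is worth confirming in the page source). The sketch below uses Python's standard html.parser instead of BeautifulSoup purely to stay self-contained; FormFieldCollector, collect_hidden_fields and the sample HTML are hypothetical names and data for illustration only.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect hidden <input> fields, keyed by their name attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <input ... />.
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        input_type = (attributes.get("type") or "").lower()
        if name and input_type == "hidden":
            self.fields[name] = attributes.get("value") or ""


def collect_hidden_fields(html):
    """Return a dict of hidden field names to values found in html."""
    parser = FormFieldCollector()
    parser.feed(html)
    return parser.fields


if __name__ == "__main__":
    # Hypothetical page fragment, not the real Search.aspx markup.
    sample = (
        '<form><input type="hidden" name="__VIEWSTATE" value="abc" />'
        '<input type="hidden" name="__EVENTVALIDATION" value="xyz" />'
        '<input type="text" name="ctl00$Main$TownTextBox" value="" /></form>'
    )
    print(collect_hidden_fields(sample))
```

The resulting dict could then be merged with the visible search fields before being passed to urlencode for the POST.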
