For the last two weeks I have been kind of stuck on a problem.
I am developing some web scrapers in C#, using a WinForms WebBrowser control in my application. I am able to fill in the web form that is opened in my browser and submit it automatically with the following code:
HtmlElement submitButton = document.GetElementById("Element_ID");
submitButton.InvokeMember("click");
So far everything is fine, but there is another element in the web form that I want to click too. This element does not have an id or a name, so I don't know how to click it.
Please help me as soon as possible; I need it for my master's thesis.
(I want to click the next page arrow button on the given website: http://www.gelbeseiten.de/yp/11//subscriberlist_pageAction.yp?sessionDataString=H4sIAAAAAAAAAI2PQU8CMRCFfw0XSEmns9128k5KongwGjFeSZftIqILbhcVf70NSgg3X-pbyXjLfvCFpqsbbIMpwbVRRuaBELKm6iew5T4gLFUpdmKpewJAGD8xV7JaxalfpdZX6mP31bH4WQfZblJehXcd2tGvr0WwbunVIKbYIZjjKmoa3atct4RSh-pA/S912oY4qhWzyjJkLvPZV4P4JetNFHYWOG2OoCH4pZlyU-pjWdhjS/LY2sp7-p1lLCLOGXwTLqpT1XSqOiXcpE3Xzw-pncUtGSDNp0ZZwR0we92TxSHjIX0x-pIQM-p0AZuciLl7M/kGE-pmcGjIOsvEpTB-pADJS0suGAQAA&page=0&filterTrade=-&filterFunction=-&sortBy=sort_trade&availableLetters=ABCDEFGHIJKLMNOPQRSTUVW )
I've written many web-scrapers in the past using embedded WebBrowsers, so you've come to the right place.
When the element does not have a name you need to find it by either content, or another associated element that is named.
- For the first option, we wrote helper methods that iterate over the hierarchy, looking for a specific piece of content within an element.
- For the second option, you get the named element and use a specific index for the desired child.
- A combination of both (find a specific parent, then look for a child with the right content).
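The first two options above might be sketched roughly as follows. This is only an illustration, not the helper code referred to in this answer: the class name HtmlElementSearch and the methods FindByText and FindChildOfNamed are hypothetical, and the sketch assumes the standard System.Windows.Forms HtmlDocument and HtmlElement types exposed by the WebBrowser control.

```csharp
using System;
using System.Linq;
using System.Windows.Forms;

// Hypothetical helpers (names are illustrative, not part of WinForms).
static class HtmlElementSearch
{
    // Option 1: first element with the given tag whose text contains `text`.
    // Returns null when nothing matches.
    public static HtmlElement FindByText(HtmlDocument document, string tagName, string text)
    {
        return document.GetElementsByTagName(tagName)
            .Cast<HtmlElement>()
            .FirstOrDefault(e => e.InnerText != null
                && e.InnerText.IndexOf(text, StringComparison.OrdinalIgnoreCase) >= 0);
    }

    // Option 2: child at a fixed index under an element that does have an id.
    // Returns null when the parent is missing or the index is out of range.
    public static HtmlElement FindChildOfNamed(HtmlDocument document, string parentId, int childIndex)
    {
        HtmlElement parent = document.GetElementById(parentId);
        if (parent == null || childIndex < 0 || childIndex >= parent.Children.Count)
        {
            return null;
        }
        return parent.Children[childIndex];
    }
}
```

The third option would combine the two: locate the parent with GetElementById, then search only its children for the matching text. Index-based lookups tend to break when the page layout changes, so matching on content or attributes is usually the more robust choice.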
In your specific example webpage, the next page anchor has a class of "arrow next" that you can search for.
You could do:
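// Requires: using System.Linq;
HtmlElement next_arrow = document.GetElementsByTagName("a")
    .Cast<HtmlElement>()
    // In its default (IE7) compatibility mode the WebBrowser control
    // exposes the CSS class as "className"; "class" may come back empty.
    .FirstOrDefault(e => e.GetAttribute("className") == "arrow next");
if (next_arrow != null)
{
    next_arrow.InvokeMember("click");
}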
Here's a trick that avoids InvokeMember("click") and instead "simulates the click" by navigating to the target URL directly.
This is the link for the first page:
gelbeseiten.de/yp/11//subscriberlist_pageAction.yp?sessionDataString=H4sIAAAAAAAAAI2PQU8CMRCFfw0XSEmns9128k5KongwGjFeSZftIqILbhcVf70NSgg3X-pbyXjLfvCFpqsbbIMpwbVRRuaBELKm6iew5T4gLFUpdmKpewJAGD8xV7JaxalfpdZX6mP31bH4WQfZblJehXcd2tGvr0WwbunVIKbYIZjjKmoa3atct4RSh-pA/S912oY4qhWzyjJkLvPZV4P4JetNFHYWOG2OoCH4pZlyU-pjWdhjS/LY2sp7-p1lLCLOGXwTLqpT1XSqOiXcpE3Xzw-pncUtGSDNp0ZZwR0we92TxSHjIX0x-pIQM-p0AZuciLl7M/kGE-pmcGjIOsvEpTB-pADJS0suGAQAA&page=0&filterTrade=-&filterFunction=-&sortBy=sort_trade&availableLetters=ABCDEFGHIJKLMNOPQRSTUVW
As you can see, page=0. Clicking next gives this link:
gelbeseiten.de/yp/11//subscriberlist_pageAction.yp?sessionDataString=H4sIAAAAAAAAAI2PQU/DMAyFf00vmzLFdprE8gkmwTggEENcp3RNxxh0o-pmA8euJBlO1G0-p-pvCf58zNwUzW-pDKyQalSmckExl6DqJpKnPCEuVbDaYFUvBcEIFXgVu1Ws2nV6Xac-pZn89X5xFwoed2MvQbmI73rf1eL4L3SakFFsJOBpnzcJbte9W4hSI-pQ/S912oY4qhWz5LDSC992Dl/QR60ahPki2OZKeNfCgiba18oicmLV8lTcoS8t6BJ8zsHMo3yEU1VE1D1ZmWm7Tt-psXxtNwCMmjS4BhJ7oDAy72WR5CH/MT0l1HQEVa46QDK2Z/JsTyhcdIAWrZeGy8/k7LJ5YQBAAA-e&page=1&filterTrade=-&filterFunction=-&sortBy=sort_trade&availableLetters=ABCDEFGHIJKLMNOPQRSTUVW
Now page=1, and so on. In general, clicking next means page=(x+1) and clicking prev means page=(x-1), so build the URL string accordingly. This addresses your problem; however, other data is also sent in the query string, and you have to append that to the string as well.
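A minimal sketch of the page arithmetic, assuming the page number always appears as a numeric page=N query parameter. PageUrl and ShiftPage are hypothetical names, and clamping at page 0 is my own assumption. Note that sessionDataString also differs between the two example links, so changing page alone may not be enough; that value would have to be taken from the current page.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of any library).
static class PageUrl
{
    // Matches the digits that directly follow "?page=" or "&page=".
    private static readonly Regex PageNumber = new Regex(@"(?<=[?&]page=)\d+");

    // Returns `url` with its page=N parameter shifted by `delta`
    // (+1 for next, -1 for prev). Clamped at 0 (assumption).
    public static string ShiftPage(string url, int delta)
    {
        if (!PageNumber.IsMatch(url))
        {
            throw new ArgumentException("URL has no numeric page parameter.", nameof(url));
        }
        // Rewrite only the first match; the rest of the query string is kept as-is.
        return PageNumber.Replace(
            url,
            m => Math.Max(0, int.Parse(m.Value) + delta).ToString(),
            1);
    }
}
```

With a WebBrowser control named webBrowser1 (an assumed name), the next page could then be requested with webBrowser1.Navigate(PageUrl.ShiftPage(currentUrl, 1)), where currentUrl is the address of the page being shown.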