Could not scrape data in English, help!

开发者 https://www.devze.com 2023-03-12 02:10 Source: web

Related topic: python

I have a website that I'm trying to scrape using Python and BeautifulSoup. The site can be viewed in two languages (Thai or English); all you have to do is click the Thai or UK flag in the upper right corner of the screen, and the data is displayed in the selected language. When it comes to the script, though, I can only scrape the data in Thai (the default language), and I couldn't figure out how to get the data in English because the URL doesn't change when you click either flag. Looking at the page source, there is no href associated with either flag. I turned on Firebug tracing and searched for something to give me a clue, but haven't found anything (then again, you'd have to know exactly what to look for in order to know what's going on, and that's my problem).

Thanks, Glenn

You haven't said what the site is, so it's impossible to answer for sure, but here are a couple of suggestions. If the URL does not change when you click the flag, then one of the following applies:

a) The English is already in the HTML document, and the relevant content is being switched with JavaScript.
b) The English content is being fetched via an Ajax request, and JavaScript is being used to edit the DOM.
c) The page fully reloads with English content.

Presumably in all these cases the language preference must be stored either server-side in the session or client-side with cookies.

A first test is to turn off cookies and JavaScript and see what happens. Then, with cookies and JavaScript back on, use Firebug or Firefox to view the network requests being made.

Here's the cookie:

Cookie: verify=test; LangName=th; ASP.NET_SessionId=ylulkp45qpjq2b453nurlp55; _cbclose=1; _cbclose30246=1; _uid30246=66B70BE9.1; _ctout30246=1

If you change the language, it sets LangName=en.

urllib2 can be used in conjunction with cookielib to store and reuse cookies.
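
A minimal sketch of that cookie approach, under a few assumptions: it uses Python 3, where urllib2 and cookielib are named urllib.request and http.cookiejar; the host www.example.com is a placeholder because the real site was never named; and the helper names make_lang_cookie and build_opener_with_language are made up for illustration. It preloads a cookie jar with LangName=en so that every request made through the opener carries it. Whether the server honors a preset LangName cookie on its own is untested.

```python
import http.cookiejar
import urllib.request


def make_lang_cookie(domain, lang="en"):
    """Build a LangName cookie for the given host (hypothetical helper)."""
    return http.cookiejar.Cookie(
        version=0, name="LangName", value=lang,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def build_opener_with_language(domain, lang="en"):
    """Return (opener, jar); the jar already holds the language cookie."""
    jar = http.cookiejar.CookieJar()
    jar.set_cookie(make_lang_cookie(domain, lang))
    # HTTPCookieProcessor sends matching cookies from the jar and stores
    # any cookies the server sets (e.g. the ASP.NET session id).
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


# Placeholder host: replace with the real site.
opener, jar = build_opener_with_language("www.example.com")

# Show which Cookie header would be sent, without touching the network.
request = urllib.request.Request("http://www.example.com/page.aspx")
jar.add_cookie_header(request)
print(request.get_header("Cookie"))
```

With the real host filled in, html = opener.open(url).read() should fetch the page with the language cookie attached, and the result can be passed to BeautifulSoup as before. If the page still comes back in Thai, the preference is probably tied to the server-side session, in which case the request that the flag click triggers would need to be replayed through the same opener first.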
