Searching through a website directory, validating each page, then placing the URL in a list depending on its content

开发者 https://www.devze.com 2023-03-18 10:31 Source: Internet

I've been working on a script and I thought I would ask for help. I'm looking to go through a series of web pages and check whether each one is valid. The next step would be to check each valid page for specific content. If the page holds that content, its URL should be placed in a list.

import urllib2

National = []
Local = []
Sports = []
Culture = []

BASE_URL = "http://readingeagle.com/section.aspx?id="

def getPage(url):
    # Return the page body, or None if the page cannot be fetched.
    try:
        return urllib2.urlopen(url).read()
    except urllib2.URLError:
        return None

def findNational():
    # Check ids 1 to 100 and collect the URLs whose page mentions "national".
    for i in range(1, 101):
        url = BASE_URL + str(i)
        page = getPage(url)
        if page is not None and "national" in page:
            National.append(url)

# I would like to set up an iteration to check the entry id from 1 to 100. If the term is found on the page, place the URL in the list.

if __name__ == "__main__":
    findNational()
    print (National)


Here's my answer to the question of how to validate a given web site: "python check html valid".
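If "valid" here simply means that the URL answers at all, a reachability check can be sketched as below. This is only a sketch under assumptions: it uses Python 3's `urllib.request` and `urllib.error` (the modules that replaced `urllib2`), and the names `is_reachable`, `opener` and `timeout` are made up for illustration. It does not check whether the HTML itself is well formed.

```python
import urllib.error
import urllib.request


def is_reachable(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if `url` answers with an HTTP status below 400."""
    try:
        response = opener(url, timeout=timeout)
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError (DNS failures, refused
        # connections, 4xx/5xx responses) and socket timeouts;
        # ValueError is raised for strings that are not URLs at all.
        return False
    try:
        return response.getcode() < 400
    finally:
        response.close()
```

The `opener` parameter exists only so the function can be exercised without network access, for example by passing a stub that returns a fake response object.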

For checking the content of the page, the tools range from basic string methods and regular expressions to more sophisticated parsers such as lxml or BeautifulSoup.
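To make the difference between the first two options concrete, here is a small sketch using only the standard library. The function names `contains_substring` and `contains_word` are hypothetical. Both run on the raw page text, so they will also match inside tags and attributes; a parser such as lxml or BeautifulSoup would be the way to restrict the search to visible text.

```python
import re


def contains_substring(page_text, term):
    """Case-insensitive substring test; also matches inside longer words."""
    return term.lower() in page_text.lower()


def contains_word(page_text, term):
    """Case-insensitive whole-word test using a regular expression."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, page_text, flags=re.IGNORECASE) is not None
```

With the text "International desk" and the term "national", `contains_substring` reports a match because the letters appear inside "International", while `contains_word` does not. Which behaviour is wanted depends on how strict the content check should be.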

matchingSites = []
matchingSites.append(url) #Since you asked. :-p
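
Extending that to the several lists in the question, one possible arrangement is a dictionary of lists keyed by search term. This is a sketch, not the only way to do it: `sort_urls` and its parameters are invented names, and `fetch` stands for any function that returns the page text for a URL, or `None` when the page cannot be loaded.

```python
def sort_urls(urls, fetch, keywords):
    """Map each keyword to the URLs whose page text contains it."""
    matches = {keyword: [] for keyword in keywords}
    for url in urls:
        text = fetch(url)
        if text is None:
            # fetch signals an unreachable page with None; skip it.
            continue
        lowered = text.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                matches[keyword].append(url)
    return matches
```

For the site in the question, `urls` could be built as `["http://readingeagle.com/section.aspx?id=%d" % i for i in range(1, 101)]` and `keywords` as `["national", "local", "sports", "culture"]`. A page that mentions more than one term ends up in more than one list.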