
What is the easiest way to compare two web pages using Python?

Developer https://www.devze.com 2023-02-15 08:48 Source: Internet

Hello, I want to compare two web pages using a Python script. How can I achieve this? Thanks in advance!


First, you want to retrieve both web pages. You can use wget, urlretrieve, etc.; see the related question "wget vs. urlretrieve of Python".
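
As a rough illustration of the retrieval step, here is a minimal sketch using only the standard library's urllib.request. The helper name fetch, the 10-second timeout, and the UTF-8 fallback are assumptions made for this example, not something the answer specifies.

```python
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download `url` and return its body as text.

    `fetch` is a hypothetical helper name. The charset is taken from the
    response headers when present; falling back to UTF-8 is an assumption.
    """
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Example usage (placeholder URLs, requires network access):
# page_a = fetch("https://example.com/")
# page_b = fetch("https://example.org/")
```

Fetching both pages into strings keeps the later comparison step independent of how the pages were obtained; urlretrieve or wget would work as well if you prefer files on disk.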

Second, you want to "compare" the pages. You can use a "diff" tool as Chinmay noted. You can also do a keyword analysis of the two pages:

  1. Parse all keywords from each page (see, e.g., the question "How do I extract keywords used in text?").
  2. Optionally take the "stem" of each word with something like:
    http://pypi.python.org/pypi/stemming/1.0
  3. Use some math to compare the two pages' keywords, e.g. term frequency–inverse document frequency (http://en.wikipedia.org/wiki/Tf%E2%80%93idf), with one of the Python tools listed here: http://wiki.python.org/moin/InformationRetrieval
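
The keyword steps above could be sketched roughly as follows, using only the standard library. This is a simplification, not the full approach described: it skips stemming, and it compares plain term-frequency vectors with cosine similarity rather than tf-idf, since an inverse document frequency over just two pages carries little information. The names keywords and cosine_similarity, the regular expression, and the "longer than two letters" filter are all assumptions of this example.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def keywords(html):
    """Return a Counter of lowercase ASCII words (3+ letters) in the page text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.parts).lower())
    return Counter(w for w in words if len(w) > 2)


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency Counters (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

A score near 1.0 suggests the two pages use much the same vocabulary, and a score near 0.0 suggests they share almost none. For a larger collection of pages, a real tf-idf weighting and a stemmer would likely give better results.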


What do you mean by compare? If you just want to find the differences between two files, try difflib, which is part of the standard Python library.
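
For instance, a minimal difflib sketch might look like this, assuming both pages are already available as strings. The helper names diff_pages and similarity and the default labels page_a and page_b are made up for the example.

```python
import difflib


def diff_pages(old, new, old_name="page_a", new_name="page_b"):
    """Return a unified diff of two page sources as a list of lines."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )


def similarity(old, new):
    """Return a ratio in [0, 1]; 1.0 means the two strings are identical."""
    return difflib.SequenceMatcher(None, old, new).ratio()
```

Printing "\n".join(diff_pages(page_a, page_b)) shows the changed lines in the familiar diff format, while similarity gives a single number if you only need to know how close the pages are.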
