I have a series of keywords in an HTML web page. They are comma-separated, so I could get them into CSV, and I would like to know which ones are NOT in another CSV file displayed as an HTML web page. How would you do that comparison? I have ideas for MySQL and tables, but these are CSV or HTML sources. Thanks!
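In Python, given two CSV files, a.csv and b.csv, this script will create (or overwrite, if it already exists) a new file out.csv that contains every keyword in a.csv that is not found in b.csv. The urlretrieve call at the top downloads b.csv from a URL; drop it if you already have the file locally.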
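import urllib.request  # Python 3; in Python 2 this was urllib.urlretrieve

# Download the remote list (placeholder URL) and save it as b.csv
url = 'http://www.website.com/x.csv'
urllib.request.urlretrieve(url, 'b.csv')

# Read both files and split them into whitespace-stripped keywords
with open('a.csv', 'r') as file_a:
    list_a = [x.strip() for x in file_a.read().split(',')]
with open('b.csv', 'r') as file_b:
    list_b = [x.strip() for x in file_b.read().split(',')]

# Keywords in a.csv that are not in b.csv (swap the operands for the reverse)
list_out = list(set(list_a) - set(list_b))

with open('out.csv', 'w') as file_out:
    file_out.write(','.join(list_out))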
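One caveat with the set-based approach is that the result comes out in arbitrary order. If the original order of a.csv matters, a minimal sketch along these lines might help. The names split_keywords and keywords_not_in and the ignore_case option are made up for illustration; they are not part of the script above.

```python
def split_keywords(text):
    """Split a comma-separated string into stripped, non-empty keywords."""
    return [part.strip() for part in text.split(',') if part.strip()]


def keywords_not_in(text_a, text_b, ignore_case=False):
    """Return keywords from text_a that are absent from text_b, in order.

    Hypothetical helper: ignore_case is an illustrative option only.
    """
    normalize = str.lower if ignore_case else (lambda s: s)
    seen = {normalize(k) for k in split_keywords(text_b)}
    result = []
    for keyword in split_keywords(text_a):
        key = normalize(keyword)
        if key not in seen:
            result.append(keyword)
            seen.add(key)  # also drops repeats within text_a
    return result


print(keywords_not_in('a1, a2, a3, a4', 'a1, a4'))  # ['a2', 'a3']
```

The strings passed in would be the contents of a.csv and b.csv read with file.read(), as in the script above.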
If it is just a list of keywords, do a search and replace (you can use sed) to replace every comma with a newline and trim any spaces left around the keywords. You will end up with a file containing one keyword on each line. Do that to both versions of the list. Then use the "join" command:
join -v 1 leftfile rightfile
This will report all the entries in leftfile that are not in rightfile. Don't forget to sort both files first, or join won't work correctly. There is a standard command-line tool for sorting too (it's called, not surprisingly, "sort").
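To see why the sorting matters, here is a rough Python sketch of the kind of single forward pass that a merge over two sorted inputs makes. It is an illustration only, not join's actual implementation, and left_only is a hypothetical name.

```python
def left_only(left_sorted, right_sorted):
    """Return items of left_sorted that do not occur in right_sorted.

    Assumes both lists are already sorted; each list is walked once.
    """
    result = []
    j = 0
    for item in left_sorted:
        # Advance the right cursor past everything smaller than item
        while j < len(right_sorted) and right_sorted[j] < item:
            j += 1
        if j == len(right_sorted) or right_sorted[j] != item:
            result.append(item)
    return result


left = sorted(k.strip() for k in 'a3, a1, a4, a2'.split(','))
right = sorted(k.strip() for k in 'a4, a1'.split(','))
print(left_only(left, right))           # ['a2', 'a3']
print(left_only(['a3', 'a1'], ['a1']))  # unsorted input: wrongly reports 'a1'
```

With unsorted input the right-hand cursor has already moved past 'a1' by the time it is needed, so the match is missed.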
PHP solution: get the keywords as strings, convert them into arrays, and use the array_diff function:
<?php
$csv1 = 'a1, a2, a3, a4';
$csv2 = 'a1, a4';
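// Split on commas and trim the whitespace around each keyword,
// so that ' a4' and 'a4' compare as equal
$csv1_arr = array_map('trim', explode(',', $csv1));
$csv2_arr = array_map('trim', explode(',', $csv2));
// Values present in $csv1_arr but not in $csv2_arr
$diff = array_diff($csv1_arr, $csv2_arr);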
print_r($diff);
?>