I have another question about how to collect data from a table. This is an ongoing project and previous answers have been extremely helpful since I'm pretty new to Python.
I have now successfully extracted a table from HTML using BeautifulSoup, thanks to previous answers to my questions. My new problem is storing the individual data items in individual variables.
My outputted table looks like this:
year|salary|bonus
2005|100,000|50,000
2006|120,000|80,000
I want to be able to create a variable for salary and one for bonus and include the respective amounts for each year.
Here is my code to get these tables:
from BeautifulSoup import BeautifulSoup
import re
html = '<html><body><p align="center"><table><tr><td>year</td><td>salary</td><td>bonus</td></tr><tr><td>2005</td><td>100,000</td><td>50,000</td></tr><tr><td>2006</td><td>120,000</td><td>80,000</td></tr></table></html>'
soup = BeautifulSoup(html)
table = soup.find('table')
rows = table.findAll('tr')
store = []
for tr in rows:
    cols = tr.findAll('td')
    row = []
    for td in cols:
        try:
            row.append(''.join(td.find(text=True)))
        except Exception:
            row.append('')
    store.append('|'.join(row))
print '\n'.join(store)
Is there a way to create variables to extract salary and bonus for each year?
Do you mean storing the values rather than just printing each row? You can put them into a dictionary. Assuming the columns always appear in that order and every cell is present, you can use something like:
payment_dict = {}
for tr in rows[1:]:  # skip the header row (year, salary, bonus)
    year_td, salary_td, bonus_td = tr.findAll('td')
    salary = salary_td.find(text=True)
    bonus = bonus_td.find(text=True)
    payment_dict[year_td.find(text=True)] = {'salary': salary, 'bonus': bonus}
# payment_dict['2005']['bonus'] == '50,000'
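If you would rather have one variable per column, as the question asks, one possible variation keeps a separate dictionary for salary and for bonus, each keyed by year, and converts the amounts to integers along the way. This is only a sketch: it assumes the cell text has already been pulled out of the td elements into plain strings, and the names rows_text and parse_amount are made up for illustration (they are not part of BeautifulSoup).

```python
# Hypothetical stand-in for the text extracted from each <td>, header row first.
rows_text = [
    ['year', 'salary', 'bonus'],
    ['2005', '100,000', '50,000'],
    ['2006', '120,000', '80,000'],
]


def parse_amount(text):
    """Turn a string such as '100,000' into the integer 100000."""
    return int(text.replace(',', ''))


salary = {}
bonus = {}
for year, salary_text, bonus_text in rows_text[1:]:  # skip the header row
    salary[year] = parse_amount(salary_text)
    bonus[year] = parse_amount(bonus_text)

print(salary['2005'])  # 100000
print(bonus['2006'])   # 80000
```

Storing integers instead of strings means you can do arithmetic directly, for example salary['2005'] + bonus['2005'].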
If there are multiple lines per year you'll have to make each year's value a list.
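A minimal sketch of that list-per-year idea, assuming the rows have already been reduced to lists of strings (rows_text and payments_by_year are illustrative names, and the duplicated 2005 row is invented data to show the effect):

```python
from collections import defaultdict

# Hypothetical data: 2005 appears twice, e.g. two payments in the same year.
rows_text = [
    ['2005', '100,000', '50,000'],
    ['2005', '20,000', '5,000'],
    ['2006', '120,000', '80,000'],
]

# Each year maps to a list of payment records instead of a single record.
payments_by_year = defaultdict(list)
for year, salary, bonus in rows_text:
    payments_by_year[year].append({'salary': salary, 'bonus': bonus})

print(len(payments_by_year['2005']))         # 2
print(payments_by_year['2005'][1]['bonus'])  # 5,000
```

Using defaultdict(list) avoids having to check whether a year has been seen before appending to it.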