How can I store data items in individual variables from an extracted table in Python?

Developer https://www.devze.com 2023-02-27 17:05 Source: Internet
I have another question about how to collect data from a table. This is an ongoing project and previous answers have been extremely helpful since I'm pretty new to Python.

I have now successfully extracted a table from HTML using BeautifulSoup, thanks to previous answers to my questions. My new problem is storing individual data items in individual variables.

My output table looks like this:

year|salary|bonus
2005|100,000|50,000
2006|120,000|80,000

I want to be able to create a variable for salary and one for bonus and include the respective amounts for each year.

Here is my code to get these tables:

from BeautifulSoup import BeautifulSoup
import re

html = '<html><body><p align="center"><table><tr><td>year</td><td>salary</td><td>bonus</td></tr><tr><td>2005</td><td>100,000</td><td>50,000</td></tr><tr><td>2006</td><td>120,000</td><td>80,000</td></tr></table></html>'
soup = BeautifulSoup(html)
table = soup.find('table')
rows = table.findAll('tr')

store=[]

for tr in rows:
    cols = tr.findAll('td')
    row = []
    for td in cols:
        try:
            row.append(''.join(td.find(text=True)))
        except Exception:
            row.append('')
    store.append('|'.join(row))
print '\n'.join(store)

Is there a way to create variables to extract salary and bonus for each year?


Do you mean storing the values rather than just printing each row? You can put them into a dictionary. Assuming the columns always appear in that order and are always present, you can use something like:

payment_dict = {}
for tr in rows[1:]:  # skip the header row (year, salary, bonus)
  year_td, salary_td, bonus_td = tr.findAll('td')
  salary = salary_td.find(text=True)
  bonus = bonus_td.find(text=True)
  # key each year's figures by the year text, e.g. '2005'
  payment_dict[year_td.find(text=True)] = {'salary': salary, 'bonus': bonus}

# payment_dict['2005']['bonus'] == '50,000'
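
The values stored this way are still strings such as '50,000'. If you want one variable for salary and one for bonus that hold numbers, a minimal sketch could look like the following. It assumes the pipe-delimited lines from the question's `store` list as input; the helper `to_int` and the hard-coded `store` contents are only for illustration and are not part of the original code.

```python
# Pipe-delimited lines as built by the question's loop (assumed input).
store = ['year|salary|bonus', '2005|100,000|50,000', '2006|120,000|80,000']

def to_int(text):
    """Turn a string such as '100,000' into the integer 100000."""
    return int(text.replace(',', ''))

salary = {}
bonus = {}
for line in store[1:]:  # skip the header line
    year, salary_text, bonus_text = line.split('|')
    salary[year] = to_int(salary_text)
    bonus[year] = to_int(bonus_text)

print(salary['2005'])  # 100000
print(bonus['2006'])   # 80000
```

Keeping two dictionaries keyed by year gives you `salary[year]` and `bonus[year]` directly, at the cost of having to keep the two in sync.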

If there are multiple lines per year you'll have to make each year's value a list.
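
As a rough sketch of that list-per-year idea, `collections.defaultdict` from the standard library can create the list the first time a year is seen. The `rows` data below is assumed to be cell text already pulled out of the table with the header row dropped, and the second 2005 row is invented purely to show a repeated year.

```python
from collections import defaultdict

# Cell text assumed to be already extracted from the table, header excluded.
# The second 2005 row is made up to demonstrate a repeated year.
rows = [
    ('2005', '100,000', '50,000'),
    ('2005', '20,000', '5,000'),
    ('2006', '120,000', '80,000'),
]

payments_by_year = defaultdict(list)
for year, salary, bonus in rows:
    # Each year maps to a list, so repeated years accumulate entries.
    payments_by_year[year].append({'salary': salary, 'bonus': bonus})

print(len(payments_by_year['2005']))         # 2
print(payments_by_year['2006'][0]['bonus'])  # 80,000
```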
