Zillow returns first 7 properties

Developer https://www.devze.com 2022-12-07 20:54 Source: Internet
I'll try to keep a long story short, so apologies in advance; feel free to ask questions if anything is unclear. I am writing a web scraping script that takes listing info from Zillow and puts it into a pandas DataFrame, so that I can learn both pandas and beautifulsoup4 in the process. I would like to avoid the Zillow API, but it seems it might be my only option. When I scrape the location the user inputs, only the first 7 properties come back. I was told this is because of the JavaScript Zillow uses ("lazy loading" or "infinite scrolling"): the remaining properties aren't loaded until the user scrolls. I tried Selenium instead of requests, but I end up hitting a bot-verification CAPTCHA. I also tried sending browser-like headers, but I can't figure out a solution other than the API.

Here's my code BEFORE using selenium (aka when it semi-worked):

from bs4 import BeautifulSoup
import pandas as pd
from uszipcode import SearchEngine
import requests

search = SearchEngine()

zipcode = input("What is your zipcode: ")
zipcode_info = search.by_zipcode(zipcode)

headers = {
    'accept':
    'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-language': 'en-US,en;q=0.8',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
}

url = "https://www.zillow.com/homes/for_sale/" + zipcode_info.major_city + "/"
with requests.Session() as session:
    response = session.get(url, headers=headers)

# Parse the static HTML that came back (requests does not run any JavaScript)
soup = BeautifulSoup(response.content, 'html.parser')
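# Collect one dict per listing so every column stays the same length,
# even when a card is missing one of the fields.
rows = []

# These class names are generated by the site's build and may change at any time.
properties = soup.find_all("li", attrs={"class": "ListItem-c11n-8-73-8__sc-10e22w8-0 srp__hpnp3q-0 enEXBq with_constellation"})
for li in properties:
    address_tag = li.find("a", attrs={"class": "StyledPropertyCardDataArea-c11n-8-73-8__sc-yipmu-0 lhIXlm property-card-link"})
    price_tag = li.find("span", attrs={"data-test": "property-card-price"})
    details_tag = li.find("span", attrs={"class": "StyledPropertyCardHomeDetails-c11n-8-73-8__sc-1mlc4v9-0 jlVIIO"})
    link_tag = li.find("a", attrs={"data-test": "property-card-link"})

    # Skip <li> elements that hold no listing at all (placeholders, ads)
    if address_tag is None and price_tag is None and link_tag is None:
        continue

    rows.append({
        "Address": address_tag.text if address_tag else None,
        "Price": price_tag.text if price_tag else None,
        "Bed/Bath": " / ".join(b.text for b in details_tag.find_all("b")) if details_tag else None,
        "Links": link_tag.get("href") if link_tag else None,
    })

# Bed/bath details are collected too, but only these three columns are shown
df = pd.DataFrame(rows, columns=["Address", "Price", "Links"])
print(df)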

And the output is:

                                      Address     Price                                              Links
0        525 W River Dr, Pennsauken, NJ 08110  $259,900  https://www.zillow.com/homedetails/525-W-River...
1     7519 Remington, Merchantville, NJ 08109  $270,000  https://www.zillow.com/homedetails/7519-Reming...
2       2269 Marlon Ave, Pennsauken, NJ 08110  $220,000  https://www.zillow.com/homedetails/2269-Marlon...
3         8129 River Rd, Pennsauken, NJ 08110  $324,999  https://www.zillow.com/homedetails/8129-River-...
4  1653 Springfield Ave, Pennsauken, NJ 08110  $259,900  https://www.zillow.com/homedetails/1653-Spring...
5      5531 Jackson Ave, Pennsauken, NJ 08110  $265,000  https://www.zillow.com/homedetails/5531-Jackso...
6          8141 Stow Rd, Pennsauken, NJ 08110  $359,000  https://www.zillow.com/homedetails/8141-Stow-R...
7          2203 42nd St, Pennsauken, NJ 08110  $275,000  https://www.zillow.com/homedetails/2203-42nd-S...
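
One avenue that might be worth checking before falling back to the API: pages that lazy-load their cards sometimes ship the full result set as JSON inside a <script> tag, which requests can read without running any JavaScript. The sketch below only illustrates that idea on a made-up HTML snippet. The script id `search-data` and the keys `listResults`, `address`, `price` and `detailUrl` are assumptions for illustration, not confirmed Zillow markup, and it uses only the standard library so it runs without a network connection.

```python
import json
from html.parser import HTMLParser

# Made-up page: two cards rendered as HTML, three listings in the embedded JSON.
# The script id and every key name are hypothetical, not Zillow's real markup.
SAMPLE_HTML = """
<html><body>
<ul>
  <li class="card">101 Example St</li>
  <li class="card">202 Sample Ave</li>
</ul>
<script id="search-data" type="application/json">
{"listResults": [
  {"address": "101 Example St", "price": "$100,000", "detailUrl": "/homedetails/101/"},
  {"address": "202 Sample Ave", "price": "$200,000", "detailUrl": "/homedetails/202/"},
  {"address": "303 Placeholder Rd", "price": "$300,000", "detailUrl": "/homedetails/303/"}
]}
</script>
</body></html>
"""


class ScriptJsonExtractor(HTMLParser):
    """Collects the text inside the <script> tag that has the given id."""

    def __init__(self, script_id):
        super().__init__()
        self.script_id = script_id
        self.in_target = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == self.script_id:
            self.in_target = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_target = False

    def handle_data(self, data):
        if self.in_target:
            self.chunks.append(data)


def extract_listings(html, script_id="search-data"):
    """Return one dict per listing found in the embedded JSON, or [] if none."""
    parser = ScriptJsonExtractor(script_id)
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        return []
    payload = json.loads("".join(parser.chunks))
    return [
        {
            "Address": item.get("address"),
            "Price": item.get("price"),
            "Links": item.get("detailUrl"),
        }
        for item in payload.get("listResults", [])
    ]


rows = extract_listings(SAMPLE_HTML)
print(len(rows), "listings in the JSON vs. 2 cards in the HTML")
```

If a real page turns out to carry such a block, the resulting list of dicts could be passed straight to `pd.DataFrame(rows)`. Whether Zillow's pages do, and under which tag and keys, would need to be checked in the browser's "view source".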