I'm building a test scraping site with Django. For some reason the following code only returns one image, whereas I'd like it to print every image, every link, and every price. Any help? (Also, if you know how to put this data into a database model so I don't have to scrape the site every time, I'm all ears, but that may be another question.) Cheers!
Here is the template file:
{% extends "base.html" %}
{% block title %}Boats{% endblock %}
{% block content %}
<img src="{{ fetch_boats }}"/>
{% endblock %}
Here is the views.py file:
#views.py
from django.shortcuts import render_to_response
from django.template.loader import get_template
from django.template import Context
from django.http import Http404, HttpResponse
from fetch_images import fetch_imagery

def fetch_it(request):
    fi = fetch_imagery()
    return render_to_response('fetch_image.html', {'fetch_boats' : fi})
Here is the fetch_images module:
#fetch_images.py
from BeautifulSoup import BeautifulSoup
import re
import urllib2

def fetch_imagery():
    response = urllib2.urlopen("http://www.boattrader.com/search-results/Type")
    html = response.read()
    #create a beautiful soup object
    soup = BeautifulSoup(html)
    #all boat images have attribute height=165
    images = soup.findAll("img",height="165")
    for image in images:
        return image['src'] #print the url of the image only
    # all links to detailed boat information have class lfloat
    links = soup.findAll("a", {"class" : "lfloat"})
    for link in links:
        return link['href']
        #print link.string
    # all prices are spans and have the class rfloat
    prices = soup.findAll("span", { "class" : "rfloat" })
    for price in prices:
        return price
        #print price.string
Lastly, in case it's needed, here is the URL mapping from the URLconf:
from django.conf.urls.defaults import *
from mysite.views import fetch_it
urlpatterns = patterns('', ('^fetch_image/$', fetch_it))
Your fetch_imagery function needs some work: since you're using return (instead of yield), the first return image['src'] will terminate the function call (I'm assuming here that all those returns are part of the same function definition, as your code shows).
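A minimal, self-contained illustration of the difference (the filenames are made up and no page is fetched; first_only and all_items are hypothetical names):

```python
def first_only(items):
    # `return` inside the loop exits the function on the first iteration,
    # so only the first item ever comes back (None if the list is empty).
    for item in items:
        return item


def all_items(items):
    # `yield` turns the function into a generator that produces every item.
    for item in items:
        yield item


srcs = ["a.jpg", "b.jpg", "c.jpg"]
print(first_only(srcs))       # a.jpg
print(list(all_items(srcs)))  # ['a.jpg', 'b.jpg', 'c.jpg']
```

In the question's code this also means the link and price loops are never reached, because the function has already returned by then.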
Also, my assumption is that you will be returning a list/tuple (or defining a generator function) from fetch_imagery, in which case your template needs to look like:
{% block content %}
{% for image in fetch_boats %}
<img src="{{ image }}" />
{% endfor %}
{% endblock %}
This will loop over all items (image URLs in your case) in your list and create an img tag for each one of them.
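To print every image, link, and price rather than just the images, one option is to collect all three and pair them up per boat. The sketch below swaps BeautifulSoup for Python 3's standard-library html.parser so it runs without extra packages. The selectors (img with height="165", a with class lfloat, span with class rfloat) are taken from the question and are assumptions about the page's markup; BoatParser and fetch_boats_from_html are hypothetical names. It also assumes the three lists line up one-to-one in page order.

```python
from html.parser import HTMLParser


class BoatParser(HTMLParser):
    """Collects boat image URLs, detail links and prices.

    Assumes a price <span> contains plain text and no nested <span> tags.
    """

    def __init__(self):
        super().__init__()
        self.images, self.links, self.prices = [], [], []
        self._in_price = False
        self._price_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Attribute values can be None for bare attributes, hence the `or ""`.
        classes = (attrs.get("class") or "").split()
        if tag == "img" and attrs.get("height") == "165" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "a" and "lfloat" in classes and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "span" and "rfloat" in classes:
            self._in_price = True
            self._price_text = []

    def handle_data(self, data):
        # Only buffer text while we are inside a price span.
        if self._in_price:
            self._price_text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._in_price:
            self.prices.append("".join(self._price_text).strip())
            self._in_price = False


def fetch_boats_from_html(html):
    """Return a list of dicts, one per boat, pairing image, link and price."""
    parser = BoatParser()
    parser.feed(html)
    parser.close()
    # zip() stops at the shortest list, so unmatched trailing items are dropped.
    return [
        {"image": img, "link": link, "price": price}
        for img, link, price in zip(parser.images, parser.links, parser.prices)
    ]
```

The HTML string would come from the same urlopen(...).read() call as before, decoded to a str first (for example with .decode("utf-8"), assuming the page is UTF-8). In the template you could then loop with {% for boat in fetch_boats %} and output {{ boat.image }}, {{ boat.link }} and {{ boat.price }}.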
This is out of scope, but in my opinion scraping consumes a lot of CPU time, memory, and bandwidth, and I think it should be done in the background, asynchronously.
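A proper background job (for example a scheduled task that scrapes periodically and stores the results) is more than a short snippet can show, but a simpler step in the same direction is to reuse the last scrape for a while instead of fetching the site on every request. TTLCache below is a hypothetical helper that assumes a single process; it is not thread-safe, so two simultaneous requests could both trigger a scrape.

```python
import time


class TTLCache:
    """Keep the last scrape result for `ttl` seconds (hypothetical helper).

    Every call inside the window reuses the stored result instead of
    hitting the remote site again.
    """

    def __init__(self, fetch, ttl=600.0, clock=time.monotonic):
        self._fetch = fetch    # zero-argument callable that does the scraping
        self._ttl = ttl
        self._clock = clock    # injectable so the expiry logic is testable
        self._value = None
        self._expires = float("-inf")  # forces a fetch on the first call

    def get(self):
        now = self._clock()
        if now >= self._expires:
            # Materialise the result so a generator isn't exhausted after one use.
            self._value = list(self._fetch())
            self._expires = now + self._ttl
        return self._value
```

In views.py you might create boat_cache = TTLCache(fetch_imagery, ttl=600) at module level and pass boat_cache.get() to the template. In a deployment with several processes each would keep its own copy; Django's cache framework (django.core.cache) is the more idiomatic place to share the result.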
It's a great idea though :)
I dug around on the 'net for quite a while looking for an example for presenting scraped data and this post really helped. There've been some minor changes to the modules since the question was first posted, so I thought I'd bring it up to date and post the code I have with the changes that were needed.
What's nice about this is that it gives an example of how to run some Python code in response to traffic, and generate simple content that doesn't have any reason to involve a database or Model classes.
Assuming you have a working Django project that you can add these changes to, you should be able to browse to <your-base-url>/fetch_boats and see a bunch of boat pictures.
views.py
from django.shortcuts import render
from bs4 import BeautifulSoup
import urllib.request


def fetch_boats(request):
    # Hand the scraped image URLs to the template as "boat_images".
    fi = fetch_imagery()
    return render(request, "fetch_boats.html", {"boat_images": fi})


def fetch_imagery():
    # Download the page and parse it with BeautifulSoup using Python's
    # built-in html.parser backend.
    with urllib.request.urlopen("http://www.boattrader.com") as response:
        html = response.read()
    soup = BeautifulSoup(html, features="html.parser")
    # Yield the src of every <img>; skip tags that have no src attribute.
    for image in soup.find_all("img"):
        src = image.get("src")
        if src:
            yield src
urls.py
from django.urls import path
from .views import fetch_boats
urlpatterns = [
    path('fetch_boats', fetch_boats, name='fetch_boats'),
]
templates/fetch_boats.html
{% extends 'base.html' %}
{% block title %} ~~~< Boats >~~~ {% endblock title %}
{% block content %}
{% for image in boat_images %}
<br /><br />
<img src="{{ image }}" />
{% endfor %}
{% endblock content %}