Mechanize not recognizing anchor tags via CSS selector methods

Developer (开发者) https://www.devze.com 2022-12-19 13:16 Source: Internet
(Hope this isn't a breach of etiquette: I posted this on RailsForum, but I haven't been getting much response from there recently.)

Has anyone else had problems with Mechanize not recognizing anchor tags via CSS selectors?

The HTML looks like this (snippet with white space removed for clarity):

<td class='calendarCell' align='left'>
<a href="http://www.mysite.org/index.php/site/ActivitiesCalendar/2010/02/10/">10</a>
<p style="margin-bottom:15px; line-height:14px; text-align:left;">
<span class="sidenavHeadType">
 Current Events</span><br />
<b><a href="http://www.mysite.org/index.php/site/Clubs/banks_and_the_fed" class="a2">Banks and the Fed</a></b>
<br />
10:30am- 11:45am
</p>

I'm trying to collect the data from these events. Everything is working except getting the anchor within the <p>. There's clearly an <a> tag inside the <b>, and I'm going to need to follow that link to get further details on this event.

In my rake task, I have:

agent.page.search(".calendarCell,.calendarToday").each do |item|
  day = item.at("a").text

  item.search("p").each do |e|
    anchor = e.at("a")
    puts anchor
    puts e.inner_html
  end
end

What's interesting is that item.at("a") always returns the anchor, but e.at("a") returns nil. And when I call inner_html on the p element, the anchor is missing entirely. Example output:

nil

<span class="sidenavHeadType">
 Photo Club</span><br><b>Indexing Slide Collections</b>
<br>
2:00pm- 3:00pm

However, when I run the same scrape directly with Nokogiri:

doc.css(".calendarCell,.calendarToday").each do |item|
  day = item.at_css("a").text
  item.css("p").each do |e|
    link     = e.at_css("a")[:href]
    puts e.inner_html
  end
end

It recognizes the <a> inside the <p>, and it will return the href, etc.

<span class="sidenavHeadType">
 Bridge Party</span><br><b><a href="http://www.mysite.org/index.php/site/Clubs/party_bridge_51209" class="a2">Party Bridge</a></b>
<br>
7:00pm- 9:00pm

Mechanize is supposed to use Nokogiri, so I'm wondering if I have a bad version or if this affects others as well.

Thanks for any leads.


Never mind. False alarm. In my Nokogiri task, I was pointing to a local copy of the page that included the anchors. The live page requires a login: I could see the <a> tags when I browsed to it, but the Mechanize agent was not logged in and got the page without them. Adding the login to the rake task solved it.
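
For anyone hitting the same thing, here is a minimal sketch of what "adding the login" might look like. It is not the actual rake task: the method name fetch_after_login, the URLs in the usage comment, and the form field names "username" and "password" are all assumptions, and it assumes the login form is the first form on the login page. It relies only on Mechanize calls that exist (get, Page#forms, Form#[]=, submit).

```ruby
# Hypothetical helper: log in with the given agent, then fetch the page that
# needs a session. Field names default to "username"/"password", which is an
# assumption; check the real form's input names.
def fetch_after_login(agent, login_url:, target_url:, username:, password:,
                      user_field: "username", pass_field: "password")
  login_page = agent.get(login_url)
  form = login_page.forms.first          # assumes the login form comes first
  raise "no form found on #{login_url}" if form.nil?

  form[user_field] = username
  form[pass_field] = password
  agent.submit(form)                     # session cookies stay in the agent

  agent.get(target_url)                  # now fetched as a logged-in user
end

# Usage sketch (URLs and credentials are placeholders):
#
#   require "mechanize"
#   agent = Mechanize.new
#   page = fetch_after_login(agent,
#                            login_url:  "http://www.mysite.org/login",
#                            target_url: "http://www.mysite.org/calendar",
#                            username:   ENV["SITE_USER"],
#                            password:   ENV["SITE_PASS"])
#   page.search(".calendarCell,.calendarToday").each do |item|
#     item.search("p").each { |e| puts e.at("a") && e.at("a")[:href] }
#   end
```

Because the agent keeps its cookie jar between requests, every later agent.get in the same task should see the logged-in version of the page.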
