I have written a simple screen-scraping script, and at the end of it I am trying to build an array of arrays in preparation for an ActiveRecord insert. The structure I am trying to achieve is as follows:
Array b holds a series of 10-element arrays:
```ruby
b = [[0,1,2,3,4,5,6,7,8,9],[0,1,2,3,4,5,6,7,8,9],[0,1,2,3,4,5,6,7,8,9]]
```
Currently, when I try to print out array b, it is empty. I'm still fairly new to Ruby, and to programming for that matter, and would appreciate any feedback on how to get values into array b and how to improve the overall script. Script follows:
```ruby
require "rubygems"
require "celerity"

t = 0
r = 0
c = 0
a = Array.new(10)
b = Array.new

# initialize Browser
browser = Celerity::IE.new
# goto Login Page
browser.goto('http://www1.drf.com/drfLogin.do?type=membership')
# input UserId and Password
browser.text_field(:name, 'p_full_name').value = 'username'
browser.text_field(:name, 'p_password').value = 'password'
browser.button(:index, 2).click
# goto DRF Frontpage
browser.goto('http://www.drf.com/frontpage')
# goto DRF Entries
browser.goto('http://www1.drf.com/static/indexMenus/eindex.html')
# click the link to access the entries
browser.link(:text, '09').click

browser.tables.each do |table|
  t = t + 1
  browser.table(:index, t).rows.each do |row|
    r = r + 1
    browser.table(:index, t).row(:index, r).cells.each do |cell|
      a << cell.text
    end
    b << a
    a.clear
  end
  r = 0
end

puts b
browser.close
```
This is a minor rewrite of your main loop in a more Ruby-like style:
```ruby
b = Array.new
browser.tables.each_with_index do |table, t|
  browser.table(:index, 1 + t).rows.each_with_index do |row, r|
    a = []  # start empty; Array.new(10) would put 10 nils in front of the cells
    browser.table(:index, 1 + t).row(:index, 1 + r).cells.each do |cell|
      a << cell.text
    end
    b << a
  end
end
puts b
```
I moved the array initializations to just above where they're needed. That's a programmer-choice thing, of course, though for a it also means every row gets a fresh array.
Rather than creating two counter variables up front, I switched to each_with_index, which supplies an index variable starting at 0. To get your 1-based offsets, I add 1.
They're not big changes, but they add up to a more cohesive app.
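To try the each_with_index version without a browser, here is a minimal sketch that replaces `browser.tables` with hard-coded data. The `TABLES` array and the `cells_at` helper are hypothetical; `cells_at` only mimics the 1-based `:index` lookup the Celerity code relies on, so the `1 + t` and `1 + r` offsets line up the same way.

```ruby
# Hypothetical stand-in for browser.tables: each table is an array of rows,
# and each row is an array of cell strings.
TABLES = [
  [%w[a1 a2], %w[b1 b2]],
  [%w[c1 c2]]
].freeze

# Hypothetical helper that mimics a 1-based :index lookup, in the spirit of
# browser.table(:index, t).row(:index, r).cells
def cells_at(table_number, row_number)
  TABLES[table_number - 1][row_number - 1]
end

b = []
TABLES.each_with_index do |table, t|      # t counts from 0
  table.each_with_index do |_row, r|      # r counts from 0
    a = []                                # a fresh array for every row
    cells_at(1 + t, 1 + r).each do |cell| # add 1 to reach the 1-based lookup
      a << cell
    end
    b << a
  end
end

p b  # prints [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
```

The `_row` block variable goes unused because, as in the rewrite, the cells are fetched by index; iterating the row object directly would drop the index arithmetic altogether.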
Back to the original code: one issue I see is that you create your a array outside the loops and then reuse it every time you append to b. The same array object is used for every row: it is filled, pushed onto b, then cleared and refilled. That overwrites the previous row's values, and b ends up holding several references to one and the same array, which is empty again after the final a.clear.
```ruby
require 'pp'
a = []
b = []
puts a.object_id
a[0] = 1
b << a
a.clear
a[0] = 2
b << a
puts
pp b
b.each { |ary| puts ary.object_id }
# >> 2151839900
# >>
# >> [[2], [2]]
# >> 2151839900
# >> 2151839900
```
Notice that the same a array is reused each time.
If I assign a new array to a instead, b gets two distinct values, because the two elements are separate objects:
```ruby
require 'pp'
a = []
b = []
puts a.object_id
a[0] = 1
b << a
a = []
a[0] = 2
b << a
puts
pp b
b.each { |ary| puts ary.object_id }
# >> 2151839920
# >>
# >> [[1], [2]]
# >> 2151839920
# >> 2151839780
```
Hopefully that'll help you avoid the problem in the future.
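If you would rather keep one scratch array and clear it between rows, pushing a copy with `Array#dup` also avoids the problem, because b then holds a new array instead of a reference to a. This small sketch uses `equal?`, which tests object identity, the same thing the `object_id` comparison shows.

```ruby
require 'pp'

a = []
b = []

a[0] = 1
b << a.dup   # push a shallow copy, not a itself
a.clear      # clearing a no longer touches the copy stored in b
a[0] = 2
b << a.dup

pp b                     # prints [[1], [2]]
puts b[0].equal?(b[1])   # false: two distinct objects
puts b[1].equal?(a)      # false: the stored copy is not a itself
```

`dup` is shallow, which is fine here because the cells are strings that are never mutated in place.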
Your problem is there at the end:
```ruby
b << a    # push a *reference to* a onto b
a.clear   # clear a; the reference in b now points to an empty array!
```
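If you remove the call to a.clear and start the rows loop with a fresh array (it has to be the inner rows loop; creating a only once per table would still make every row of that table share one array):
```ruby
browser.table(:index, t).rows.each do |row|
  r = r + 1
  a = []
```
...you'll be golden (at least as far as your array-building goes).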
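Where `a = []` runs makes a real difference. This sketch replays both placements on a hypothetical hard-coded `tables` array (tables of rows of cell strings) standing in for `browser.tables`; no Celerity is involved.

```ruby
# Hypothetical data standing in for browser.tables: tables -> rows -> cell texts.
tables = [
  [%w[r1c1 r1c2], %w[r2c1 r2c2]],
  [%w[r3c1 r3c2]]
]

# a = [] once per table: rows of the same table share one growing array.
per_table = []
tables.each do |table|
  a = []
  table.each do |row|
    row.each { |cell| a << cell }
    per_table << a
  end
end

# a = [] once per row: every row gets its own array.
per_row = []
tables.each do |table|
  table.each do |row|
    a = []
    row.each { |cell| a << cell }
    per_row << a
  end
end

p per_table  # the first two elements are the same 4-cell array
p per_row    # prints [["r1c1", "r1c2"], ["r2c1", "r2c2"], ["r3c1", "r3c2"]]
puts per_table[0].equal?(per_table[1])  # true
```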
I can't tell from your question whether you have multiple tables or not. Maybe just one? In that case:
```ruby
b = browser.tables.first.rows.map {|row| row.cells.map(&:text)}
```
If you have multiple tables and really want an array (tables) of arrays (rows) of arrays (cells), that would be:
```ruby
b = browser.tables.map {|t| t.rows.map {|row| row.cells.map(&:text)}}
```
And if the tables all have the same structure and you just want all the rows as if they came from one big table, you can do:
```ruby
b = browser.tables.map {|t| t.rows.map {|row| row.cells.map(&:text)}}.flatten(1)
```
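To see the three shapes side by side without a browser, this sketch builds hypothetical `Cell`, `Row`, and `Table` structs (plus a `make_row` helper) that expose just `rows`, `cells`, and `text`, the only methods the one-liners call. They are stand-ins for illustration, not Celerity's classes.

```ruby
# Hypothetical stand-ins exposing only #rows, #cells and #text.
Cell  = Struct.new(:text)
Row   = Struct.new(:cells)
Table = Struct.new(:rows)

# Hypothetical helper: build a Row from plain strings.
def make_row(*texts)
  Row.new(texts.map { |text| Cell.new(text) })
end

tables = [
  Table.new([make_row('a', 'b'), make_row('c', 'd')]),
  Table.new([make_row('e', 'f')])
]

# One table: an array of rows, each an array of cell texts.
single = tables.first.rows.map { |row| row.cells.map(&:text) }

# Several tables: an array of tables of rows of cell texts.
nested = tables.map { |t| t.rows.map { |row| row.cells.map(&:text) } }

# All rows as one big table: flatten(1) removes only the table level.
flat = nested.flatten(1)

p single  # prints [["a", "b"], ["c", "d"]]
p nested  # prints [[["a", "b"], ["c", "d"]], [["e", "f"]]]
p flat    # prints [["a", "b"], ["c", "d"], ["e", "f"]]
```

The argument to `flatten` matters: a plain `flatten` would collapse everything into one list of cell strings and lose the row boundaries.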