Ruby script to download private google docs

Developer https://www.devze.com 2023-02-10 13:06 Source: Internet
I would like to write a script in Ruby (using the gdata gem, rest-client gem or just straight Net::HTTP) to authenticate with my Google Docs using my Gmail userid/password, and then download a list of my private documents and the documents themselves.

The GData documents guide makes it clear how to get publicly visible documents, but it's not clear how I can authenticate myself in my script to get access to private documents. The authentication methods they specify all seem to require human intervention, either with a CAPTCHA or some form of OAuth/OpenID redirection.

Is there some way to access my private documents with just a userid/password combination? Or perhaps that along with an API key? If so, can anybody show me how to do this?


So, sometimes giving up, moving on to something else, and coming back with a fresh mindset can do wonders. I started looking at this again this morning and within a couple of hours got it working.

I ditched OAuth because the Ruby OAuth gem seems to be centered around web-based applications. I started poking around in Google Data on Rails, and with ClientLogin it was no problem getting authenticated. As far as I can tell, you don't get CAPTCHA requests unless you enter the wrong credentials, or at least I haven't seen any yet.

Here is a simple code snippet to export a spreadsheet file:

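require 'gdata/client'
require 'gdata/http'
require 'gdata/auth'

# Replace with the key of the spreadsheet you want to export
resource_id = 'resource_ID'

client = GData::Client::Spreadsheets.new
client.clientlogin('username', 'password')
# Interpolate the key into the export URL and request the file as xls
test = client.get("http://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=#{resource_id}&fmcmd&exportFormat=xls")
# Write in binary mode so the xls data is saved unchanged
File.open('spreadsheet.xls', 'wb') { |file| file.write(test.body) }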
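
For anyone curious what clientlogin does on the wire, here is a rough sketch of the same login with plain Net::HTTP. It assumes the ClientLogin endpoint behaves as Google documented it at the time: a form POST to https://www.google.com/accounts/ClientLogin that answers with Key=value lines, where the Auth value is then sent in a GoogleLogin Authorization header. The helper names, the source string and the default service code ('wise', the code for spreadsheets) are my own illustrations, not something taken from the gdata gem.

```ruby
require 'net/http'
require 'uri'

# Assumed ClientLogin endpoint, as documented by Google at the time
CLIENT_LOGIN_URL = URI('https://www.google.com/accounts/ClientLogin')

# Build the form fields for a ClientLogin request.
# 'source' is a free-form application name (hypothetical value here).
def client_login_params(email, password, service: 'wise', source: 'example-script-1.0')
  {
    'accountType' => 'HOSTED_OR_GOOGLE',
    'Email'       => email,
    'Passwd'      => password,
    'service'     => service,
    'source'      => source
  }
end

# Turn the "Key=value" lines of a ClientLogin response body into a hash.
def parse_client_login_body(body)
  body.each_line
      .map(&:chomp)
      .select { |line| line.include?('=') }
      .map { |line| line.split('=', 2) }
      .to_h
end

# Value for the Authorization header on later requests.
def authorization_header(auth_token)
  "GoogleLogin auth=#{auth_token}"
end

# Perform the login and return the Auth token (makes a network call).
def fetch_auth_token(email, password)
  response = Net::HTTP.post_form(CLIENT_LOGIN_URL, client_login_params(email, password))
  raise "ClientLogin failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  parse_client_login_body(response.body).fetch('Auth')
end
```

Calling fetch_auth_token('username', 'password') and passing the result to authorization_header would give the header value to attach to the export request.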


I started on this exact same project today and have run into the same issue. I've managed to get around using OAuth or OpenID, but I'm still working on actually getting a file downloaded, which seems like it should be the easy part. Anyway, here's what I've done:

I'm using the Mechanize gem to scrape the docs.google.com page for the username and password forms. I submit my credentials via Mechanize and now have access to my Google docs.

At this point I find I can use the download URL mentioned in this Google documentation:

http://code.google.com/apis/documents/docs/3.0/developers_guide_protocol.html#DownloadingDocs

The URL looks like this (I'm working with spreadsheets):

"http://spreadsheets.google.com/feeds/download/spreadsheets/Export?key="resource_id_goes_here"&exportFormat=xls"

For tinkering/testing, I'm just taking the resource id of my spreadsheet from the address bar of my web browser (when I have the spreadsheet open in my browser) and plugging it into the above URL in another tab of my browser. This seems to work because when I submit the URL the spreadsheet is downloaded as an .xls file. Note this is all using my web browser.
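
The manual copy-and-paste step could be scripted roughly like this. It is only a sketch: it assumes the address bar URL carries the resource id in a key query parameter (as the spreadsheet URLs I was looking at did), and the helper names resource_id_from and export_url are made up for illustration.

```ruby
require 'uri'

# Export endpoint quoted above
EXPORT_BASE = 'http://spreadsheets.google.com/feeds/download/spreadsheets/Export'

# Pull the resource id out of a spreadsheet URL copied from the browser.
# Assumes the id is carried in the "key" query parameter; returns nil otherwise.
def resource_id_from(browser_url)
  query = URI.parse(browser_url).query.to_s
  URI.decode_www_form(query).to_h['key']
end

# Build the export URL for a resource id and an export format such as xls or csv.
def export_url(resource_id, format = 'xls')
  "#{EXPORT_BASE}?#{URI.encode_www_form('key' => resource_id, 'exportFormat' => format)}"
end
```

For example, export_url(resource_id_from(address_bar_url)) would produce the URL to paste into another tab or hand to a script.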

I haven't been able to successfully initiate a download via my Ruby script. That URL isn't a direct link to the file, so I'm not quite sure how to properly capture the file data. The script runs successfully, but if I store the output of the Ruby 'get' method (which is using that URL as an argument) in an object, it appears to be some JavaScript redirection stuff. I'm probably overlooking something obvious, but that's where I'm at. I blame being stuck on the hours I spent reading about OAuth and OpenID... that wasn't much fun.

Hopefully some of that is useful. Here's another interesting Ruby gem I came across in my research on the authentication stuff:

OAuth Ruby Gem: http://oauth.rubyforge.org/


Sure, here's a basic version of what I'm doing:

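require 'mechanize'

agent = Mechanize.new
# Load the login page and fill in the first form with the account credentials
page = agent.get "https://docs.google.com"
form = page.forms.first
form.Email = "your_username"
form.Passwd = "your_password"
page = agent.submit form
# Request the export URL with the same agent so its session cookies are sent
test = agent.get "google_download_url_goes_here"
puts test.body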

If you look at test you'll see the JavaScript redirection stuff instead of the xls file.

I haven't worked on it in a couple of days, but I have a slight feeling I'm getting the redirection because the script isn't "properly" authenticated. Mechanize is supposed to handle cookies and redirects, so I would think this should simply just work, but it doesn't.
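
One way to tell the two cases apart in a script, rather than eyeballing the output, might be to check the first bytes of the body. This sketch assumes the export is a legacy binary .xls, which starts with the OLE2 compound file signature (D0 CF 11 E0 A1 B1 1A E1); the helper names are hypothetical, and the HTML check is only a loose heuristic.

```ruby
# Signature that opens a legacy binary .xls (OLE2 compound file)
OLE2_MAGIC = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b

# True when the body starts with the OLE2 signature, i.e. it is probably real xls data.
def looks_like_xls?(body)
  body.b.start_with?(OLE2_MAGIC)
end

# True when the body looks like an HTML or script page, such as a login or redirect page.
def looks_like_html?(body)
  body.b.lstrip.downcase.match?(/\A(<!doctype html|<html|<script)/)
end

# Summarise what a download attempt actually returned.
def describe_download(body)
  return 'spreadsheet data' if looks_like_xls?(body)
  return 'HTML/JavaScript page (probably not authenticated)' if looks_like_html?(body)

  'unknown content'
end
```

Running describe_download(test.body) after the last get would make it obvious whether the request was treated as logged in.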

UPDATE:

The export URLs are a little farther down on the same page in that documentation you linked to in your comment. The URL for exporting a spreadsheet looks like this:

http://spreadsheets.google.com/feeds/download/spreadsheets/Export?key="document_resource_id_goes_here"&exportFormat=xls

You should be able to plug that into a browser and download a file (if you are logged in, of course). The document resource id is just the unique key for whatever document you are working with; you can manually paste it into the URL for testing in a browser.

However, I'm pretty sure none of these API URLs will work in a script unless it properly handles authentication the way Google asks for. I'm not exactly sure what I'm looking at, but using Wireshark to sniff packets I can see some errors with the script that I don't get with my browser. These errors seem to occur when the server and script are exchanging some kind of certificate info. Anyway, I've been looking at the OAuth gem some more and think I am starting to understand it better.

If you go here:

http://googlecodesamples.com/oauth_playground/

You can play around with the OAuth stuff there; it's kind of crazy how it works. You ask for a request token with a bunch of parameters that must be 'just' right. Google sends back the request token, which you then use to reference a login page where you enter your Google credentials (as you would when you manually work with Google Docs). Once your credentials are verified, it asks you to grant permission to the request token. The request token is then upgraded to an access token and passed back to your script, and you can start working with the rest of the API by referencing this access token. It seems like overkill, but I'm no security expert.
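
To give an idea of why the parameters have to be 'just' right, here is a stdlib-only sketch of how an OAuth 1.0 request is signed with HMAC-SHA1. It is not what the OAuth gem or the playground literally runs; it follows the general OAuth 1.0 signing rules. The request-token URL and the Docs scope are my assumptions about Google's OAuth 1.0 endpoints, and the function names are made up.

```ruby
require 'openssl'
require 'securerandom'

# Assumed Google OAuth 1.0 request-token endpoint and Docs scope
REQUEST_TOKEN_URL = 'https://www.google.com/accounts/OAuthGetRequestToken'
DOCS_SCOPE = 'https://docs.google.com/feeds/'

# Percent-encode everything except the unreserved characters, byte by byte.
def oauth_escape(value)
  value.to_s.b.gsub(/[^A-Za-z0-9\-._~]/) { |c| format('%%%02X', c.ord) }
end

# Base string: METHOD & encoded URL & encoded, sorted "key=value" pairs.
def signature_base_string(http_method, url, params)
  normalized = params.map { |k, v| [oauth_escape(k), oauth_escape(v)] }
                     .sort
                     .map { |k, v| "#{k}=#{v}" }
                     .join('&')
  [http_method.upcase, oauth_escape(url), oauth_escape(normalized)].join('&')
end

# HMAC-SHA1 over the base string, keyed with "consumer_secret&token_secret".
# The token secret is empty while asking for the request token.
def hmac_sha1_signature(base_string, consumer_secret, token_secret = '')
  key = "#{oauth_escape(consumer_secret)}&#{oauth_escape(token_secret)}"
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('SHA1'), key, base_string)
  [digest].pack('m0') # Base64 without line breaks
end

# Parameters for a request-token call; nonce and timestamp change on every call.
def request_token_params(consumer_key, callback)
  {
    'oauth_consumer_key'     => consumer_key,
    'oauth_nonce'            => SecureRandom.hex(16),
    'oauth_signature_method' => 'HMAC-SHA1',
    'oauth_timestamp'        => Time.now.to_i.to_s,
    'oauth_version'          => '1.0',
    'oauth_callback'         => callback,
    'scope'                  => DOCS_SCOPE
  }
end
```

The signature goes into the request as oauth_signature. Because every parameter, the URL and the HTTP method all feed into the base string, one differing byte yields a different signature and the server rejects the request.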

Here's what I'm hoping to do:

  • Figure out how to use the OAuth Ruby gem to request and send tokens to Google.

  • Use Mechanize to scrape the Google login page and enter credentials once I can send it the request token it wants.

  • Use Mechanize to click on the "Grant Access" button once my credentials are submitted.

  • Then hopefully find that I can actually use the rest of the API to work with files

(Grrr! learning how to properly format text on this website is about as difficult!! :))


The code in the first answer didn't quite work for me. Here's what I used.

require 'gdata/client'
require 'gdata/http'
require 'gdata/auth'

KEY = 'YOUR_DOCUMENT_KEY'
URL = "https://docs.google.com/feeds/download/spreadsheets"

client = GData::Client::Spreadsheets.new
client.clientlogin('REPLACE_WITH_LOGIN', 'REPLACE_WITH_PASSWORD')

# Change the csv at the end to match your required format
test = client.get("#{URL}/Export?key=#{KEY}&fmcmd&exportFormat=csv")

puts test.body