开发者

read.csv fails to read a CSV file from google docs

开发者 https://www.devze.com 2023-03-01 09:36 出处:网络
I wish to use read.csv to read a google doc spreadsheet. I try using the following code: data_url <- \"http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY

I wish to use read.csv to read a google doc spreadsheet.

I try using the following code:

data_url <- "http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&sing开发者_运维知识库le=true&gid=0&output=csv"
read.csv(data_url)

Which results in the following error:

Error in file(file, "rt") : cannot open the connection

I'm on windows 7. And the code was tried on R 2.12 and 2.13

I remember trying this a few months ago and it worked fine. Any suggestion what might be causing this or how to solve it?

Thanks.


It might have something to do with the fact that Google is reporting a 302 temporarily moved response.

> download.file(data_url, "~/foo.csv", method = "wget")
--2011-04-29 18:01:01--  http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
Resolving spreadsheets0.google.com... 74.125.230.132, 74.125.230.128, 74.125.230.130, ...
Connecting to spreadsheets0.google.com|74.125.230.132|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv [following]
--2011-04-29 18:01:01--  https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
Connecting to spreadsheets0.google.com|74.125.230.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: `/home/gavin/foo.csv'

    [ <=>                                                                                                                                                   ] 41          --.-K/s   in 0s      

2011-04-29 18:01:02 (1.29 MB/s) - `/home/gavin/foo.csv' saved [41]

> read.csv("~/foo.csv")
  column1 column2
1       a       1
2       b       2
3      ds       3
4       d       4
5       f       5
6      ga       5

I'm not sure R's internal download code is capable of responding to such redirects:

> download.file(data_url, "~/foo.csv")
trying URL 'http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv'
Error in download.file(data_url, "~/foo.csv") : 
  cannot open URL 'http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv'


I ran into the same problem and eventually found a solution in a forum thread. Using my own public CSV file:

library(RCurl)
tt = getForm("https://spreadsheets.google.com/spreadsheet/pub", 
          hl ="en_US", key = "0Aonsf4v9iDjGdHRaWWRFbXdQN1ZvbGx0LWVCeVd0T1E", 
          output = "csv", 
         .opts = list(followlocation = TRUE, verbose = TRUE, ssl.verifypeer = FALSE)) 

holidays <- read.csv(textConnection(tt))


Check the solution on http://blog.forret.com/2011/07/google-docs-infamous-moved-temporarily-error-fixed/

So what is the solution: just add “&ndplr=1” to your URL and you will skip the authentication redirect. I’m not sure what the NDPLR parameter name stands for, let’s just call it: “Never Do Published Link Redirection“.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号