Inconsistent results in R with RNetCDF - why?

Developer https://www.devze.com 2023-03-14 08:13 Source: Internet
I am having trouble extracting data from NetCDF data files using RNetCDF. The data files each have 3 dimensions (longitude, latitude, and a date) and 3 variables (latitude, longitude, and a climate variable). There are four datasets, each with a different climate variable.

Here is some of the output from print.nc(p8m.tmax) for clarity. The other datasets are identical except for the specific climate variable.

dimensions:
  month = UNLIMITED ; // (1368 currently)
  lat = 3105 ;
  lon = 7025 ;
variables:
  float lat(lat) ;
          lat:long_name = "latitude" ;
          lat:standard_name = "latitude" ;
          lat:units = "degrees_north" ;
  float lon(lon) ;
          lon:long_name = "longitude" ;
          lon:standard_name = "longitude" ;
          lon:units = "degrees_east" ;
  short tmax(lon, lat, month) ;
          tmax:missing_value = -9999 ;
          tmax:_FillValue = -9999 ;
          tmax:units = "degree_celsius" ;
          tmax:scale_factor = 0.01 ;
          tmax:valid_min = -5000 ;
          tmax:valid_max = 6000 ;
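
For readers unfamiliar with packed NetCDF variables, here is a minimal sketch of how the tmax attributes above relate to the integers that var.get.nc returns in the output quoted later in this post. It assumes the values come back packed (the output looks like raw shorts), that -9999 marks missing data, and that there is no add_offset since none is listed. unpack_tmax is a hypothetical helper name, not an RNetCDF function.

```r
# Hypothetical helper: convert packed short values to degrees Celsius.
# Assumes scale_factor = 0.01, fill value -9999, and no add_offset,
# as listed in the header above.
unpack_tmax <- function(raw, scale_factor = 0.01, fill_value = -9999) {
  raw <- as.numeric(raw)
  is_fill <- !is.na(raw) & raw == fill_value
  raw[is_fill] <- NA            # treat the fill value as missing
  raw * scale_factor
}

# The first three values are taken from the output in this post;
# the fourth is added to show the fill-value handling.
raw <- c(444, 866, 1063, -9999)
print(unpack_tmax(raw))
```

Under those assumptions the first three values correspond to about 4.44, 8.66 and 10.63 degrees Celsius.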

I am getting behavior I don't understand when I use the var.get.nc function from the RNetCDF package.

For example, when I attempt to extract 82 values beginning at stval from the maximum temperature data (p8m.tmax <- open.nc('tmaxdataset.nc')) with

 > var.get.nc(p8m.tmax,'tmax', start=c(lon_val, lat_val, stval),count=c(1,1,82))

(where lon_val and lat_val specify the location in the dataset of the coordinates I'm interested in, and stval is set to which(time_vec==200201), which in this case equaled 1285), I get Error: Invalid argument
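
To make the indexing concrete, here is a small sketch of how a value such as stval can be derived and checked against the length of the month dimension. The contents of time_vec are not shown in this question. The sketch assumes YYYYMM codes for consecutive months starting in January 1895, a start date chosen only because it is consistent with 1368 months and with which(time_vec==200201) returning 1285.

```r
# Assumption: time_vec holds YYYYMM codes for 1368 consecutive months
# starting in January 1895 (not stated in the question).
years    <- rep(1895:2008, each = 12)
months   <- rep(1:12, times = length(1895:2008))
time_vec <- years * 100 + months

stval <- which(time_vec == 200201)   # index of January 2002
n     <- 82                          # number of months requested

# A request is within bounds when its last index does not exceed
# the length of the month dimension.
last_index <- stval + n - 1
in_bounds  <- last_index <= length(time_vec)
print(c(stval = stval, last_index = last_index, n_months = length(time_vec)))
print(in_bounds)
```

Under that assumption the 82-value request ends at index 1366 of 1368, so it does not run past the end of the unlimited dimension.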

But after successfully extracting 80 and 81 values

> var.get.nc(p8m.tmax,'tmax', start=c(lon_val, lat_val, stval),count=c(1,1,80))
> var.get.nc(p8m.tmax,'tmax', start=c(lon_val, lat_val, stval),count=c(1,1,81))

the command with 82 works:

> var.get.nc(p8m.tmax,'tmax', start=c(lon_val, lat_val, stval),count=c(1,1,82))

[1]  444  866 1063 ... [output snipped]

The same problem occurs in the identically structured tmin file, but at 36 instead of 82:

> var.get.nc(p8m.tmin,'tmin', start=c(lon_val, lat_val, stval),count=c(1,1,36))

produces Error: Invalid argument

But after repeating with counts of 30, 31, etc.,

> var.get.nc(p8m.tmin,'tmin', start=c(lon_val, lat_val, stval), count=c(1,1,36)) 

works.

These examples make it seem like the function is failing at the last count, but that actually isn't the case. In the first example, var.get.nc gave Error: Invalid argument after I asked for 84 values. I then narrowed the failure down to the 82nd count by varying the starting point in the dataset and asking for only 1 value at a time. The particular number the problem occurs at also varies. I can close and reopen the dataset and have the problem occur at a different location.

In the particular examples above, lon_val and lat_val are 1595 and 1751, respectively, identifying the location in the dataset along the lon and lat dimensions for the longitude and latitude I'm interested in. That particular location is not the problem, however: the error also occurs with every other latitude and longitude I've tried.

Varying the starting location in the dataset along the month dimension (stval) and/or specifying it differently (as a literal number in the command instead of the object stval) also does not fix the problem.

This problem doesn't always occur. I can run identical code three times in a row (clearing all objects in between runs) and get a different outcome each time. The first run may choke on the 7th entry I'm trying to get, the second might work fine, and the third run might choke on the 83rd entry. I'm absolutely baffled by such inconsistent behavior.
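
One way to record how often an intermittent failure like this occurs is to wrap the read in a retry loop that logs each failed attempt. The sketch below uses only base R. with_retry and flaky_read are hypothetical names, and flaky_read is a stand-in that fails twice and then succeeds, not a real NetCDF read. It does not explain the error; it only counts attempts.

```r
# Hypothetical helper: call `read_fun` up to `max_tries` times and
# report how many attempts were needed. Not part of RNetCDF.
with_retry <- function(read_fun, max_tries = 5, wait = 0) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(read_fun(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(list(value = result, attempts = attempt))
    }
    message("attempt ", attempt, " failed: ", conditionMessage(result))
    if (wait > 0) Sys.sleep(wait)   # optional pause between attempts
  }
  stop("all ", max_tries, " attempts failed")
}

# Stand-in for a flaky read: fails twice, then returns data.
calls <- 0
flaky_read <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("Invalid argument")
  c(444, 866, 1063)
}

out <- with_retry(flaky_read)
print(out$attempts)
```

With the real data, the stand-in would be replaced by something like function() var.get.nc(p8m.tmax, 'tmax', start=c(lon_val, lat_val, stval), count=c(1,1,82)).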

The open.nc function has also started to fail with the same Error: Invalid argument. Like the var.get.nc problems, it also occurs inconsistently.

Does anyone know what causes the initial failure to extract the variable, and how I might prevent it? Could it have to do with the size of the data files (~60GB each) and/or the fact that I'm accessing them through networked drives?
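
As a cross-check on the file size, the figure of roughly 60GB follows from the dimension lengths and the short storage type in the header above. The sketch below does that arithmetic. It does not show that file size causes the error; it only shows that the number of values (and therefore any byte offset into the file) exceeds the 32-bit signed integer range, which might matter on the 32-bit build listed in the sessionInfo() output below.

```r
# Dimension lengths and storage type taken from the header above.
n_month <- 1368
n_lat   <- 3105
n_lon   <- 7025
bytes_per_value <- 2   # 'short' is a 2-byte integer in NetCDF

n_values <- n_month * n_lat * n_lon      # doubles, so no integer overflow
n_bytes  <- n_values * bytes_per_value

print(n_bytes / 1e9)                      # size in GB (decimal)
print(n_values > .Machine$integer.max)    # exceeds 32-bit signed range?
```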

This was also asked here: https://stat.ethz.ch/pipermail/r-help/2011-June/281233.html

> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] reshape_0.8.4   plyr_1.5.2      RNetCDF_1.5.2-2

loaded via a namespace (and not attached):
[1] tools_2.13.0


To solve this problem, I switched from the RNetCDF package (version 1.5.2-2) to the ncdf package (1.6.5). The functions in the two packages are similarly named and have the same purposes [open.nc vs. open.ncdf, var.get.nc vs. get.var.ncdf]. Using the exact same code with the RNetCDF function names replaced with ncdf functions, I get no errors and the expected results.

So while the following RNetCDF commands fail (only sometimes & for no apparent reason)

>p8m.tmax <- open.nc('tmax.nc')
>var.get.nc(p8m.tmax,'tmax', start=c(lon_val, lat_val, stval),count=c(1,1,82))

these ncdf commands have never failed for me:

>p8m.tmax <- open.ncdf('tmax.nc')
>get.var.ncdf(p8m.tmax,'tmax', start=c(lon_val, lat_val, stval),count=c(1,1,82))

This is not a real solution - I still don't know why the functions in the RNetCDF package sometimes work and sometimes do not. However, it does allow me to extract the data I need and will hopefully be of some use to others working with netcdf data in R.
