In R, I have a bunch of datetime values that I measure in GMT. I keep running into accidents where some function or another loses the timezone on my values, or even loses the class name. Even on functions as basic as c() and unlist():
> dput(x)
structure(1317830532, class = c("POSIXct", "POSIXt"), tzone = "GMT")
> dput(c(x))
structure(1317830532, class = c("POSIXct", "POSIXt"))
> dput(list(x))
list(structure(1317830532, class = c("POSIXct", "POSIXt"), tzone = "GMT"))
> dput(unlist(list(x)))
1317830532
I feel like I'm a hair's breadth away from having a real Mars Climate Orbiter moment if this happens when I least expect it. Anyone have any strategies for making sure their dates "stay put"?
This behaviour is documented in ?c, ?DateTimeClasses and ?unlist:
From ?DateTimeClasses:
Using c on "POSIXlt" objects converts them to the current time zone, and on "POSIXct" objects drops any "tzone" attributes (even if they are all marked with the same time zone).
From ?c:
c is sometimes used for its side effect of removing attributes except names.
That said, my testing indicates that the integrity of your data remains intact, despite using c() or unlist(). For example:
x <- structure(1317830532, class = c("POSIXct", "POSIXt"),
tzone = "GMT")
y <- structure(1317830532+3600, class = c("POSIXct", "POSIXt"),
tzone = "PST8PDT")
x
[1] "2011-10-05 16:02:12 GMT"
y
[1] "2011-10-05 10:02:12 PDT"
strftime(c(x, y), format="%Y/%m/%d %H:%M:%S", tz="GMT")
[1] "2011/10/05 16:02:12" "2011/10/05 17:02:12"
strftime(c(x, y), format="%Y/%m/%d %H:%M:%S", tz="PST8PDT")
[1] "2011/10/05 09:02:12" "2011/10/05 10:02:12"
strftime(unlist(y), format="%Y/%m/%d %H:%M:%S", tz="PST8PDT")
[1] "2011/10/05 10:02:12"
Your Mars Rover should be OK if you use R to keep track of dates.
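To make the point above concrete, here is a minimal sketch (not part of the original answer) showing that the underlying seconds since the epoch survive c() and unlist(), and how the display time zone could be reattached afterwards. The variable names z, secs and restored are just for illustration. As far as I know, newer R releases (4.1.0 onward) keep the "tzone" in c() when all inputs share it, so the explicit reattachment is harmless there.

```r
x <- structure(1317830532, class = c("POSIXct", "POSIXt"), tzone = "GMT")

# c() may drop the "tzone" attribute, but the seconds since the epoch survive.
z <- c(x, x + 3600)
as.numeric(z)

# Reattach the display time zone explicitly instead of relying on c().
attr(z, "tzone") <- "GMT"
format(z, "%Y-%m-%d %H:%M:%S", usetz = TRUE)

# unlist() returns a bare number, so rebuild the class and zone from it.
secs <- unlist(list(x))
restored <- as.POSIXct(secs, origin = "1970-01-01", tz = "GMT")
identical(as.numeric(restored), as.numeric(x))
```

The time zone on a POSIXct value only controls how it is printed, which is why putting it back after the fact loses nothing.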
Why not set your timezone to GMT for your R sessions, then? If something gets converted to the "current" timezone, it is still right.
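A rough sketch of that suggestion, assuming it is acceptable to change the TZ environment variable for the whole R process (the same variable could presumably also be set in a startup file such as .Renviron):

```r
# Make GMT the session's "current" time zone (affects this R process only).
Sys.setenv(TZ = "GMT")

# A value whose "tzone" attribute has been lost is now displayed in GMT anyway.
x <- structure(1317830532, class = c("POSIXct", "POSIXt"))
format(x, "%Y-%m-%d %H:%M:%S", usetz = TRUE)
```

This does not stop attributes from being dropped; it only makes the fallback time zone the one you wanted in the first place.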
Given that this is documented behavior, you should either avoid such functions or code defensively around them, and you need mechanisms to support either approach. For things like this, I would recommend writing a "poor man's lint"; with such a lint detector, you can go about restoring sanity. In addition to lint detection, there are several approaches to avoiding Mars Climate Orbiter crashes; some are independent of each other, others dependent:
1. Set a policy & build alternatives: First, for all of the functions that you know are causing you problems, either decide that you won't use them, or write a new wrapper function that will behave as intended and that will set the timezone parameter you desire. Then, ensure that you use that special wrapper rather than the underlying function.
2. Static analysis: Write a search function using your favorite editor (e.g. as a macro), a shell script with the GNU find and grep utilities, or some other means (e.g. grep in R), to find those particular functions that are causing you problems. When found, either remove them or use a defensive coding method (e.g. the wrapper in #1).
3. Testing: Using unit tests, e.g. RUnit or testthat, develop tests that ensure that timezone properties are maintained when using your functions or package. Every time there's a new bug, create a new test to ensure that bug doesn't appear again in released versions.
4. Weak type checking: You can also include tests throughout your code that test whether a timezone is specified. It's best to have your own function for this test, rather than write a block of code that is reproduced throughout. In this way, you can eventually extend the checking to include other types of checks, such as persistence of the timezone and tests for whether operations on two or more objects are mindful of differences in timezones (maybe they allow it, maybe they don't).
5. Map everything to one TZ: Also known as Indiana-be-damned. Retaining a variety of policies about the timezones is hard work, and is essentially friction in working with temporal data. Just map to one TZ (UTC) and then let anything local work from that. If you happen to have local regularity that is invariant of DST, then address that after converting back from UTC.
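The wrapper and weak type checking ideas in the list above might look something like the following. This is only a sketch: c_tz() and assert_tz() are hypothetical helper names invented here, not functions from base R or any package, and the default of "GMT" is an assumption taken from the question.

```r
# Hypothetical wrapper: combine POSIXct values and force one display time zone.
c_tz <- function(..., tz = "GMT") {
  out <- c(...)
  if (!inherits(out, "POSIXct")) {
    stop("c_tz() expects POSIXct inputs")
  }
  attr(out, "tzone") <- tz
  out
}

# Hypothetical check: fail fast when the class or the "tzone" has been lost.
assert_tz <- function(x, tz = "GMT") {
  if (!inherits(x, "POSIXct")) {
    stop("not a POSIXct object")
  }
  actual <- attr(x, "tzone")
  if (is.null(actual) || !identical(actual[1], tz)) {
    stop("time zone is missing or is not ", tz)
  }
  invisible(x)
}

x <- structure(1317830532, class = c("POSIXct", "POSIXt"), tzone = "GMT")
z <- c_tz(x, x + 60)
assert_tz(z)  # passes silently

# A value that went through unlist() has lost its class, so the check fails.
lost <- try(assert_tz(unlist(list(x))), silent = TRUE)
inherits(lost, "try-error")
```

Keeping the check in one function means it can later be extended, for instance to compare the time zones of two objects before they are combined.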
I do all of #s 1-4 for other issues, but, just as they're easily adapted to timezone checking, they're fairly reusable for lots of Mars Orbiter-avoiding objectives. I do this kind of thing precisely to avoid coding the next such Mars Orbiter. (That was an expensive lesson for all of us that work with numerical data. :))
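For the static analysis idea, one possible "poor man's lint" can be written in R itself using the parser rather than a plain text grep, so that only real function calls are flagged. This is a sketch under assumptions: find_calls() is a made-up name, and the list of risky functions is just the two from the question.

```r
# Hypothetical lint helper: report lines where risky functions are called.
find_calls <- function(code, risky = c("c", "unlist")) {
  # keep.source = TRUE is needed so that parse data is recorded.
  parsed <- parse(text = code, keep.source = TRUE)
  pd <- utils::getParseData(parsed)
  is_hit <- pd$token == "SYMBOL_FUNCTION_CALL" & pd$text %in% risky
  hits <- pd[is_hit, c("line1", "text")]
  hits <- hits[order(hits$line1), ]
  rownames(hits) <- NULL
  hits
}

# Each element of this vector is treated as one line of source code.
code <- c(
  "x <- Sys.time()",
  "y <- c(x, x + 60)",
  "z <- unlist(list(x))"
)
find_calls(code)
```

To scan a file, the same function could be fed the result of readLines() on that file; each hit is then a place to remove the call or swap in a defensive wrapper.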