In my Stata do scripts, I often have to compare dates that may be missing. Unfortunately, Stata represents the missing value . as a number larger than any nonmissing value, so the following holds:
5 < .
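A quick check of this ordering also covers Stata's extended missing values .a to .z. It shows why a test such as d == . catches only the system missing value, whereas missing() catches all of them.

```stata
* Missing values sort above every nonmissing number:
* nonmissing < . < .a < .b < ... < .z
display 5 < .          // 1
display . < .a         // 1
display .a == .        // 0: "== ." does not match extended missing values
display missing(.a)    // 1: missing() is true for ., .a, ..., .z
```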
This can become quite annoying, e.g., when checking whether a date is within a certain range:
gen between_start_stop = . if d == .
replace between_start_stop = 1 if ///
!missing(d) & !missing(start) & !missing(stop) & ///
start < d & d < stop
replace between_start_stop = 0 if ///
((!missing(d) & !missing(start) & !(start < d)) | ///
(!missing(d) & !missing(stop) & !(d < stop)))
instead of the following:
gen between_start_stop = (start < d) & (d < stop)
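The following made-up rows illustrate why the one-liner is tempting but wrong. A missing stop compares as larger than any date, and a missing d compares as larger than any start, so the naive expression returns a definite 1 or 0 where the honest answer is "unknown".

```stata
* Made-up example rows: d, start and stop are integer (daily) dates
clear
input d start stop
5 1 9
5 1 .
. 1 9
end

* The naive one-liner never produces a missing result:
* row 2 gives 1 although stop is unknown, row 3 gives 0 although d is unknown
gen naive = (start < d) & (d < stop)
list
```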
Is there a way to use comparison operators that work with ternary logic?
I.e., I would like the following statements to be true:
(5 < .) == .
(. < .) == .
(. < 5) == .
(. & 1) == .
(. & 0) == 0
etc.
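As a sketch only (these are not built-in operators), the truth table above can be emulated with cond() and missing(). The variable names lt3, and3 and or3 are hypothetical. The convention assumed is 1 = true, 0 = false and any missing value = unknown. The AND and OR follow Kleene's strong three-valued logic: a known false decides an AND, and a known true decides an OR.

```stata
* Hypothetical names lt3/and3/or3; assumed convention:
* 1 = true, 0 = false, any missing value (., .a, ..., .z) = unknown
clear
input a b p q
5 . . 1
. . . 0
. 5 1 1
end

* Three-valued "<": unknown if either operand is unknown
gen lt3 = cond(missing(a, b), ., a < b)

* Three-valued AND: a known false wins, then unknown, then true
gen and3 = cond(p == 0 | q == 0, 0, cond(missing(p, q), ., 1))

* Three-valued OR: a known true (nonzero, nonmissing) wins, then unknown, then false
gen or3 = cond((!missing(p) & p != 0) | (!missing(q) & q != 0), 1, cond(missing(p, q), ., 0))

list
```

Rows 1 to 3 of lt3 reproduce (5 < .), (. < .) and (. < 5), all missing. Rows 1 and 2 of and3 reproduce (. & 1) == . and (. & 0) == 0.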
A couple of suggestions:
- use inrange() (also look at inlist()) to specify ranges instead of a series of < and > statements;
- you can specify multiple items in missing() or !missing() statements, like !missing(start, stop, d);
- and it really sounds like you want to use cond(), which (using an example from the help file) can be used to specify multiple conditions in one function:
  g var = cond(missing(x), ., cond(x>2,50,70))
  returns . if x is missing, 50 if x > 2, and 70 if x <= 2.
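To make these suggestions concrete, the help-file example and inrange() can be tried on a few made-up values. Note that inrange(z, a, b) tests a <= z <= b, so both bounds are inclusive. It returns 0, not missing, when z is missing. A missing bound is treated as no bound at all, which is convenient for open-ended ranges but is not three-valued logic.

```stata
* Made-up values for illustration
clear
input x
1
3
.
end

* Help-file example: cond() goes on the right-hand side of generate
gen var = cond(missing(x), ., cond(x > 2, 50, 70))

* inrange(z, a, b) is inclusive and returns 0 when z is missing
gen in_1_3 = inrange(x, 1, 3)

* A missing upper bound acts as "no upper bound": 1 for x = 1 and x = 3
gen at_least_1 = inrange(x, 1, .)

list
```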
The cond() analogy does not get you what you want. A formulation that returns missing whenever any of d, start or stop is missing returns ‘missing’ when a known d is below a known start (even if stop is, here irrelevantly, missing) or when a known d is above a known stop (even if start is, here irrelevantly, missing). The correct value in both cases is ‘false’. I have a utility ('validly') which allows 'generate' to access three-valued logic and does what you want; see the discussion on my webpage http://www.nuffield.ox.ac.uk/People/sites/KIM/SitePages/Biography.aspx, which links to a paper expanding further (but be warned: that paper has just been rejected by the Stata Journal as being "far too difficult to understand").
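The semantics described above can be sketched without 'validly', whose syntax is not reproduced here. Compute each comparison as 1/0/missing, then combine the two with a Kleene AND in which a known false wins. The intermediate names s_lt_d and d_lt_e and the example rows are made up for illustration.

```stata
* Made-up rows; intermediate names s_lt_d and d_lt_e are hypothetical
clear
input d start stop
 5  1  9
 5  6  .
12  .  9
 5  1  .
 .  1  9
end

* Each comparison is unknown (missing) if either side is unknown
gen s_lt_d = cond(missing(start, d), ., start < d)
gen d_lt_e = cond(missing(d, stop), ., d < stop)

* Kleene AND: 0 if either comparison is known false, else missing if either is unknown, else 1
gen between_start_stop = cond(s_lt_d == 0 | d_lt_e == 0, 0, cond(missing(s_lt_d, d_lt_e), ., 1))

list
```

Rows 2 and 3 are the two cases singled out above. One known comparison is false, so the result is 0 even though the other bound is missing. Rows 4 and 5 stay missing because the answer genuinely depends on the unknown value.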