This is a follow-up question: as hadley pointed out, unless I fix the problem with the time stamps, the graphs I produce will be incorrect. With this in mind, I am working towards fixing the issues in my code. Following the answers to my earlier questions, I have stopped using the attach() function in favour of dataSet.df$variableName. I am now having problems drawing the graph from the strptime time stamps. Below are all the code I am using and the XML file from which the data set is parsed (the parsing was also covered in an earlier question).
<?xml version = "1.0"?>
<Company >
<shareprice>
<timeStamp> 12:00:00.01</timeStamp>
<Price> 25.02</Price>
</shareprice>
<shareprice>
<timeStamp> 12:00:00.02</timeStamp>
<Price> 15</Price>
</shareprice>
<shareprice>
<timeStamp> 12:00:00.025</timeStamp>
<Price> 15.02</Price>
</shareprice>
<shareprice>
<timeStamp> 12:00:00.031</timeStamp>
<Price> 18.25</Price>
</shareprice>
<shareprice>
<timeStamp> 12:00:00.039</timeStamp>
<Price> 18.54</Price>
</shareprice>
<shareprice>
<timeStamp> 12:00:00.050</timeStamp>
<Price> 16.52</Price>
</shareprice>
<shareprice>
<timeStamp> 12:00:01.01</timeStamp>
<Price> 17.50</Price>
</shareprice>
</Company>
The R code I have currently is as follows:
library(ggplot2)
library (XML)
test.df <- xmlToDataFrame("c:/Users/user/Desktop/shares.xml")
test.df
timeStampParsed <- strptime(as.character(test.df$timeStamp), "%H:%M:%OS")
test.df$Price <- as.numeric(as.character(test.df$Price))
summary (test.df)
mean(test.df$Price)
sd (test.df$Price)
mean(timeStampParsed)
par(mfrow=c(1,2))
plot(timeStampParsed, test.df$Price)
qplot(timeStampParsed,Price,data=test.df,geom=c("point","line"),
scale_y_continuous(limits = c(10,26)))
The plot command produces a graph, but it is not very pleasant looking. The qplot command returns the following error message:
Error in sprintf(gettext(fmt, domain = domain), ...) :
invalid type of argument[1]: 'symbol'
In the interest of getting this right (and cutting down on the number of questions I ask), is there a tutorial or website that I can use? Once again, thanks very much for your help.
You still make some of the mistakes in the code I corrected in my two previous answers to you. So let's try this again, more explicitly:
library(ggplot2)
library (XML)
df <- xmlToDataFrame("/tmp/anthony.xml") # assign to df, shorter to type
df
sapply(df, class) # shows everything is a factor
summary(df) # summary for factor: counts !
df$timeStamp <- strptime(as.character(df$timeStamp), "%H:%M:%OS") # convert from df itself, not test.df
df$Price <- as.numeric(as.character(df$Price)) # factor -> character -> numeric
sapply(df, class) # shows both columns converted
options("digits.secs"=3) # make sure we show sub-seconds
summary (df) # real summary
with(df, plot(timeStamp, Price)) # with is an elegant alternative to attach()
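To make the two conversions above concrete, here is a small base-R sketch. The vectors f and parsed are made-up stand-ins for the columns read from the XML file, not the real data frame:

```r
# Hypothetical stand-in for a column that arrived as a factor
f <- factor(c("25.02", "15", "15.02"))

codes  <- as.numeric(f)                 # internal level codes, not the prices
prices <- as.numeric(as.character(f))   # the actual numbers

# %OS in the format keeps the fractional part of the seconds
parsed <- strptime("12:00:00.025", "%H:%M:%OS", tz = "UTC")
frac   <- parsed$sec                    # seconds including the fraction

print(codes)
print(prices)
print(frac)
```

Going through as.character() first is what keeps the prices from being replaced by their level codes.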
I also get an error with qplot(), but you may simply have too little of a range in your data. So let's try this:
R> set.seed(42) # fix random number generator
R> df$timeStamp <- df[1,"timeStamp"] + cumsum(runif(7)*60)
R> summary(df) # new timestamps spanning larger range
timeStamp Price
Min. :2010-07-14 12:00:54.90 Min. :15.0
1st Qu.:2010-07-14 12:01:59.71 1st Qu.:15.8
Median :2010-07-14 12:02:58.12 Median :17.5
Mean :2010-07-14 12:02:55.54 Mean :18.0
3rd Qu.:2010-07-14 12:03:52.20 3rd Qu.:18.4
Max. :2010-07-14 12:04:51.96 Max. :25.0
R> qplot(timeStamp,Price, data=df, geom=c("point","line"),
+ scale_y_continuous(limits = c(10,26)))
R>
Now qplot() works.
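As a side note, and an assumption on my part rather than something taken from the session above: in current ggplot2 releases the usual way to set a scale is to add it to the plot with + instead of passing it as an argument. A minimal sketch with made-up timestamps (the data frame demo and its time offsets are hypothetical, only the prices come from the XML file):

```r
library(ggplot2)

# Hypothetical data: prices from the XML, timestamps spread over a few minutes
demo <- data.frame(
  timeStamp = as.POSIXct("2010-07-14 12:00:00", tz = "UTC") +
    cumsum(c(55, 65, 58, 54, 60, 50, 45)),
  Price = c(25.02, 15, 15.02, 18.25, 18.54, 16.52, 17.50)
)

# Build the plot in layers and add the y scale with `+`
p <- ggplot(demo, aes(x = timeStamp, y = Price)) +
  geom_point() +
  geom_line() +
  scale_y_continuous(limits = c(10, 26))

print(p)
```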
So in sum, you were using data that did not fulfil some minimum requirements of the qplot function you were using: having a time axis spanning more than a second, say.
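If you want to check how much time your data actually covers before plotting, something along these lines should do. It is only a sketch in base R: the vector stamps repeats the timestamps from your XML file, and the variable names are mine:

```r
# Timestamps copied from the XML file in the question
stamps <- c("12:00:00.01", "12:00:00.02", "12:00:00.025", "12:00:00.031",
            "12:00:00.039", "12:00:00.050", "12:00:01.01")

# Parse with fractional seconds and convert to POSIXct for arithmetic
ts <- as.POSIXct(strptime(stamps, "%H:%M:%OS", tz = "UTC"))

# Total time covered by the data, in seconds
span <- as.numeric(difftime(max(ts), min(ts), units = "secs"))
print(span)
```

Here the whole series covers about one second, which is the narrow range discussed above.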
In general, you may want to start with An Introduction to R (it came with the program) or another intro text. You jumped head-first into advanced material (datetime data types, reading from XML, factors, ...) and got burned. First steps first.