I'm playing around with drawing bubble charts in R -- the current project is to graph a bubble chart of political donations that has the following characteristics:
x-axis: size of donation, in ranges, e.g. $10-$19, $20-$29, $30-$49, etc.
y-axis: number of donations of that amount
area of bubble: total amount of donations
I'm not planning anything complex, just something like:
symbols(amount_ranges,amount_occurrences, circles=sums)
The data is pretty granular, so there is a separate entry for each donation, and they need to be summed in order to get the values I'm looking for.
For example, the data looks like this (extraneous columns removed):
CTRIB_NAML CTRIB_NAMF CTRIB_AMT FILER_ID
John Smith $49 123456789
This is not that complex, but is there a simple way in R to count the number of occurrences of a certain value (for the y-axis)? And to add up the sum of those donations (which is derived from the two axes)? Or do I need to create a function that iterates through the data and compiles these numbers separately? Or pre-process the data in some way?
This is easy when you use the ggplot2 package with geom_point.
One of the many benefits of using ggplot is that its built-in statistics mean you don't have to pre-summarise your data. geom_point in combination with stat_sum is all you need.
Here is the example from ?geom_point. (Note that mtcars is a built-in dataset that ships with R in the datasets package, so it is available as soon as ggplot2 is loaded.)
See the ggplot website and geom_point for more detail.
library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) + geom_point(aes(size = qsec))
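The ?geom_point example above maps size to an existing column (qsec) and does not itself use stat_sum. As a rough sketch of the stat_sum combination, the snippet below counts how many rows of mtcars share each (cyl, gear) position and sizes the point by that count. The choice of cyl and gear is mine, purely for illustration, and scale_size_area() is added on the assumption that you want bubble area, rather than radius, to track the count.

```r
library(ggplot2)

# stat = "sum" counts the rows that fall on the same (x, y) position and
# exposes that count as the computed variable n, which drives point size.
p <- ggplot(mtcars, aes(cyl, gear)) +
  geom_point(stat = "sum") +
  scale_size_area()  # area (not radius) proportional to the count

# Inspect the summarised data that ggplot computed for the layer.
computed <- layer_data(p)
print(computed[, c("x", "y", "n")])
```

For the donation data you would still need numeric amounts and a binned x variable before a plot like this is useful.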
You can use ddply from package plyr here. If your original data.frame was called dfr, then something close to this should work:
library(plyr)
result <- ddply(dfr, .(CTRIB_AMT), function(partialdfr) { data.frame(amt = partialdfr$CTRIB_AMT[1], sm = sum(partialdfr$CTRIB_AMT), mn = mean(partialdfr$CTRIB_AMT)) })
In fact, a base R solution is also rather simple:
vals<-sort(unique(dfr$CTRIB_AMT))
sums<-tapply( dfr$CTRIB_AMT, dfr$CTRIB_AMT, sum)
counts<-tapply( dfr$CTRIB_AMT, dfr$CTRIB_AMT, length)
I'm sure more elegant solutions exist.
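Putting the base R pieces together for the chart described in the question, a minimal sketch might look like the following. It rests on several assumptions of mine: the sample values are made up, CTRIB_AMT is taken to be stored as text such as "$49" (as in the question's sample row), and the break points for the ranges are only illustrative. Note that symbols() interprets circles as radii, so the sketch passes the square root of each total to keep bubble area proportional to the amount donated.

```r
# Hypothetical sample in the shape shown in the question; values are made up.
dfr <- data.frame(
  CTRIB_AMT = c("$10", "$15", "$12", "$25", "$49"),
  stringsAsFactors = FALSE
)

# Strip "$" and thousands separators so the amounts can be summed.
amt <- as.numeric(gsub("[$,]", "", dfr$CTRIB_AMT))

# Assign each donation to a range; these break points are illustrative only.
range_labels <- c("$10-$19", "$20-$29", "$30-$49")
bins <- cut(amt, breaks = c(10, 20, 30, 50), labels = range_labels, right = FALSE)

counts <- tapply(amt, bins, length)  # y-axis: number of donations per range
sums   <- tapply(amt, bins, sum)     # bubble area: total donated per range

plot_bubbles <- function(counts, sums) {
  # circles are radii, so use sqrt(sum / pi) to make area track the total.
  symbols(seq_along(counts), as.vector(counts),
          circles = sqrt(as.vector(sums) / pi), inches = 0.5, xaxt = "n",
          xlab = "Donation size", ylab = "Number of donations")
  axis(1, at = seq_along(counts), labels = names(counts))
}

if (interactive()) plot_bubbles(counts, sums)
```

A range with no donations comes back from tapply as NA, so you may want to drop or zero-fill those entries before plotting.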