Let's say that I want to generate a large data frame from scratch.
I would generally create data frames with the data.frame function. However, calls like the following are extremely error-prone and tedious to write.
Is there a more efficient way of creating the following data frame?
df <- data.frame(GOOGLE_CAMPAIGN=c(rep("Google - Medicare - US", 928), rep("MedicareBranded", 2983),
rep("Medigap", 805), rep("Medigap Branded", 1914),
rep("Medicare Typos", 1353), rep("Medigap Typos", 635),
rep("Phone - MedicareGeneral", 585),
rep("Phone - MedicareBranded", 2967),
rep("Phone-Medigap", 812),
rep("Auto Broad Match", 27),
rep("Auto Exact Match", 80),
rep("Auto Exact Match", 875)),
GOOGLE_AD_GROUP=c(rep("Medicare", 928), rep("MedicareBranded", 2983),
rep("Medigap", 805), rep("Medigap Branded", 1914),
rep("Medicare Typos", 1353), rep("Medigap Typos", 635),
rep("Phone ads 1-Medicare Terms",585),
rep("Ad Group #1", 2967), rep("Medigap-phone", 812),
rep("Auto Insurance", 27),
rep("Auto General", 80),
rep("Auto Brand", 875)))
Yikes, that is some 'bad' code. How can I generate this 'large' data frame in a more efficient manner?
If your only source for that information is a piece of paper, then you probably won't do much better than that, but you can at least consolidate it all into a single rep call for each column:
#I'm going to cheat and not type out all those strings by hand
x <- unique(df[,1])
y <- unique(df[,2])
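#Vectors of repeat counts, one per unique value, in order of first appearance.
#"Auto Exact Match" fills two consecutive blocks, so its count is 80 + 875 = 955
x1 <- c(928,2983,805,1914,1353,635,585,2967,812,27,955)
#The ad groups are all distinct, so the last two counts stay separate
y1 <- c(x1[-11],80,875)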
dd <- data.frame(GOOGLE_CAMPAIGN = rep(x, times = x1),
GOOGLE_AD_GROUP = rep(y, times = y1))
which should be the same:
> all.equal(dd,df)
[1] TRUE
If this information already lives in some data structure in R and you just need to transform it, the job could be even easier, but we'd need to know what that structure is.
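As a rough illustration of that last point, here is a minimal sketch that assumes the counts are already held in a named numeric vector (for example, something shaped like the output of table()). The name campaign_counts is hypothetical, and only three of the campaigns from the question are used:

```r
# Hypothetical starting point: counts stored as a named numeric vector.
# The names and numbers are taken from the question; the object itself is assumed.
campaign_counts <- c("Medigap" = 805, "Medigap Branded" = 1914, "Medicare Typos" = 1353)

# rep() repeats each name by its matching count in a single call
campaign <- rep(names(campaign_counts), times = campaign_counts)

length(campaign)  # 4072, i.e. 805 + 1914 + 1353
```

With the data in that shape there is nothing to type by hand beyond the rep call itself.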
Alternatively, build it from two small inputs. Manually, (1) create this data frame of the unique campaign / ad group combinations:
> dfu <- unique(df)
> rownames(dfu) <- NULL
> dfu
           GOOGLE_CAMPAIGN            GOOGLE_AD_GROUP
1   Google - Medicare - US                   Medicare
2          MedicareBranded            MedicareBranded
3                  Medigap                    Medigap
4          Medigap Branded            Medigap Branded
5           Medicare Typos             Medicare Typos
6            Medigap Typos              Medigap Typos
7  Phone - MedicareGeneral Phone ads 1-Medicare Terms
8  Phone - MedicareBranded                Ad Group #1
9            Phone-Medigap              Medigap-phone
10        Auto Broad Match             Auto Insurance
11        Auto Exact Match               Auto General
12        Auto Exact Match                 Auto Brand
and (2) this vector of lengths:
> lens <- rle(as.numeric(interaction(df[[1]], df[[2]])))$lengths
> lens
[1]  928 2983  805 1914 1353  635  585 2967  812   27   80  875
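To make the rle step less opaque, here is a small sketch on toy data (the vectors a and b are made up for illustration). interaction() combines the two columns into one factor key per row, and rle() then reports the length of each run of consecutive identical keys:

```r
# Toy columns, not from the question
a <- c("x", "x", "x", "y", "y")
b <- c("p", "p", "q", "q", "q")

# One combined key per row: x.p, x.p, x.q, y.q, y.q
key <- interaction(a, b)

# Lengths of consecutive runs of the same key
run_lengths <- rle(as.numeric(key))$lengths
run_lengths  # 2 1 2
```

Note that rle only counts consecutive runs, so this relies on identical rows being grouped together, as they are in df.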
From these two inputs (dfu and lens) we can reconstruct df (here called df2):
> df2 <- dfu[rep(seq_along(lens), lens), ]
> rownames(df2) <- NULL
> identical(df, df2)
[1] TRUE
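If df does not exist yet, the same idea can be applied from scratch. The following is only a sketch: it types each campaign / ad group pair once, next to its repeat count, and then expands the rows. The names spec, n and big are hypothetical; the strings and counts are copied from the question.

```r
# One row per campaign / ad group pair, with its repeat count in n
spec <- data.frame(
  GOOGLE_CAMPAIGN = c("Google - Medicare - US", "MedicareBranded", "Medigap",
                      "Medigap Branded", "Medicare Typos", "Medigap Typos",
                      "Phone - MedicareGeneral", "Phone - MedicareBranded",
                      "Phone-Medigap", "Auto Broad Match",
                      "Auto Exact Match", "Auto Exact Match"),
  GOOGLE_AD_GROUP = c("Medicare", "MedicareBranded", "Medigap",
                      "Medigap Branded", "Medicare Typos", "Medigap Typos",
                      "Phone ads 1-Medicare Terms", "Ad Group #1",
                      "Medigap-phone", "Auto Insurance",
                      "Auto General", "Auto Brand"),
  n = c(928, 2983, 805, 1914, 1353, 635, 585, 2967, 812, 27, 80, 875),
  stringsAsFactors = FALSE
)

# Repeat each row index n times and keep only the two data columns
big <- spec[rep(seq_len(nrow(spec)), times = spec$n),
            c("GOOGLE_CAMPAIGN", "GOOGLE_AD_GROUP")]
rownames(big) <- NULL

nrow(big)  # 13964, the sum of the counts
```

Keeping each pair and its count on the same row makes it harder for the two columns to drift out of sync than with two separate rep chains.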