I have (again) a problem with combining data frames in R. But this time, one is a SpatialPolygonsDataFrame (SPDF) and the other one is a usual data.frame (DF). The SPDF has around 1000 rows, the DF only 400. Both have a common column, QDGC.
Now, I tried
oo <- merge(SPDF,DF, by="QDGC", all=T)
but this only results in a normal data.frame, not a SpatialPolygonsDataFrame any more. I read somewhere else that this does not work, but I did not understand what to do in such a case (it has something to do with the ID columns that merge uses).
oooh such a hard question, I guess...
Thanks! Jens
Let df = data frame, sp = spatial polygon object and by = name or column number of the common column. You can then merge the data frame into the sp object using the following line of code:
sp@data = data.frame(sp@data, df[match(sp@data[,by], df[,by]),])
Here is how the code works. The match function returns, for each row of sp@data, the position of the matching row in df, so indexing df with it reorders the rows of df to line up with sp@data (rows without a match come back as NA). So when we combine it with sp@data, the polygon order is preserved. A quick check to see if the code has worked is to inspect the two columns corresponding to the common column and see if they are identical (the common column gets duplicated and it is easy to remove the copy, but I keep it as it is a good check).
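To see what match is doing without needing a spatial object, here is a minimal sketch in which plain data frames stand in for sp@data and df. The names attr_tbl and extra and the sample QDGC values are made up for illustration.

```r
# Stand-in for sp@data: one row per polygon, in polygon order.
attr_tbl <- data.frame(QDGC = c("A1", "A2", "A3", "A4"),
                       area = c(10, 20, 30, 40),
                       stringsAsFactors = FALSE)

# Stand-in for df: fewer rows, in a different order.
extra <- data.frame(QDGC = c("A3", "A1"),
                    count = c(7, 5),
                    stringsAsFactors = FALSE)

by <- "QDGC"

# For each row of attr_tbl, the position of its key in extra (NA if absent).
idx <- match(attr_tbl[, by], extra[, by])

# Row order and row count of attr_tbl are kept; unmatched rows are filled with NA.
joined <- data.frame(attr_tbl, extra[idx, ])
rownames(joined) <- NULL
print(joined)
```

The duplicated key column shows up as QDGC.1; wherever it is not NA it should equal QDGC, which is the check described above.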
It is as easy as this:
require(sp) # the trick is that this package must be loaded!
oo <- merge(SPDF,DF, by="QDGC")
I've tested it myself. But it only works if you use merge from package sp. This is the default when the sp package is loaded: the merge function is then overloaded, and sp::merge is used if the first argument is a spatial structure.
merge can produce a data frame with more rows than the originals if there's not a simple 1-1 mapping between the two data frames. In that case, it would have to copy the geometry and create multiple polygons, which is probably not a good thing.
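The row inflation is easy to reproduce with ordinary data frames. This is only a sketch: polys and obs are invented stand-ins for the attribute table and the data to be joined.

```r
# Stand-in for the polygon attributes: one row per key.
polys <- data.frame(QDGC = c("A1", "A2"), area = c(10, 20))

# Data with a repeated key: "A1" appears twice.
obs <- data.frame(QDGC = c("A1", "A1", "A2"), count = c(5, 6, 7))

merged <- merge(polys, obs, by = "QDGC")
nrow(polys)   # 2
nrow(merged)  # 3, because the "A1" row is matched twice

# A cheap guard to run before joining onto a spatial object:
has_dup_keys <- anyDuplicated(obs$QDGC) > 0
has_dup_keys  # TRUE, so the mapping is not 1-1
```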
If you have a data frame with the same number of rows as a SpatialPointsDataFrame, then you can just directly replace the @data slot.
library(sp)
example(overlay) # to get the srdf object
srdf@data
spplot(srdf)
srdf@data=data.frame(x=runif(3),xx=rep(0,3))
spplot(srdf)
If you get the number of rows wrong:
srdf@data=data.frame(x=runif(2),xx=rep(0,2))
spplot(srdf)
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 3, 2
Maybe the function joinCountryData2Map in the rworldmap package can give inspiration. (But I may be wrong, as I was last time.)
One more solution is to use the append_data function from the tmaptools package. It is called with these arguments:
append_data(shp, data, key.shp = NULL, key.data = NULL,
ignore.duplicates = FALSE, ignore.na = FALSE,
fixed.order = is.null(key.data) && is.null(key.shp))
It's a bit unfortunate that it's called append, since I'd understand append more in the sense of rbind, and we want to have something like join or merge here.
Ignoring that fact, the function is really useful for making sure you got your joins correct and for spotting rows that are present on only one side of the join. From the docs:
Under coverage (shape items that do not correspond to data records), over coverage (data records that do not correspond to shape items respectively) as well as the existence of duplicated key values are automatically checked and reported via console messages. With under_coverage and over_coverage the under and over coverage key values from the last append_data call can be retrieved.
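The same three checks can be done by hand in base R, which may help when the package is not available. This is a sketch of the idea only, not how tmaptools implements it, and the key vectors are invented.

```r
# Keys of the shape (e.g. SPDF$QDGC) and of the data (e.g. DF$QDGC); values are made up.
shape_keys <- c("A1", "A2", "A3")
data_keys  <- c("A2", "A3", "A4", "A4")

# Under coverage: shape items with no data record.
under <- setdiff(shape_keys, data_keys)

# Over coverage: data records with no shape item.
over <- setdiff(data_keys, shape_keys)

# Key values that occur more than once in the data.
dups <- unique(data_keys[duplicated(data_keys)])

under  # "A1"
over   # "A4"
dups   # "A4"
```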
If it is two shapefiles that need to be merged into a single object, just use rbind().
When using rbind(), just make sure that both arguments you use are sf objects (spatial data frames). You can check this using class(sf), where sf stands for your object. If one of them is not an sf object, then use st_as_sf() to convert it before you rbind them.
Note: You can also use this to append to NULL, which is handy when you are collecting results from a loop and want to accumulate them.
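The accumulation pattern looks like this. The sketch uses plain data frames so it runs without any spatial package; that spatial objects behave the same way under rbind() is the claim made above, not something this snippet shows. The column names are invented.

```r
# Start from NULL and grow the result one piece per iteration.
result <- NULL
for (i in 1:3) {
  piece <- data.frame(id = i, value = i^2)  # stand-in for one loop result
  result <- rbind(result, piece)            # rbind(NULL, piece) is just piece
}

nrow(result)   # 3
result$value   # 1 4 9
```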