I'm trying to get my head around how to do efficient bulk inserts of relational data into RavenDB, particularly when converting that relational data into aggregates.
Let's say we have dump files of two tables, Orders and OrderItems. They're too big to load into memory, so I read them as streams. I can read through each table and create a document in RavenDB corresponding to each row. I can do this as bulk operations using batched requests. Easy and efficient so far.
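For context, a minimal sketch of what that first pass might look like against a RavenDB .NET client of that era; the Order POCO and the ReadOrderRows streaming reader are illustrative, not part of RavenDB:

```csharp
using System.Collections.Generic;
using Raven.Client;
using Raven.Client.Document;

// Sketch: one RavenDB document per Orders row, written in batches so only
// ~1024 rows are held in memory and each SaveChanges() is a single round-trip.
// Order and ReadOrderRows("orders.dump") are assumed/illustrative helpers.
IDocumentStore store = new DocumentStore { Url = "http://localhost:8080" }.Initialize();

var batch = new List<Order>(1024);
foreach (var row in ReadOrderRows("orders.dump"))
{
    batch.Add(new Order { Id = "orders/" + row.OrderId, Customer = row.Customer });
    if (batch.Count < 1024) continue;

    using (var session = store.OpenSession())
    {
        foreach (var doc in batch) session.Store(doc);
        session.SaveChanges();                 // one HTTP request for the whole batch
    }
    batch.Clear();
}
// (flush the final partial batch the same way)
```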
Then I want to transform this on the server, getting rid of the OrderItems and integrating them into their parent Order documents. How can I do this without thousands of roundtrips?
The answer seems to lie somewhere between set-based updates, live projections and denormalized updates, but I don't know where.
You're going to need to do this with denormalised updates and set-based updates. Take a look at the PATCH API to see what it offers. You only need set-based updates if you plan on updating several docs at once; otherwise you can patch a known doc directly using the PATCH API.
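For the set-based flavour, the client API of that era exposed this as DatabaseCommands.UpdateByIndex, which applies a PatchRequest server-side to every document matched by an index query. A rough sketch, where the index name and the Processed field are purely illustrative and the exact overloads vary between RavenDB versions:

```csharp
using Raven.Abstractions.Data;
using Raven.Client.Document;
using Raven.Json.Linq;

// Sketch of a set-based patch: every document matched by the index query
// gets the same PatchRequest applied on the server, without the documents
// ever being pulled down to the client.
var store = new DocumentStore { Url = "http://localhost:8080" }.Initialize();

store.DatabaseCommands.UpdateByIndex(
    "OrderItems/ByOrderId",                        // assumed pre-existing index
    new IndexQuery { Query = "OrderId:orders/1" },
    new[]
    {
        new PatchRequest
        {
            Type = PatchCommandType.Set,           // overwrite (or create) a property
            Name = "Processed",
            Value = new RavenJValue(true)
        }
    });
```

Note that set-based updates run against an index, so the docs you want to touch need to be covered by one before the patch can be applied.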
Live projections will only help you when you are getting the results of a query/index; they don't change the docs themselves, only what is returned from the server to the client.
However, I'd recommend that, if possible, you combine an Order and its corresponding OrderItems in memory before you send them to RavenDB. You can still stream the data from the dump files, just use some caching if needed. This will be the simplest option.
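A sketch of that combine-then-store approach, assuming both dump files can be read sorted by OrderId so the items for one order arrive contiguously; the Order/OrderItem POCOs and the ReadOrderRows / ReadOrderItemRows streaming readers are illustrative, not part of RavenDB:

```csharp
using System.Collections.Generic;
using Raven.Client;
using Raven.Client.Document;

public class Order
{
    public string Id { get; set; }
    public string Customer { get; set; }
    public List<OrderItem> Items { get; set; }
}

public class OrderItem
{
    public string Product { get; set; }
    public int Quantity { get; set; }
}

// Sketch: stream both dumps (assumed sorted by OrderId), build each aggregate
// Order document in memory one order at a time, and save in batches.
// ReadOrderRows / ReadOrderItemRows are assumed streaming readers over the dumps.
IDocumentStore store = new DocumentStore { Url = "http://localhost:8080" }.Initialize();

using (var itemRows = ReadOrderItemRows("orderitems.dump").GetEnumerator())
{
    var batch = new List<Order>(1024);
    var itemPending = itemRows.MoveNext();

    foreach (var row in ReadOrderRows("orders.dump"))
    {
        var order = new Order
        {
            Id = "orders/" + row.OrderId,
            Customer = row.Customer,
            Items = new List<OrderItem>()
        };

        // Consume the contiguous run of item rows belonging to this order.
        while (itemPending && itemRows.Current.OrderId == row.OrderId)
        {
            order.Items.Add(new OrderItem
            {
                Product = itemRows.Current.Product,
                Quantity = itemRows.Current.Quantity
            });
            itemPending = itemRows.MoveNext();
        }

        batch.Add(order);
        if (batch.Count < 1024) continue;

        using (var session = store.OpenSession())
        {
            foreach (var doc in batch) session.Store(doc);
            session.SaveChanges();             // one round-trip per 1024 aggregates
        }
        batch.Clear();
    }
    // (flush the final partial batch the same way)
}
```

The point is that the aggregate is built before it ever hits the wire, so the server never has to store and then clean up the intermediate OrderItems documents at all.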
Updated
I've made some sample code that shows how to do this. It patches the Comments array/list within a particular Post doc, in this case "Posts/1".
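A minimal sketch of that kind of patch, assuming the 1.x-era DatabaseCommands.Patch API (the Comment shape here is illustrative, and newer RavenDB releases expose patching through a script-based API instead):

```csharp
using Raven.Abstractions.Data;
using Raven.Client.Document;
using Raven.Json.Linq;

// Sketch: append a new comment to the Comments array of "Posts/1" entirely
// on the server, without loading the Post document on the client.
// The anonymous Comment shape is illustrative.
var store = new DocumentStore { Url = "http://localhost:8080" }.Initialize();

var newComment = new { Author = "some-user", Text = "Nice post" };

store.DatabaseCommands.Patch(
    "Posts/1",
    new[]
    {
        new PatchRequest
        {
            Type = PatchCommandType.Add,               // append to an array property
            Name = "Comments",
            Value = RavenJObject.FromObject(newComment)
        }
    });
```

Because the patch runs server-side, this is a single request regardless of how large the Post document is.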