I have XML files stored in BLOB storage, and I am trying to figure out the most efficient way to update them (and/or add elements to them). In a WebRole, I came up with this:
using (MemoryStream ms = new MemoryStream())
{
    var blob = container.GetBlobReference("file.xml");
    blob.DownloadToStream(ms);
    ms.Seek(0, SeekOrigin.Begin); // rewind before parsing
    XDocument xDoc = XDocument.Load(ms);
    // Do some updates/inserts using LINQ to XML.
    blob.Delete(); // Details about this later on.
    using (MemoryStream msNew = new MemoryStream())
    {
        xDoc.Save(msNew);
        msNew.Seek(0, SeekOrigin.Begin); // rewind before uploading
        blob.UploadFromStream(msNew);
    }
}
I am looking at these parameters when considering efficiency:
- BLOB transactions (tallied just below).
- Bandwidth. (Not sure if it even counts, since the code runs inside the data center.)
- Memory consumption on the instance.
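For reference, my count for the snippet above is three transactions per edit (assuming an upload of this size fits in a single PUT, which it should):

// Per-edit storage transactions in the snippet above:
// DownloadToStream -> 1 (GET Blob)
// Delete           -> 1 (DELETE Blob)
// UploadFromStream -> 1 (PUT Blob; a 150-200 KB body fits in one request)
// Total            -> 3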
Some things to mention:
My XML files are around 150-200 KB.
I am aware that XDocument loads the whole file into memory, and that working with streams (XmlReader and XmlWriter) could solve this. But I assume this would require working with BlobStream, which could be less efficient transaction-wise (I think); see the sketch after these notes.
About blob.Delete(): without it, the uploaded XML in blob storage seems to be missing some closing tags at the end. I assumed this is caused by a collision with the old data. I could be completely wrong here, but adding the delete solved it (at the cost of one more transaction).
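To make the trade-off concrete, here is roughly what that streaming route would look like. This is a sketch only: it assumes the classic Microsoft.WindowsAzure.StorageClient, where OpenRead()/OpenWrite() return a BlobStream, and it writes to a second, hypothetical blob so the read and the write don't collide.

var source = container.GetBlobReference("file.xml");
var target = container.GetBlobReference("file-updated.xml"); // hypothetical target name
using (Stream input = source.OpenRead())   // BlobStream under the covers
using (Stream output = target.OpenWrite())
using (XmlReader reader = XmlReader.Create(input))
using (XmlWriter writer = XmlWriter.Create(output))
{
    reader.MoveToContent();
    // Verbatim copy; real code would intercept nodes here to edit them.
    writer.WriteNode(reader, true);
}

As far as I can tell, writing through a BlobStream commits as block upload(s) plus a final block-list commit, which is exactly the extra-transactions concern above.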
Is the code I provided good practice, or does a more efficient way exist, considering the parameters I mentioned?
I believe the problem with the stream-based method is that the storage client doesn't know how long the stream is before it starts to send the data. This probably prevents the content length from being updated, giving the appearance of missing data at the end of the file.
Working with the content of the blob as text will help. You can download the blob contents as text and then upload them as text. Doing this, you should be able to both avoid the delete (saving you a third of the transactions) and have simpler code.
var blob = container.GetBlobReference("file.xml");
var xml = blob.DownloadText(); // transaction 1
var xDoc = XDocument.Parse(xml);
// Do some updates/inserts using LINQ to XML.
blob.UploadText(xDoc.ToString()); // transaction 2
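One caveat with the text approach: XDocument.ToString() omits the XML declaration. If the declaration matters to whatever consumes these files, a small variation keeps it (note that Save(TextWriter) stamps the writer's encoding into the declaration, which is UTF-16 for a StringWriter):

var sb = new StringBuilder();
using (var writer = new StringWriter(sb))
{
    xDoc.Save(writer); // Save() emits the declaration; ToString() drops it
}
blob.UploadText(sb.ToString()); // still transaction 2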
Additionally, if you can recreate the file without downloading it in the first place (we can do this sometimes), then you can just upload it and overwrite the old one using one storage transaction.
var blob = container.GetBlobReference("file.xml");
var xDoc = new XDocument(/* generate file */);
blob.UploadText(xDoc.ToString()); // transaction 1
I am aware that XDocument loads the whole file into memory, and working in streams (XmlReader and XmlWriter) could solve this.
Not sure it would solve much. Think about it: how do you add Kool-Aid to the water while it is flying through the hose? That is what a stream is. Better to wait until it is in a container.
Outside of that, what is the reason for the focus on efficiency (a technical problem) rather than editing (the business problem)? Are the documents changed often enough to warrant a serious look at performance? Or are you just falling prey to the normal developer tendency to do more than is necessary? (Note: I am often guilty of this too.)
Without a concept of a Flush(), the Delete is an acceptable option at first glance. I am not sure whether moving to the async methods might accomplish the same end with less overhead.
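If you do explore that, the classic storage client exposes APM-style Begin/End pairs, so something like the following should be possible. This is purely a speculative sketch: the method names are assumed from that convention, and it saves thread time, not transactions.

// Speculative: starts the upload without blocking the role's thread.
// The storage transaction count is unchanged.
// Note msNew must outlive this call, so it can't sit in a using block here.
msNew.Seek(0, SeekOrigin.Begin);
blob.BeginUploadFromStream(msNew, asyncResult =>
{
    blob.EndUploadFromStream(asyncResult); // completes (or rethrows) the upload
}, null);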