开发者

Performance of MySql Xml functions?

开发者 https://www.devze.com 2023-02-04 07:44 出处:网络
I am pretty excited about the new Mysql XMl Functions. Now I can finally embed something like \"object oriented\" documents in my oldschool relational database.

I am pretty excited about the new Mysql XMl Functions.

Now I can finally embed something like "object oriented" documents in my oldschool relational database.

For an example use-case consider a user who sings up at your website using facebook connect. You can fetch an object for the user using the graph api, and get nice information. This information however can vary vastly. Some fields may or may not be set, some may be added over time and so on.

Well if you are just intersted in very special fields (for example friends relations, gender, movies...), you can project them into your relational database scheme.

However using the XMl functions you could store the whole object inside a field and then your different models can access the data using the ExtractValue function. You can store everything right away without needing to worry what you will need later.

But what will the performance be?

For example I have a table with 50 000 entries which represent useres. I have an enum field that states "male", "female" (or various other genders to be politically correct).

The performance of for example fetching all males will be very fast.

  • But what about something like WHERE ExtractValue(userdata, '/gender/') = 'male' ?

  • How will the performance vary if the object gets bigger?

  • Can I maby somehow put an Index on specified xpath selections?

  • How do field types work together with this functions/performance. Varchar/blob?

  • Do I need fulltext indexes?

To sum up my question:

Mysql XML functins look great. And I am sure they are really great if you just want to store structured data that you fetch and analyze further in your application.

But how will they stand battle in procedures where there are internal scans/sorting/comparision/calculations performed on them?

Can Mysql replace document oriented databases like CouchDB/Sesame?

What are the gains and trade offs of XML functions?

How开发者_开发问答 and why are they better/worse than a dynamic application that stores various data as attributes?

For example a key/value table with an xpath as key and the value as value connected to the document entity.

Anyone made any other experiences with it or has noticed something mentionable?


I tend to make comments similar to Pekka's, but I think the reason we cannot laugh this off is your statement "This information however can vary vastly." That means it is not realistic to plan to parse it all and project it into the database.

I cannot answer all of your questions, but I can answer some of them.

Most notably I cannot tell you about performance on MySQL. I have seen it in SQL Server, tested it, and found that SQL Server performs in memory XML extractions very slowly, to me it seemed as if it were reading from disk, but that is a bit of an exaggeration. Others may dispute this, but that is what I found.

"Can Mysql replace document oriented databases like CouchDB/Sesame?" This question is a bit over-broad but in your case using MySQL lets you keep ACID compliance for these XML chunks, assuming you are using InnoDB, which cannot be said automatically for some of those document oriented databases.

"How and why are they better/worse than a dynamic application that stores various data as attributes?" I think this is really a matter of style. You are given XML chunks that are (presumably) documented and MySQL can navigate them. If you just keep them as-such you save a step. What would be gained by converting them to something else?

The MySQL docs suggest that the XML file will go into a clob field. Performance may suffer on larger docs. Perhaps then you will identify sub-documents that you want to regularly break out and put into a child table.

Along these same lines, if there are particular sub-docs you know you will want to know about, you can make a child table, "HasDocs", do a little pre-processing, and populate it with names of sub-docs with their counts. This would make for faster statistical analysis and also make it faster to find docs that have certain sub-docs.

Wish I could say more, hope this helps.

0

精彩评论

暂无评论...
验证码 换一张
取 消