I'm dealing with blobs of up to, I estimate, about 100 kilobytes in size. The data is already compressed.
Storage engine: InnoDB on MySQL 5.1
Frontend: PHP (Symfony with Propel ORM)
Some questions:
I've read somewhere that it's not good to update blobs because it leads to reallocation and fragmentation, and thus to bad performance. Is that true? Any references on this?
Initially, the blobs are constructed by appending data chunks, each up to 16 kilobytes in size. Would it be more efficient to use a separate chunk table instead, for example with fields like these?
parent_id, position, chunk
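For illustration, such a chunk table might look like this (just a sketch; the table name and column types are picked arbitrarily):

    CREATE TABLE chunks (
        parent_id INT UNSIGNED NOT NULL,       -- row that owns the blob
        position  SMALLINT UNSIGNED NOT NULL,  -- order of the chunk within the blob
        chunk     BLOB NOT NULL,               -- one chunk of up to 16 KB (BLOB holds up to 64 KB)
        PRIMARY KEY (parent_id, position)      -- InnoDB clusters on the PK, keeping a blob's chunks together
    ) ENGINE=InnoDB;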
Then, to get the entire blob, one would do something like:
SELECT GROUP_CONCAT(chunk ORDER BY position) FROM chunks WHERE parent_id = 187
The result would be used in a PHP script.
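One caveat I'm aware of: GROUP_CONCAT truncates its result at group_concat_max_len, which defaults to only 1024 bytes, and it inserts a comma separator unless told otherwise, so reassembling a ~100 KB binary blob would need something like:

    -- Raise the session limit; the 1024-byte default would silently truncate the result.
    SET SESSION group_concat_max_len = 131072;

    -- Empty SEPARATOR, since the default ',' would corrupt binary data.
    SELECT GROUP_CONCAT(chunk ORDER BY position SEPARATOR '')
    FROM chunks
    WHERE parent_id = 187;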
Is there any difference between the types of blobs, aside from the size needed for metadata, which should be negligible?
If you're creating and deleting data in a table, you will get fragmentation of the table's data structure.
I don't think you can gain anything by splitting blobs into chunks; you don't gain anything by fragmenting the data before the DB fragments it :)
You can defragment the table's structure by rebuilding it (OPTIMIZE TABLE in MySQL).
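For example (assuming the chunk table from the question):

    -- Rebuilds the table and defragments its data. For InnoDB in MySQL 5.1,
    -- OPTIMIZE TABLE is mapped to a table rebuild via ALTER TABLE.
    OPTIMIZE TABLE chunks;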
I could not find information on how MySQL stores blobs on disk. If it stores them alongside the other row data, then you could use the clustered index (the PK in InnoDB, ALTER TABLE ... ORDER BY in MyISAM) to force a particular order of the data in the table's data file (e.g. ordered by popularity, to create a "hot" area, which might improve caching and reduce seeking a bit).
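A sketch of both variants, assuming a hypothetical popularity column to order by:

    -- InnoDB: rows are stored in PK (clustered index) order,
    -- so putting popularity first in the PK groups hot rows together.
    CREATE TABLE blobs (
        popularity INT UNSIGNED NOT NULL,
        id         INT UNSIGNED NOT NULL,
        data       MEDIUMBLOB NOT NULL,
        PRIMARY KEY (popularity, id)
    ) ENGINE=InnoDB;

    -- MyISAM: a one-off physical sort of the data file
    -- (new rows are appended unsorted afterwards).
    ALTER TABLE blobs_myisam ORDER BY popularity;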
In addition to fragmentation of the database's own structures, there's the problem of fragmentation of the table's file in the filesystem.
Even if you only ever inserted data into the table, keeping the table itself free of fragmentation, the filesystem that holds the table's file will sooner or later fragment it on disk. That's unavoidable on safe filesystems, as they never update a file's data in place.
If fragmentation is a problem, then I'd attack it at the lowest level possible. Don't store the blobs in the database; store only references to files on disk.
Filesystems are closer to the physical disk, so they can deal with fragmentation much better than a DB query that sits a few levels of abstraction above it. Some filesystems even automatically defragment small files but leave large files fragmented.
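A hypothetical sketch of that approach, with the table holding only a pointer to the file:

    -- The blob itself lives in the filesystem; the DB stores only a reference.
    CREATE TABLE blob_refs (
        id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
        file_path VARCHAR(255) NOT NULL,  -- e.g. derived from the id: 'blobs/18/187.bin'
        size      INT UNSIGNED NOT NULL,  -- blob size in bytes
        PRIMARY KEY (id)
    ) ENGINE=InnoDB;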
Or you might just throw hardware at the problem: use RAID, add a ton of RAM for disk/DB caches, or use an SSD.
And of course you've benchmarked this carefully and know that fragmentation is a problem in the first place, right?