
Self-describing file format for gigapixel images?

Developer https://www.devze.com 2022-12-14 13:26 Source: web

In medical imaging, there appear to be two ways of storing huge gigapixel images:

  1. Use lots of JPEG images (either packed into files or individually) and cook up some bizarre index format to describe what goes where. Tack on some metadata in some other format.

  2. Use TIFF's tile and multi-image support to cleanly store the images as a single file, and provide downsampled versions for zooming speed. Then abuse various TIFF tags to store metadata in non-standard ways. Also, store tiles with overlapping boundaries that must be individually translated later.

In both cases, the reader must understand the format well enough to understand how to draw things and read the metadata.

Is there a better way to store these images? Is TIFF (or BigTIFF) still the right format for this? Does XMP solve the problem of metadata?

The main issues are:

  • Storing images in a way that allows for rapid random access (tiling)
  • Storing downsampled images for rapid zooming (pyramid)
  • Handling cases where tiles are overlapping or sparse (scanners often work by moving a camera over a slide in 2D and capturing only where there is something to image)
  • Storing important metadata, including associated images like a slide's label and thumbnail
  • Support for lossy storage
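To make the tiling and pyramid requirements concrete, here is a minimal sketch of the arithmetic a reader or writer needs regardless of container format. The function names, the 256-pixel tile size, and the factor-of-2 downsampling are illustrative assumptions, not part of any particular format.

```python
def ceil_div(a, b):
    """Integer ceiling division, exact even for very large images."""
    return -(-a // b)


def pyramid_levels(width, height, tile=256, factor=2):
    """List (level, width, height, tiles_x, tiles_y) for each pyramid level.

    Level 0 is full resolution; each further level is downsampled by
    `factor` until the whole image fits in a single tile.
    """
    levels = []
    level, w, h = 0, width, height
    while True:
        tiles_x, tiles_y = ceil_div(w, tile), ceil_div(h, tile)
        levels.append((level, w, h, tiles_x, tiles_y))
        if tiles_x == 1 and tiles_y == 1:
            break
        w, h = max(1, ceil_div(w, factor)), max(1, ceil_div(h, factor))
        level += 1
    return levels


def tiles_for_region(x, y, w, h, tile=256):
    """Return (col, row) indices of the tiles touched by a pixel rectangle."""
    first_col, first_row = x // tile, y // tile
    last_col, last_row = (x + w - 1) // tile, (y + h - 1) // tile
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


if __name__ == "__main__":
    for row in pyramid_levels(100_000, 80_000):
        print(row)
    print(tiles_for_region(250, 0, 10, 10))
```

Random access then amounts to looking up only the tiles returned by `tiles_for_region` at the level closest to the requested zoom. Overlapping or sparse tiles would need an extra per-tile offset table on top of this regular grid.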

What kind of (hopefully non-proprietary) formats do people use to store large aerial photographs or maps? These images have similar properties.


It seems like starting with TIFF or BigTIFF and defining a useful subset of tags + XMP metadata might be the way to go. FITS is no good since it is basically for lossless data and doesn't have a very appropriate metadata mechanism.

The problem with TIFF is that it just allows too much flexibility, but a subset of TIFF should be acceptable.
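As a small illustration of what "accepting only a subset" could start with, the sketch below inspects just the first four bytes of a file to tell classic TIFF from BigTIFF. The byte-order marks (`II`/`MM`) and version numbers (42 and 43) come from the TIFF and BigTIFF specifications; the function name `tiff_flavor` and its return format are assumptions made for this example, and a real subset validator would go on to check the tags in each IFD.

```python
import struct


def tiff_flavor(header: bytes):
    """Classify a file from its first 4 bytes.

    Returns (flavor, byte_order), where flavor is "TIFF" or "BigTIFF"
    and byte_order is "little" or "big". Raises ValueError otherwise.
    """
    if len(header) < 4:
        raise ValueError("need at least 4 header bytes")
    if header[:2] == b"II":        # Intel order: little-endian
        endian, order = "<", "little"
    elif header[:2] == b"MM":      # Motorola order: big-endian
        endian, order = ">", "big"
    else:
        raise ValueError("not a TIFF file")
    (version,) = struct.unpack(endian + "H", header[2:4])
    if version == 42:
        return ("TIFF", order)
    if version == 43:
        return ("BigTIFF", order)
    raise ValueError(f"unknown TIFF version {version}")


if __name__ == "__main__":
    print(tiff_flavor(b"II*\x00"))       # classic little-endian header
    print(tiff_flavor(b"MM\x00\x2b"))    # BigTIFF big-endian header
```

Classic TIFF uses 32-bit offsets, which limits files to 4 GiB; BigTIFF uses 64-bit offsets, which is why it comes up for gigapixel data.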

The solution may very well be http://ome-xml.org/ and http://ome-xml.org/wiki/OmeTiff.

It looks like DICOM now has support: ftp://medical.nema.org/MEDICAL/Dicom/Final/sup145_ft.pdf


You probably want FITS.

  • Arbitrary size
  • Multidimensional data (typically 1 to 3 dimensions)
  • Extensive header
  • Widely used in astronomy and endorsed by NASA and the IAU


I'm a pathologist (and hobbyist programmer), so virtual slides and digital pathology are a huge interest of mine. You may be interested in the OpenSlide project. They have characterized a number of the proprietary formats from the large vendors (Aperio, BioImagene, etc.). Most seem to consist of a zoomable pyramid (scanned at different microscope objectives, of course) stored in large TIFF files containing multiple tiled TIFFs or compressed (JPEG or JPEG2000) images.


The industry standard is DICOM Sup 145. Vendor adoption has been sluggish, but inventing yet another format would probably not be helpful.


PNG might work for you. It can handle large images and metadata, and its interlacing lets you get an n/8 x n/8 downsampled image pretty easily.
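The n/8 x n/8 figure comes from PNG's Adam7 interlacing, whose first pass samples every eighth pixel in each direction. A rough sketch of the pass geometry follows; the pass offsets and steps are those defined in the PNG specification, while the function name `adam7_pass_sizes` is made up for this example.

```python
# (x_start, y_start, x_step, y_step) for the seven Adam7 passes
ADAM7 = [
    (0, 0, 8, 8),
    (4, 0, 8, 8),
    (0, 4, 4, 8),
    (2, 0, 4, 4),
    (0, 2, 2, 4),
    (1, 0, 2, 2),
    (0, 1, 1, 2),
]


def adam7_pass_sizes(width, height):
    """Return the (width, height) in pixels of each Adam7 pass."""
    sizes = []
    for x0, y0, dx, dy in ADAM7:
        # Count the sample positions x0, x0+dx, ... that fall inside the image
        w = (width - x0 + dx - 1) // dx if width > x0 else 0
        h = (height - y0 + dy - 1) // dy if height > y0 else 0
        sizes.append((w, h))
    return sizes


if __name__ == "__main__":
    for number, size in enumerate(adam7_pass_sizes(16, 16), start=1):
        print(number, size)
```

Note that this gives only a fixed set of coarse previews rather than an arbitrary pyramid, and the image data is still one compressed stream, so a decoder has to read it sequentially.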

I'm not sure if PNG can do rapid random access. It is chunked, but that might not be enough.

You could represent sparse data with the transparency channel.


JPEG2000 might be worth a look; there are some interesting efforts from national libraries in this space.
