Are there (have there been) any efforts to create a schema language for arbitrary binary formats?

Developer https://www.devze.com 2023-02-04 14:23 Source: Internet

XML has a lot of benefits. It's both machine and human readable, it has a standardized format and it is remarkably versatile.

It also has some disadvantages. It's verbose and not a very efficient means of transferring large amounts of data.

One of the most useful aspects of XML is the schema language. Using a schema, you can generate source code in any modern programming language to read an XML format, without the tedious hand coding that most other file formats require.

This got me thinking about whether a schema language for arbitrary binary file formats exists and, if not, whether it would be a worthwhile endeavor.

Just in case I've been unclear: I'm asking about a language whose purpose is to define byte offsets, field and record lengths, delimiters, etc., and that could be parsed to generate code that reads any file conforming to that specification.
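To make the idea concrete, here is a minimal sketch in Python of what "schema in, parser out" could look like. It is only an illustration: the `HEADER_SCHEMA` layout, the field names and the `parse` helper are all made up for this example and do not come from any existing schema language. It uses the standard `struct` module to do the actual decoding.

```python
import struct

# Hypothetical schema: (field name, struct format code) pairs in file order.
# "4s" = 4 raw bytes, "H" = unsigned 16-bit int, "I" = unsigned 32-bit int.
HEADER_SCHEMA = [
    ("magic", "4s"),
    ("version", "H"),
    ("record_count", "I"),
]


def parse(schema, data, byte_order="<"):
    """Decode the fields described by `schema` from the start of `data`.

    Returns (fields, bytes_consumed). "<" means little-endian, no padding.
    """
    fmt = byte_order + "".join(code for _, code in schema)
    size = struct.calcsize(fmt)
    if len(data) < size:
        raise ValueError(f"need {size} bytes, got {len(data)}")
    values = struct.unpack(fmt, data[:size])
    names = [name for name, _ in schema]
    return dict(zip(names, values)), size


# Build a sample header that matches the schema, then read it back.
sample = struct.pack("<4sHI", b"DEMO", 2, 10)
header, consumed = parse(HEADER_SCHEMA, sample)
print(header, consumed)
```

The point is that the format description is plain data, so in principle the same description could drive a code generator for other languages instead of an interpreter like this one.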

I doubt I'm the first to suggest such an idea, so if you know of any projects or working groups that have pursued or are currently pursuing this area, I'd be grateful to hear about them.


I know this is an old question, but in the last few years I feel that Kaitai Struct has emerged as one of the best options for describing arbitrary binary formats with a schema. The fact that it also generates parsing code is a huge bonus.

https://kaitai.io/

"develop parsers for binary structures"


Yes, several people have tried to do this.

One such attempt is Binary Format Description. Another is Data Format Description Language. I'm not sure how practical either one really is, though.


xtype is a new general-purpose binary data language I developed that also covers the typical usage of XML: https://github.com/bitagoras/xtype/

A similar format that should be mentioned here is UBJSON, an efficient binary format for JSON-like structures: https://github.com/ubjson/universal-binary-json


"schema" and "arbitrary" are contradictory. Specifying byte offsets, field and record lengths, delimiters, etc. is not "arbitrary".

Byte offsets have been around since COBOL. EDI is a well-known, tried and true protocol that does exactly this.
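As a rough illustration of the byte-offset style of record description, here is a small Python sketch. The `RECORD_LAYOUT` table and its field names are hypothetical, and this is not how COBOL or any EDI tool actually expresses a layout. It only shows the idea of a fixed-width record described by offsets and lengths.

```python
# Hypothetical fixed-width layout: field name -> (byte offset, length).
RECORD_LAYOUT = {
    "customer_id": (0, 6),
    "name": (6, 10),
    "balance": (16, 8),
}


def parse_record(layout, record):
    """Slice each field out of a fixed-width ASCII record and strip padding."""
    return {
        name: record[offset:offset + length].decode("ascii").strip()
        for name, (offset, length) in layout.items()
    }


# A 24-byte record: 6-byte id, name padded to 10 bytes, 8-byte balance.
line = b"000042" + b"Jane Doe".ljust(10) + b"00012550"
fields = parse_record(RECORD_LAYOUT, line)
print(fields)
```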

WebMethods, an EDI tool, has a very nice EDI parser built into it.


In short, no, unless you count programming languages as "schema languages". XML is very structured regardless of the schema. Binary formats can be absolutely anything. Consider the old MS Office formats, which were essentially a memory dump of the raw data structures used at runtime. If you allow programming languages, then you can (and do) create a parser in one of those :-) And what about compressed binary files such as zip, jpeg or WebM? How and why would a schema language want to encompass those types of things?


Project Epidal.BeeSchema seems to fit your requirements.

https://github.com/Epidal/BeeSchema
