What are the design considerations for using a custom markup/structured language vs XML

Developer https://www.devze.com 2023-03-12 20:03 Source: Internet
I want a textual interface for some structured data that I want to put into a MySQL table. Currently it is in text form, using the notation below.

I'm trying to understand why XML is used, that is, why my fields would go in XML tags instead of a "custom markup/structure" that uses /**/, -, and | to denote tables and fields.

I have code that will put this into MySQL and extract it. I just feel a bit like a hack for using this notation. Later the structured data file will be used for importing and exporting data, kind of like Internet Explorer when you export your bookmarks.

/*Table*/
-
Field 1 | Field 2 | Field 3
-
Field 1 | Field 2 | Field 3

What are the design considerations for using a custom markup language vs XML?


You should use XML because:

  1. An XML parser already exists. You don't have to re-invent the wheel.
  2. What happens if one of your fields contains the separator character?
  3. You never know how your application may grow. XML is rich and mature, so you do not have to think about the future of your application. You may have headaches with your own parser.
  4. If you don't want to use XML, consider TrueWill's answer as an alternative. Do some research before beginning to code on your own.
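
To make point 2 concrete, here is a minimal sketch in Python. The sample row and the element names `record` and `field` are made up for illustration, and the `" | "` join/split only approximates the notation in the question; it is not the asker's actual code.

```python
import xml.etree.ElementTree as ET

# Hypothetical row: the second field happens to contain the '|' separator,
# plus characters that are special in XML.
row = ["Widget", "size: 3 | 4 inches & <tax>", "9.99"]

# Home-grown notation: fields joined with " | " and split on the same string.
line = " | ".join(row)
parsed_custom = line.split(" | ")  # the embedded separator splits one field in two

# XML: the library escapes special characters on write and unescapes on read.
record = ET.Element("record")
for value in row:
    ET.SubElement(record, "field").text = value
xml_text = ET.tostring(record, encoding="unicode")
parsed_xml = [f.text for f in ET.fromstring(xml_text).findall("field")]

print(len(row), len(parsed_custom))  # field count before and after the custom round trip
print(parsed_xml == row)             # whether the XML round trip preserved the row
```

The custom format can of course be given an escaping rule, but then that rule has to be specified, implemented on both the write and read side, and tested, which is the work an existing parser has already done.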


Why invent your own? There are over a dozen lightweight markup languages.

EDIT: @Luc M's answer is very good. In general, you (almost) always want to use an existing parser if one is available. Why reinvent the wheel? If you want a simple format, go with CSV, YAML, or JSON. But there's nothing wrong with XML, and there are lots and lots of solid parsers available for it. Most employers care about getting quality software quickly and cheaply, and writing parsers seldom helps that cause.
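
As a rough illustration of the "simple format" route, the table from the question could be written as JSON with nothing but the Python standard library. The key names `table` and `rows` are assumptions made up for this sketch.

```python
import json

# Hypothetical layout for the question's data: one table name plus a list of rows.
data = {
    "table": "Table",
    "rows": [
        ["Field 1", "Field 2", "Field 3"],
        ["Field 1", "Field 2 | with a pipe", 'Field 3 with a "quote"'],
    ],
}

# Export: the library handles quoting and escaping.
text = json.dumps(data, indent=2)

# Import: the same library reads it back into plain lists and dicts.
restored = json.loads(text)

print(restored == data)
```

No parser had to be written, and fields containing pipes or quotes need no special handling in the application code.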


What are the considerations?

The favorable things you will get with a do-it-yourself solution:

Parse time: This is only potentially something that you'll get. It's going to be hard for you to beat an optimized parser like RapidXML for reading data. However, your parser will be able to parse directly into your data structures, whereas with a lightweight language-based solution, you must walk the data structure it emits to generate your real data.

Note that it is still possible that a pre-made solution will beat yours, simply because writing an optimized parser is hard. Though there's always Boost.Spirit to help you.
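
The "walk the data structure it emits" step looks roughly like this. This is a sketch in Python rather than C++ for brevity; the `Row` type, the `load_rows` helper, and the `<table>/<row>/<f>` element names are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Row:
    """The application's 'real' data structure (hypothetical)."""
    field1: str
    field2: str
    field3: str


XML_TEXT = """<table name="Table">
  <row><f>a</f><f>b</f><f>c</f></row>
  <row><f>d</f><f>e</f><f>g</f></row>
</table>"""


def load_rows(xml_text: str) -> List[Row]:
    # Step 1: the generic parser builds its own tree of elements.
    root = ET.fromstring(xml_text)
    # Step 2: walk that tree and copy the values into our own types.
    rows = []
    for row_el in root.findall("row"):
        values = [f.text or "" for f in row_el.findall("f")]
        rows.append(Row(*values))
    return rows


rows = load_rows(XML_TEXT)
print(rows)
```

A hand-written parser could skip step 1 and fill `Row` objects as it reads, which is the potential saving described above. Whether that saving is measurable for an import/export file is a separate question.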

That's really all I can think of for advantages for a do-it-yourself solution. If this were data that you were going to get from the user, there could be advantages in error reporting with a self-made solution. But you're talking about data that you will both generate and consume; there is no expectation of hand editing, so error reporting isn't going to be a significant concern.

The things you get from an XML or other lightweight language solution are pretty much covered by the others.


3 reasons:

(a) the XML spec has been carefully written; there are no ambiguities about what is and isn't allowed. Home-grown specs are never as thorough (I've seen hundreds of them, believe me) so you will forever be arguing about whether a particular message is valid or not.

(b) there's a wide choice of conformant and performant XML parsers around - you will never have to worry about writing and testing your own parser. (Parsers for home-grown languages, in my experience, are usually tested on about 5 test messages before going into production, with inevitable consequences.)

(c) there's a whole ecosystem around XML - authoring tools, validators, programming language APIs, security, canonicalization, you name it; plus the skills and knowledge to make it all work.

Having said that, for very simple data there may be other formats that work equally well, for example Java property files. But I would steer clear of CSV - there are a zillion different flavours and none of them are properly specified.
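
A small sketch of the CSV flavour problem, using Python's standard `csv` module. The sample text is made up; it uses the semicolon-delimited variant that some spreadsheet exports produce.

```python
import csv
import io

# Hypothetical export: semicolon-delimited, with doubled quotes inside a field.
text = 'name;note\n"Smith; John";"said ""hi"""\n'


def parse(source, **fmt):
    """Parse CSV text with the given formatting parameters."""
    return list(csv.reader(io.StringIO(source), **fmt))


as_comma = parse(text)                     # reader's default: comma delimiter
as_semicolon = parse(text, delimiter=";")  # the flavour the file was written in

print(as_comma[0])
print(as_semicolon[1])
```

The same bytes yield different rows depending on which flavour the reader assumes, and nothing in the file itself says which one is right. With XML the equivalent questions (quoting, escaping, encoding) are settled by the spec.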
