Best way to parse a list of numbers

devze.com (https://www.devze.com), 2023-04-04 19:32, source: web

I have a problem in that I need to process a list of numbers, which will be in an English sentence. It could be in the following formats:

items 1, 2 and 3

items 2 through 5

items 1 to 20

items 4 or 8

My initial instinct is to write a simple state machine to parse it, but I was wondering if there is any better (simpler) way, such as maybe some regular expression. Any advice?

If you have C++11, the following parser (AXE) will parse all your formats (I didn't test it):

unsigned i;
auto num = axe::r_unsigned(i);
auto space = axe::r_any(" \t");
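// Comma-separated list with an optional trailing "and N", e.g. "1, 2 and 3"
auto format1 = num % (*space & ',' & *space) & ~(+space & "and" & +space & num);
auto format2 = num & +space & "through" & +space & num;
auto format3 = num & +space & "to" & +space & num;
auto format4 = num & +space & "or" & +space & num;
// Try the two-number forms first; format1 on its own would also accept just the leading number
auto format = "items" & +space & (format2 | format3 | format4 | format1);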

If you don't have C++11, you can write a similar parser in C++ using Boost.Spirit. Such a parser is easier and shorter to write and debug than a regular expression, and you also get a lot of flexibility in defining parsing rules and semantic actions.

If you're wedded to Java, use its regular expression support (the java.util.regex package).

http://download.oracle.com/javase/tutorial/essential/regex/

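But if you're not, a sed script works well for simple text processing. For example, this captures each run of digits that is followed by a space; change the replacement to whatever output you need:

sed -E 's/([0-9]+) /\1 /g' < file.txt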

It seems very simple to write a parser for those strings using a regular expression for each case, or a single expression with an alternative for each. You need to use something like \d+ to match the numbers. I would also group each set of similar combinators (like "and"/"or" and "to"/"through") into a single alternative to make it easier to process the results.
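As a rough illustration of that suggestion, here is a minimal C++11 sketch using std::regex, with one expression for the "to"/"through" case and one for the comma/"and"/"or" case. The function name parse_items is made up for this example, and expanding a range into every number it covers is an assumption about what the asker wants; text that matches neither form gives an empty result.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): returns the item numbers named in
// a sentence such as "items 1, 2 and 3" or "items 2 through 5".
// Assumption: ranges are expanded inclusively. Returns an empty vector when the
// text matches neither form. std::stoul throws if a number is too large.
std::vector<unsigned> parse_items(const std::string& text) {
    // Range form: "items A to B" or "items A through B".
    static const std::regex range_re(
        R"(\s*items\s+(\d+)\s+(?:to|through)\s+(\d+)\s*)");
    // List form: "items A", "items A, B", "items A, B and C", "items A or B".
    static const std::regex list_re(
        R"(\s*items\s+\d+(?:\s*,\s*\d+)*(?:\s+(?:and|or)\s+\d+)?\s*)");
    // Used to pull the individual numbers out of a validated list.
    static const std::regex number_re(R"(\d+)");

    std::vector<unsigned> result;
    std::smatch m;
    if (std::regex_match(text, m, range_re)) {
        unsigned long first = std::stoul(m[1].str());
        unsigned long last = std::stoul(m[2].str());
        for (unsigned long n = first; n <= last; ++n)
            result.push_back(static_cast<unsigned>(n));
    } else if (std::regex_match(text, list_re)) {
        std::sregex_iterator it(text.begin(), text.end(), number_re), end;
        for (; it != end; ++it)
            result.push_back(static_cast<unsigned>(std::stoul(it->str())));
    }
    return result;
}
```

With this sketch, parse_items("items 2 through 5") gives 2, 3, 4, 5 and parse_items("items 4 or 8") gives 4, 8. If you need to tell "and" from "or", capture the connective in its own group instead of discarding it.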