I need to process a list of numbers that will be embedded in an English sentence. It could be in any of the following formats:
items 1, 2 and 3
items 2 through 5
items 1 to 20
items 4 or 8
My initial instinct is to write a simple state machine to parse it, but I was wondering if there is any better (simpler) way, such as maybe some regular expression. Any advice?
If you have C++11, the following parser (AXE) will parse all your formats (I didn't test it):
unsigned i;
auto num = axe::r_unsigned(i);
auto space = axe::r_any(" \t");
// "1, 2 and 3": comma-separated list with an optional trailing " and N"
auto format1 = num % (*space & ',' & *space) & ~(+space & "and" & +space & num);
auto format2 = num & +space & "through" & +space & num;
auto format3 = num & +space & "to" & +space & num;
auto format4 = num & +space & "or" & +space & num;
// format1 can match a single number on its own, so it is tried last
auto format = "items" & +space & (format2 | format3 | format4 | format1);
If you don't have C++11, you can write a similar parser in C++ using boost::spirit. Such a parser is easier and shorter to write and debug than regular expressions, and it gives you a lot of flexibility in creating parsing rules and semantic actions.
If you're wedded to Java, use the Regular Expression functionality.
http://download.oracle.com/javase/tutorial/essential/regex/
But if you're not, a sed script works best for simple text processing.
sed -E 's/([0-9]+) /\1 /g' < file.txt
It seems very simple to write a parser for those strings using a regular expression for each case, or a single expression with an alternative for each. You need to use something like \d+ to match the numbers. I would also group each set of similar combinators (like "and"/"or" and "to"/"through") into a single alternative to make it easier to process the results.
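As a rough illustration of that single-expression idea, here is a minimal sketch using std::regex from C++11. The function name parse_items and the choice to expand "to"/"through" into an inclusive range are my own assumptions, not something the question specifies, and I have only checked it against the four sample formats.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: expands one "items ..." phrase into the numbers it names.
// Returns an empty vector when no such phrase is found in the text.
std::vector<unsigned long> parse_items(const std::string& text) {
    // Group 1: first number.
    // Groups 2 and 3: range keyword ("to"/"through") and the end of the range.
    // Group 4: the rest of a list, e.g. ", 2 and 3" or " or 8".
    static const std::regex pattern(
        R"(items\s+(\d+)(?:\s+(to|through)\s+(\d+)|((?:\s*,\s*\d+|\s+(?:and|or)\s+\d+)*)))",
        std::regex::icase);

    std::vector<unsigned long> result;
    std::smatch m;
    if (!std::regex_search(text, m, pattern)) return result;

    const unsigned long first = std::stoul(m[1].str());
    if (m[2].matched) {
        // Range form: assume both endpoints are included.
        const unsigned long last = std::stoul(m[3].str());
        for (unsigned long n = first; n <= last; ++n) result.push_back(n);
        return result;
    }

    // List form: the first number, then every number found in the tail.
    result.push_back(first);
    static const std::regex number(R"(\d+)");
    const std::string tail = m[4].str();
    for (std::sregex_iterator it(tail.begin(), tail.end(), number), end; it != end; ++it)
        result.push_back(std::stoul(it->str()));
    return result;
}
```

With this sketch, parse_items("items 2 through 5") gives 2, 3, 4, 5 and parse_items("items 4 or 8") gives 4, 8. Grouping "and"/"or" into one alternative and "to"/"through" into another means the calling code only has to distinguish two cases: a list or a range.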