Based on my previous question, I've come up with a better way to approach my problem. Here's what I have in mind:
- I want to start extracting word/space characters up to the first occurrence of a pipe ('|') delimiter, or else a newline. Trim whitespace at either end. This extraction will be the start of a new "entry".
- For each pipe found (if any), I want to remove the pipe, then extract everything up to the next occurrence of a pipe or newline. Trim whitespace. Each extracted piece will be a parameter for the above "entry".
- For the next occurrence of a newline:
  - If the line after the newline begins with a pipe, or the line before it ended with one, I want to remove the newline as if it weren't there.
  - Otherwise, I want to begin anew from step 1 with a new "entry".
Here's some sample input:
This will be a new entry | param1 |param2 |etc.
This is another entry, but without params
This is a third entry|with a twist
| I'm using subsequent lines for
| its parameters.
Yet I still want the next line to be another new entry.
And this should be the output:
Entry #1: "This will be a new entry"
Params: ["param1","param2","etc."]
Entry #2: "This is another entry, but without params"
Entry #3: "This is a third entry"
Params: ["with a twist","I'm using subsequent lines for","its parameters."]
Entry #4: "Yet I still want the next line to be another new entry."
What would be a good way to go about doing this?
At this point, you should consider writing a proper grammar and using a parser generator instead of hacking up regexes to do the job.
Even if you are going for regexes, trying to come up with some miraculous one-liner that does the whole job is going to result in something hideous.
Instead, consider something like the following pseudocode:
foreach (line of input)
    if the first non-whitespace character is NOT a delimiter
        output what we have so far, then parse out the title of the next entry
    while there's still text on this line
        grab up to the next delimiter, parse as a parameter
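As a rough illustration, the pseudocode above could be sketched in Python along these lines. The function name `parse_entries`, its `delimiter` argument, and the choice to skip blank lines are assumptions made for this example, not something specified in the question. The sketch also folds in the question's rule that a line ending with a pipe continues onto the next line.

```python
def parse_entries(text, delimiter="|"):
    """Return a list of (title, params) tuples parsed from text.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    entries = []      # finished (title, params) pairs
    current = None    # entry being built, as [title, params]
    trailing = False  # did the previous line end with a delimiter?

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are simply skipped

        # Split the line into trimmed fields around each delimiter.
        fields = [field.strip() for field in line.split(delimiter)]

        # A line continues the current entry if it starts with a delimiter
        # or if the previous line ended with one.
        is_continuation = current is not None and (
            line.startswith(delimiter) or trailing
        )
        if not is_continuation:
            # Output what we have so far, then start the next entry.
            if current is not None:
                entries.append(tuple(current))
            current = [fields[0], []]
            fields = fields[1:]

        # Everything left on the line is a parameter; empty fields
        # (from leading or trailing delimiters) are dropped.
        current[1].extend(field for field in fields if field)
        trailing = line.endswith(delimiter)

    if current is not None:
        entries.append(tuple(current))
    return entries


sample = """This will be a new entry | param1 |param2 |etc.
This is another entry, but without params
This is a third entry|with a twist
| I'm using subsequent lines for
| its parameters.
Yet I still want the next line to be another new entry."""

for number, (title, params) in enumerate(parse_entries(sample), start=1):
    print(f'Entry #{number}: "{title}"')
    if params:
        print(f"Params: {params}")
```

Splitting each line on the delimiter keeps the logic to one pass with no regexes; the only state carried between lines is the entry under construction and whether the previous line ended with a pipe.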