Assume a person posts this message:
"#books 'War and Peace' by Leo Tolstoy - I love this book."
I want to parse this into three variables, like this:
@title = "War and Peace"
@author = "Leo Tolstoy"
@comment = "I love this book"
I'm sure this is a simple puzzle for a Regex Ninja. Unfortunately, I am but a lowly villager that mops the bloody, sweaty floors upon which real Regex Ninjas train.
BONUS points if you can suggest a regex that does not require so much structure in the message post. Ideally, I want to obtain the same three variables without the structure (or at least with less structure / requirements): "@title" by @author - @comment.
Thanks!
regex = /'(.+)'\s+by\s+(.+)\s+-\s+(.+)/
"#books 'War and Peace' by Leo Tolstoy - I love this book.".scan(regex)
=> [["War and Peace", "Leo Tolstoy", "I love this book."]]
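Since scan returns a nested array, a small wrapper around String#match may be more convenient for filling the three variables. This is only a sketch using the same regex; the helper name parse_post and the constant POST_REGEX are made up for illustration.

```ruby
# Same pattern as above, stored in a constant (name is hypothetical).
POST_REGEX = /'(.+)'\s+by\s+(.+)\s+-\s+(.+)/

# Returns [title, author, comment], or nil when the post
# does not follow the expected 'title' by author - comment shape.
def parse_post(message)
  m = message.match(POST_REGEX)
  return nil unless m
  m.captures
end

title, author, comment =
  parse_post("#books 'War and Peace' by Leo Tolstoy - I love this book.")
```

Returning nil on a failed match lets the caller decide what to do with posts that do not fit the format.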
I don't know Ruby syntax, but the regex itself for the format you gave would look something like this:
#books\s'([^']+)'\s+by\s+([^-]+)-\s+(.*)
As for making it less dependent on format: ideally you would offer three separate fields to fill out. If it has to be general content in a message post that is scanned for a specific format (a bit like BBCode), then I would suggest something more like
[book title='title' author='author']comment[/book]
That would be much easier to parse.
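A rough Ruby sketch of parsing that tag format, assuming titles and authors never contain a single quote (the tag syntax above uses it as the attribute delimiter). The names BOOK_TAG_REGEX and parse_book_tag are hypothetical.

```ruby
# Matches [book title='...' author='...']comment[/book].
# The /m flag lets the comment span several lines.
BOOK_TAG_REGEX = /\[book title='([^']*)' author='([^']*)'\](.*?)\[\/book\]/m

# Returns [title, author, comment], or nil if no tag is found.
def parse_book_tag(text)
  m = text.match(BOOK_TAG_REGEX)
  m && m.captures
end

title, author, comment =
  parse_book_tag("[book title='War and Peace' author='Leo Tolstoy']I love this book[/book]")
```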
(["'])(?<title>[^"']*)\1\s+by\s+(?<author>[\p{L}\s']+)\s*-\s*(?<comment>.*)$
About the second part of the question: this cannot be done reliably with a regex alone. As the name says, a regular expression matches regular patterns, and a free-form sentence may not be regular.
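To try the named-group pattern above from Ruby, one adjustment is probably needed: as far as I know, Ruby does not allow a numbered backreference such as \1 in a regex that also has named groups, so this sketch names the quote group and uses \k<quote> instead. The author class also swallows the trailing space before the dash, hence the strip. NAMED_POST_REGEX and parse_named are hypothetical names.

```ruby
# Named-group variant; the opening quote is captured as "quote"
# and must be closed by the same character via \k<quote>.
NAMED_POST_REGEX =
  /(?<quote>["'])(?<title>[^"']*)\k<quote>\s+by\s+(?<author>[\p{L}\s']+)\s*-\s*(?<comment>.*)$/

# Returns a hash with :title, :author and :comment, or nil on no match.
def parse_named(message)
  m = NAMED_POST_REGEX.match(message)
  return nil unless m
  # The author class includes whitespace, so trim what it picked up.
  { title: m[:title], author: m[:author].strip, comment: m[:comment] }
end

result = parse_named("#books 'War and Peace' by Leo Tolstoy - I love this book.")
```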
An alternate answer:
You could pick a delimiter that is unlikely to show up in normal text and just split the string on it, then enforce a fixed order for the values (which you are more or less doing already). For instance, you could have people post
"War and Peace ~ Leo Tolstoy ~ I love this book"
and then just explode/split at the ~, taking the first element as the title, the second as the author, and the third as the comment.
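In Ruby the delimiter approach could look roughly like this. It is a sketch only: parse_delimited is a hypothetical helper, and the limit of 3 is my own choice so that any further ~ stays inside the comment.

```ruby
# Splits "title ~ author ~ comment" into its three parts.
# Returns [title, author, comment], or nil if fewer than three parts exist.
def parse_delimited(message, delimiter = "~")
  # Limit of 3: everything after the second delimiter belongs to the comment.
  parts = message.split(delimiter, 3).map(&:strip)
  return nil unless parts.length == 3
  parts
end

title, author, comment =
  parse_delimited("War and Peace ~ Leo Tolstoy ~ I love this book")
```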
/["'](.*?)["'] by (.*?)\s+-\s+(.*)/