I'm trying to pull the first paragraph out of Markdown formatted documents:
This is the first paragraph.

This is the second paragraph.
The answer here gives me a solution that matches the first string ending in a double line break.
Perfect, except some of the texts begin with Markdown-style headers:
### This is an h3 header.

This is the first paragraph.
So I need to:
- Skip any line that begins with one or more # symbols.
- Match the first string ending in a double line break.
In other words, return 'This is the first paragraph' in both of the examples above.
So far, I've tried many variations on:
"/(?s)(?:(?!\#))((?!(\r?\n){2}).)*+/"
But I can't get it to return the proper match.
Where did I go wrong in my lookaround?
I'm doing this in PHP (preg_match()), if that makes a difference.
Thanks!
You could try
"/(?sm)^[^#](?:(?!(?:\r\n|\r|\n){2}).)*/"
I enabled the multiline option by using (?sm) instead of (?s), so that ^ starts each check at the beginning of a line, and [^#] requires that the line does not start with a #. I also used \r\n|\r|\n instead of \r?\n because my testing environment had funny line breaks =)
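For what it's worth, here is a minimal sketch of how this could be wrapped up with preg_match(). The function name firstParagraph is made up for illustration, and the pattern is a variant of the one above, not the exact same regex. Two assumptions behind the changes: as far as I can tell, [^#] can also match a line break, so with the header example the match may begin on the blank line after the header, which is why the class here is [^#\r\n]; and \R (PCRE's "any line break" escape) treats \r\n as a single break, so \R{2} should only fire on a genuinely blank line. As for the original attempt, (?!\#) is not tied to the start of a line, so the engine is free to begin the match a few characters later, just past the # symbols.

```php
<?php
// Hypothetical helper (not from the original answer): returns the first
// paragraph that does not start with '#', or null if nothing matches.
function firstParagraph(string $markdown): ?string
{
    // (?sm)        : '.' also matches line breaks, '^' matches at every line start
    // ^[^#\r\n]    : the line must begin with something other than '#' or a line break
    // (?!\R{2}).   : consume one character as long as a double line break does not follow
    $pattern = '/(?sm)^[^#\r\n](?:(?!\R{2}).)*/';

    return preg_match($pattern, $markdown, $m) === 1 ? $m[0] : null;
}

$plain  = "This is the first paragraph.\n\nThis is the second paragraph.";
$header = "### This is an h3 header.\n\nThis is the first paragraph.\n\nThis is the second paragraph.";

echo firstParagraph($plain), "\n";   // This is the first paragraph.
echo firstParagraph($header), "\n";  // This is the first paragraph.
```

If you would rather keep the pattern from the answer as it is, running trim() on the match is probably enough to get rid of a stray leading line break.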