I'd like to get smarter excerpts of sections of text. As I'll be using Movable Type's regex_replace function, I'm gonna be trying to grab everything after the first few sentences.
While \..* gets everything after the first period, that often leaves a too-short excerpt. How might I do the same thing (everything after the first period) but skipping the first 100 characters?
Alternatively, how would I just grab everything after, say, the second or third period?
I'm not familiar with regex_replace, so I'll use the PHP preg_replace function and you can adapt accordingly:
$truncated = preg_replace('/^(.{100}.*?\.).*$/s', '$1', $long);
And another version, which will try to be smart about not breaking up numbers with a decimal point (or other places a period might occur somewhere other than the end of a sentence):
$truncated = preg_replace('/^(.{100}.*?\.(?![a-z0-9])).*$/s', '$1', $long);
Explanation:
- The part you want to keep is grouped with parentheses.
- You'll keep at least 100 characters: .{100}
- You'll then keep any following characters up to the first period: .*?\.
- In the second version, I used a negative lookahead, (?![a-z0-9]), which makes the match continue on to the next period if the period is followed by a digit or a lowercase letter.
- Dot matches new-line (the s modifier at the end of the pattern). If Movable Type's regex_replace function takes a pattern without delimiters (the leading slash and the trailing /s in my pattern), you can use (?s) at the beginning of the pattern instead.
- Use $1 in the replacement to keep the first captured group.
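To see the difference between the two versions, here is a minimal sketch. The helper name excerpt, its parameters, and the sample text are made up for illustration, and the minimum length is lowered to 20 so the example stays short:

```php
<?php
// Hypothetical helper: keep at least $min characters, then cut after the next period.
// With $smart = true, a period followed by a lowercase letter or digit is skipped.
function excerpt(string $text, int $min = 100, bool $smart = false): string
{
    $lookahead = $smart ? '(?![a-z0-9])' : '';
    $pattern = '/^(.{' . $min . '}.*?\.' . $lookahead . ').*$/s';
    // If nothing matches (text too short, no period), the input comes back unchanged.
    return preg_replace($pattern, '$1', $text) ?? $text;
}

// Made-up sample text: the first period after character 20 sits inside "3.5".
$long = 'The price rose. It is now 3.5 dollars per unit. Nobody knows why.';

echo excerpt($long, 20), "\n";        // The price rose. It is now 3.
echo excerpt($long, 20, true), "\n";  // The price rose. It is now 3.5 dollars per unit.
```

Note that when the text is shorter than the minimum length, or has no period after it, the pattern does not match and the text is returned as-is.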
"Complete sentence" is vague, since different languages mark the end of a sentence differently. Let's assume that a period followed by whitespace ends a sentence:
/^.*?\.\s+(?:.{N})(.*)/
Replace N with the number of characters to skip after the first sentence; the capture group then holds what follows.
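For the other half of the question (cutting after the second or third sentence), the same assumption can be turned into a counted group. This is only a sketch in PHP: the helper name firstSentences and the sample text are hypothetical, and a period at the very end of the text is also treated as a sentence end:

```php
<?php
// Hypothetical helper: keep the first $count sentences of $text.
// Assumption: a sentence ends at a period followed by whitespace or the end of the text.
function firstSentences(string $text, int $count): string
{
    $pattern = '/^((?:.*?\.(?=\s|$)){' . $count . '}).*$/s';
    // With fewer than $count sentences nothing matches and the input is returned as-is.
    return preg_replace($pattern, '$1', $text) ?? $text;
}

// Made-up sample text; the period in "3.5" is not followed by whitespace, so it is skipped.
$long = 'The price rose. It is now 3.5 dollars per unit. Nobody knows why.';

echo firstSentences($long, 1), "\n";  // The price rose.
echo firstSentences($long, 2), "\n";  // The price rose. It is now 3.5 dollars per unit.
```

Abbreviations such as "e.g. " would still count as sentence ends under this assumption.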