Regex for everything *after* the first complete sentence (period and space) *after* N characters

Developer https://www.devze.com 2023-01-30 05:07 Source: web
I'd like to get smarter excerpts of sections of text. As I'll be using Movable Type's regex_replace function, I'm gonna be trying to grab everything after the first few sentences.

While \..* gets everything after the first period, that often leaves a too-short excerpt. How might I do the same thing (everything after the first period) but skipping the first 100 characters?

Alternatively, how would I just grab everything after, say, the second or third period?


Not familiar with regex_replace, I'll use the PHP preg_replace function and you can adapt accordingly:

$truncated = preg_replace('/^(.{100}.*?\.).*$/s', '$1', $long);
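
For anyone without PHP at hand, the same pattern can be tried in Python. This is only a sketch: the truncate name, the sample text and the small min_chars value are made up for illustration, and Python's re module stands in for PCRE here.

```python
import re

def truncate(text, min_chars=100):
    # Keep at least min_chars characters, then continue lazily up to
    # and including the next period. re.DOTALL plays the role of the
    # /s modifier in the PHP pattern.
    pattern = r'^(.{%d}.*?\.).*$' % min_chars
    return re.sub(pattern, r'\1', text, count=1, flags=re.DOTALL)

# Hypothetical sample; 10 is used instead of 100 to keep it short.
sample = "One. Two is longer. Three is the last sentence."
excerpt = truncate(sample, min_chars=10)
print(excerpt)
```

If the text is shorter than min_chars, or has no period after that point, the pattern does not match and the text comes back unchanged.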

And another version, which will try to be smart about not breaking up numbers with a decimal point (or other places a period might occur somewhere other than the end of a sentence):

$truncated = preg_replace('/^(.{100}.*?\.(?![a-z0-9])).*$/s', '$1', $long);
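
To see what the lookahead changes, here is a rough side-by-side in Python (again standing in for PHP). The sample sentence and the 10-character minimum are assumptions chosen so that a decimal number falls right after the cutoff.

```python
import re

# Same two patterns as above, with 10 instead of 100 for a short demo.
PLAIN = re.compile(r'^(.{10}.*?\.).*$', re.DOTALL)
GUARDED = re.compile(r'^(.{10}.*?\.(?![a-z0-9])).*$', re.DOTALL)

# Hypothetical sample text containing a decimal number.
sample = "The total came to 3.50 dollars. Nobody complained."

plain_cut = PLAIN.sub(r'\1', sample)      # stops inside the number
guarded_cut = GUARDED.sub(r'\1', sample)  # skips the period in 3.50

print(plain_cut)
print(guarded_cut)
```

Note that the character class is lowercase only, so a period followed by an uppercase letter (as in "U.S.A.") would still be treated as a sentence end unless the class is widened or the i modifier is added.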

Explanation:

  1. The part you want to keep is grouped with parentheses.
  2. You'll keep at least 100 characters: .{100}
  3. You'll then keep any following characters up to and including the next period: .*?\.
  4. In the second version, I used a negative lookahead, (?![a-z0-9]), so that a period followed directly by a lowercase letter or a digit is skipped and the match continues on to the next period.
  5. The s modifier at the end of the pattern makes the dot match newlines as well. If Movable Type's regex_replace function takes a pattern without delimiters (the leading slash and the trailing /s in my pattern), you can put (?s) at the beginning of the pattern instead.
  6. Use $1 in the replacement to keep the first captured group.
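
The effect of the dot-matches-newline setting from step 5 can be checked with a small Python sketch. The two-line sample text is invented, and whether Movable Type accepts an inline (?s) is an assumption that would need to be verified there.

```python
import re

# Hypothetical text whose first period sits on the second line.
text = "First line has no stop\nsecond line ends here. Third sentence follows."

# Inline (?s): the dot also matches the newline, so the match can
# run past the line break to reach the first period.
with_flag = re.sub(r'(?s)^(.{10}.*?\.).*$', r'\1', text)

# No flag: the dot stops at the newline, the pattern cannot match,
# and the text is returned unchanged.
without_flag = re.sub(r'^(.{10}.*?\.).*$', r'\1', text)

print(repr(with_flag))
print(without_flag == text)
```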


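"Complete sentence" is vague, since different languages mark the end of a sentence in different ways. Let's assume that a period followed by whitespace ends a sentence: /^.{N}.*?\.\s+(.*)/s Replace N with the number of characters to skip; the first captured group then holds everything after that sentence. The s modifier lets the dot match newlines.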
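
A rough Python sketch of the "period followed by whitespace" assumption, skipping n characters first and then cutting at the next sentence end, plus a variant for the "second or third period" form of the question. Both function names and the sample text are hypothetical, and the period-plus-whitespace rule will misfire on abbreviations.

```python
import re

def split_after_sentence(text, n):
    # Skip n characters, continue to the next period that is followed
    # by whitespace, and return (excerpt, remainder).
    m = re.match(r'(?s)(.{%d}.*?\.)\s+(.*)' % n, text)
    if m is None:
        return text, ''  # no sentence end found after n characters
    return m.group(1), m.group(2)

def split_after_nth_period(text, count):
    # Cut after the count-th period, wherever it falls.
    m = re.match(r'((?:[^.]*\.){%d})\s*(.*)' % count, text, re.DOTALL)
    if m is None:
        return text, ''  # fewer than count periods in the text
    return m.group(1), m.group(2)

sample = "One. Two is longer. Three is the last sentence."
print(split_after_sentence(sample, 10))
print(split_after_nth_period(sample, 2))
```

The second element of each pair is the "everything after" part that the question wants to match, so a replace-based tool would substitute it with an empty string.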
