php preg regex - group of whitespace without line break in multiline mode

开发者 https://www.devze.com 2023-02-18 17:55 Source: web
Hello, I am trying to split some input by line and trim each line. I would like to do it without using trim(), just with a regex.

The issue I am having is that whitespace at the end of a line is not trimmed away. I guess my group [^$\s] (meant as "whitespace, but no line break") does not work.

So the question is how to solve my problem, and how to define a group in a preg regex that explicitly ignores line breaks. At the moment I suspect my whole approach is wrong. The problem is that if I write \s* instead of this weird group, .+ eats everything. If I write .+? instead, I do not get strings that contain spaces back complete.

preg_match_all("/^\s*+(.+)[^$\s]*+$/m", $_POST['input'], $matches, PREG_SET_ORDER );


Okay, I'm usually all for using regular expressions, but the trim approach would be simpler here. I assume you avoided it because it usually requires an extra loop, but in this instance you could compact it to:

 $lines = array_map("trim", explode("\n", $_POST["input"]));
 // quite a handy utility function, so just wanted to note that here
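
To make that concrete, here is a minimal sketch run against a made-up input string. The $input value is just a hypothetical stand-in for $_POST["input"], and the array_filter() step is an optional extra that is not part of the one-liner above:

```php
<?php
// Hypothetical sample input standing in for $_POST["input"].
$input = "  first line  \n\tsecond line\t\n   \nthird";

// Split on "\n" and trim every piece. trim() also strips a trailing "\r",
// so Windows-style line endings are handled as well.
$lines = array_map('trim', explode("\n", $input));

// Whitespace-only lines end up as empty strings. If they are unwanted,
// filter them out and reindex (optional extra step).
$nonEmpty = array_values(array_filter($lines, 'strlen'));

print_r($lines);    // "first line", "second line", "", "third"
print_r($nonEmpty); // "first line", "second line", "third"
```

Note that the whitespace-only third line survives as an empty string, which is where this differs from a regex that only matches lines with content.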

As an alternative to the solution you found, you could also have used:

preg_split('/((?!\n)\p{Z})*\n((?!\n)\p{Z})*/u', "...\n...");

A bit hackish, admittedly. I swapped out the ^ and $ for \n and used assertions to exclude newlines elsewhere. But \p{Z} is a nice way to catch the Unicode space separator variations, including NBSP and other ninja placeholders (though not the tab, which is a control character rather than a separator).
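
A small sketch of how that split behaves, on a hypothetical input that mixes a no-break space, a tab and a "\r\n" line ending. The second pattern is not from the answer above; it is an assumed variant using PCRE's \h (horizontal whitespace) and \R (any line break sequence), shown only for comparison:

```php
<?php
// Hypothetical sample input; "\u{00A0}" is a no-break space (NBSP).
$input = "first line  \n\u{00A0} second line\t\r\n  third line";

// Pattern from above: \p{Z} covers separators such as space and NBSP,
// but not the tab or "\r", which are control characters.
$a = preg_split('/((?!\n)\p{Z})*\n((?!\n)\p{Z})*/u', $input);

// Assumed variant: \h is any horizontal whitespace (tab included),
// \R is any line break sequence ("\n", "\r\n", ...).
$b = preg_split('/\h*\R\h*/u', $input);

var_dump($a); // second element keeps its trailing "\t\r"
var_dump($b); // "first line", "second line", "third line"
```

Either way, only whitespace next to a line break is removed. Leading whitespace on the first line and trailing whitespace on the last line are not adjacent to a break, so they would still need a trim() on the whole input first.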


preg_match_all("/\s*(.*\S)/", $_POST['input'], $matches, PREG_SET_ORDER );

You need something to eat leading whitespace before your capture group, including whole lines. \s* does that. You don't need to force it to start at the beginning of a line, you're not saving it anyway -- its only purpose is to match up to just before a non-whitespace character.

So now you know that you're looking at non-whitespace, and need to capture up to the last non-whitespace on the same line. Since . won't match newline, .*\S does just that.

One difference from your version is that the initial \s* of the next match gets to eat the trailing whitespace on the line you just matched. Since we no longer care about line endings, the /m modifier is no longer necessary.

You could make the first star possessive (\s*+); that won't change what it matches, but it will make it fail marginally faster at the end of the file if there's a long empty tail.
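
Here is a rough side-by-side of the pattern from the question and the one from this answer, on a made-up input ($input is a hypothetical stand-in for $_POST['input']). It also shows why the original keeps trailing whitespace: inside a character class, $ is a literal dollar sign, so [^$\s] means "anything except $ or whitespace", and the greedy .+ has already run to the end of the line by then.

```php
<?php
// Hypothetical sample input standing in for $_POST['input'].
$input = "  first line  \n\n\t second line \n   \nthird";

// Pattern from the question: (.+) greedily takes the rest of the line,
// [^$\s]*+ then matches nothing, so trailing whitespace stays captured.
preg_match_all('/^\s*+(.+)[^$\s]*+$/m', $input, $old, PREG_SET_ORDER);

// Pattern from this answer: the capture has to end on a non-whitespace
// character, and the next match's \s* swallows what is left of the line.
preg_match_all('/\s*(.*\S)/', $input, $new, PREG_SET_ORDER);

// With PREG_SET_ORDER each match is an array; index 1 is the capture group.
$oldLines = array_column($old, 1);
$newLines = array_column($new, 1);

var_dump($oldLines); // "first line  ", "second line ", "third"
var_dump($newLines); // "first line", "second line", "third"
```

Blank and whitespace-only lines produce no match at all with the second pattern, so the result contains only lines that have some content.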
