Regexp and escaped characters in Scheme

Developer https://www.devze.com 2023-03-07 16:12 Source: web
In Scheme, I have this string:

"hello hellu-#\"hella.helloo,hallo#\return#\""

I want to split it into the list ("hello" "hellu" "hella" "helloo" "hallo"),

separating on space, hyphen, double quote, dot, comma, and return.

I tried

(regexp-split #rx"( +)|(#\-)|(#\")|(#\.)|(,)|(#\return)" string)

but #\- and #\. cause an error.

Any hints or solutions?

Thanks


It looks like you're confusing the syntax for characters (#\foo) with the syntax for strings, and you do that in both the string and the regexp. So my guess is that the string that you want to split is actually:

"hello hellu-\"hella.helloo,hallo\n\""

where \" stands for a double quote character, and \n for a newline. If this is the case, then (again, this is guessing your intention) the regexp should be:

(regexp-split #rx"( +)|(\-)|(\")|(\.)|(,)|(\n)" string)

But that doesn't work either, since \- and \. are invalid escapes (Racket uses C-like escapes), so change that to:

(regexp-split #rx"( +)|(-)|(\")|(.)|(,)|(\n)" string)

This doesn't work either, since . has the usual "any char" meaning in a regexp -- so you want to escape it with a backslash. As with many other string syntaxes, you get a backslash by escaping it with a backslash, so now we have a version that is finally close to a working one:

> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-split #rx"( +)|(-)|(\")|(\\.)|(,)|(\n)" string)
'("hello" "hellu" "" "hella" "helloo" "hallo" "" "")
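
To make the two layers of escaping concrete, here is a small sketch (the names dot-pattern, quoted-dot and so on are just made up for illustration). The string reader consumes one backslash, and the regexp engine sees what is left. If you don't want to count backslashes by hand, regexp-quote should produce the escaped pattern for you.

```racket
#lang racket

;; The reader turns the two characters \\ into a single backslash,
;; so the regexp engine receives the two-character pattern \.
(define dot-pattern "\\.")
(define dot-pattern-length (string-length dot-pattern)) ; 2

;; regexp-quote builds the same escaped pattern from the literal text.
(define quoted-dot (regexp-quote "."))

;; An unescaped dot matches every character; the escaped one matches only ".".
(define any-char-matches (regexp-match* #rx"." "a.b"))       ; '("a" "." "b")
(define literal-dot-matches (regexp-match* #rx"\\." "a.b"))  ; '(".")
```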

First, the regexp can be improved considerably: the parens are not needed for splitting:

(regexp-split #rx" +|-|\"|\\.|,|\n" string)

Then, instead of using a bunch of single characters joined with |, you can just use a "character range":

(regexp-split #rx" +|[-\".,\n]" string)

Note that it's important that the - is the first (or last) character in the range, so it will not have the usual meaning of a range of characters. Next, it seems that you really want any sequence of such characters to be used as a separator, which will avoid some of those empty strings in the result:
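
A quick way to see the difference the position of - makes, as a sketch on a made-up input string "a-b-c" (the variable names are only for illustration):

```racket
#lang racket

;; Between two characters, - denotes a range: [a-c] is a, b, or c.
(define range-matches (regexp-match* #rx"[a-c]" "a-b-c"))   ; '("a" "b" "c")

;; At the start of the range, - is a literal hyphen: [-ac] is -, a, or c.
(define literal-matches (regexp-match* #rx"[-ac]" "a-b-c")) ; '("a" "-" "-" "c")
```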

(regexp-split #rx" +|[-\".,\n]+" string)

and in this case you can just as well throw the space into the range too (carefully putting it after the -, as I explained above). We now get:

> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-split #rx"[- \".,\n]+" string)
'("hello" "hellu" "hella" "helloo" "hallo" "")

And finally, you'd probably want to get rid of that last empty string. Technically, it should be there, since there is a sequence of matching characters before the end of the string. An easy way around this in Racket is to use the complementary regexp-match*, which returns the list of matches rather than splitting on them:

> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-match* #rx"[- \".,\n]+" string)
'(" " "-\"" "." "," "\n\"")

This is obviously broken, since it gives you the separators rather than what's between them. But since this regexp is a range of characters, it is easy to resolve -- simply negate the character range, and you get what you want:

> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-match* #rx"[^- \".,\n]+" string)
'("hello" "hellu" "hella" "helloo" "hallo")
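
If you would rather keep regexp-split, another option is to filter the empty strings out of its result. This is only a sketch of that alternative, not something the steps above require; split-words and input are hypothetical names.

```racket
#lang racket

(define input "hello hellu-\"hella.helloo,hallo\n\"")

;; Split on runs of separators, then drop the empty strings that
;; regexp-split leaves when the input starts or ends with a separator.
(define (split-words str)
  (filter (lambda (s) (not (string=? s "")))
          (regexp-split #rx"[- \".,\n]+" str)))

(define words (split-words input))
;; words is '("hello" "hellu" "hella" "helloo" "hallo")
```

The negated-range regexp-match* version is shorter, but this form keeps the separator pattern written the same way as in the earlier steps.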