Inconsistent behavior between str_split and strsplit

Developer https://www.devze.com 2023-04-03 17:04 Source: web

The documentation for str_split in the stringr package states that for the pattern argument:

If "" splits into individual characters.

which suggests it behaves the same as strsplit in this regard. However,

library(stringr)
str_split("abcab","")
[[1]]
[1] ""  "a" "b" "c" "a" "b"

with a leading empty string. Compare this with:

strsplit("abcab","")
[[1]]
[1] "a" "b" "c" "a" "b"

Leading empty strings seem to be normal behavior when splitting on non-empty strings:

strsplit("abcab","ab")
[[1]]
[1] ""  "c"

but even then, str_split generates an 'extra' trailing empty string:

str_split("abcab","ab")
[[1]]
[1] ""  "c" "" 

Is this discrepancy a bug, feature, an error in the documentation or just a different notion of what's 'expected behavior'?


If you use commas as delimiters, the "expected" (your mileage may vary) result is more obvious:

# expect "" "2" "3" "4" ""

strsplit(",2,3,4,", ",")
# [[1]]
# [1] ""  "2" "3" "4"

str_split(",2,3,4,", ",")
# [[1]]
# [1] ""  "2" "3" "4" "" 
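The `str_split` result above has exactly one more field than there are commas. A minimal base-R sketch that reproduces that behavior without stringr, assuming a fixed, non-empty delimiter: `split_keep_empty()` is a hypothetical helper (not part of base R or stringr) that finds every match with `gregexpr(fixed = TRUE)` and cuts the string with `substring()`, so empty fields at both ends are kept.

```r
# Hypothetical helper (not part of base R or stringr): split `x` on a fixed,
# non-empty delimiter `sep`, keeping empty fields at both ends so that
# n delimiters always yield n + 1 fields.
split_keep_empty <- function(x, sep) {
  stopifnot(is.character(x), length(x) == 1, nzchar(sep))
  # Start positions of every non-overlapping match of `sep`
  m <- gregexpr(sep, x, fixed = TRUE)[[1]]
  if (m[1] == -1) return(x)       # no delimiter: one field, the whole string
  starts <- c(1, m + nchar(sep))  # each field begins right after a delimiter
  ends <- c(m - 1, nchar(x))      # and ends just before the next one
  substring(x, starts, ends)      # "" whenever start > end
}

split_keep_empty(",2,3,4,", ",")
# returns c("", "2", "3", "4", "")
split_keep_empty("abcab", "ab")
# returns c("", "c", "")
```

Only fixed delimiters are handled here; a regex pattern whose matches can vary in length would need `attr(m, "match.length")` instead of `nchar(sep)`.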

If I have n commas, then I expect n + 1 elements to be returned, so I prefer the results from str_split. However, I wouldn't necessarily call this a bug in strsplit, since it performs as advertised:

(from ?strsplit) Note that this means that if there is a match at the beginning of a (non-empty) string, the first element of the output is ‘""’, but if there is a match at the end of the string, the output is the same as with the match removed.
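Given that documented rule, one base-R workaround (my own suggestion, not something either help page prescribes) is to append one extra delimiter before calling strsplit, so the piece that gets silently dropped is the extra one rather than the real trailing field. A minimal sketch, assuming a fixed delimiter:

```r
x <- ",2,3,4,"

# Documented strsplit rule: a match at the end behaves as if it were removed,
# so both calls return c("", "2", "3", "4")
identical(strsplit(x, ",")[[1]], strsplit(",2,3,4", ",")[[1]])  # TRUE

# Workaround sketch (assumes a fixed delimiter): append one extra delimiter
# so the trailing empty field survives
strsplit(paste0(x, ","), ",", fixed = TRUE)[[1]]
# returns c("", "2", "3", "4", "")
```

With a regex pattern the appended text would have to be something the pattern actually matches, which is why the sketch sticks to `fixed = TRUE`.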

"" is trickier, as there is no way to count the number of times "" appears in a string. Therefore treating it as a special case seems justified.

(from ?str_split) If ‘""’ splits into individual characters.
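One way to pin down what "splits into individual characters" ought to mean is as an invariant: `nchar(x)` pieces, none of them empty. A quick base-R check on the question's `"abcab"` example (the variable names are just for illustration) compares `strsplit(x, "")` with a `substring()`-based split; by this reading the leading `""` from str_split is the odd one out:

```r
x <- "abcab"

# strsplit's "" special case: one element per character
chars <- strsplit(x, "")[[1]]

# The same split built from substring(): character i is substring(x, i, i)
i <- seq_len(nchar(x))
chars2 <- substring(x, i, i)

identical(chars, chars2)   # TRUE
length(chars) == nchar(x)  # TRUE: nchar(x) pieces, not "matches + 1"
any(chars == "")           # FALSE: no leading empty string
```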

Based on this, I'd say you have found a bug and should take Hadley's advice and report it!
