
regex - return all before the second occurrence

Developer https://www.devze.com 2023-04-05 23:33 Source: web

Given this string:

DNS000001320_309.0/121.0_t0

How would I return everything before the second occurrence of "_"?

DNS000001320_309.0/121.0

I am using R.

Thanks.


The following script:

s <- "DNS000001320_309.0/121.0_t0"
t <- gsub("^([^_]*_[^_]*)_.*$", "\\1", s)
t

will print:

DNS000001320_309.0/121.0

A quick explanation of the regex:

^         # the start of the input
(         # start group 1
  [^_]*   #   zero or more chars other than `_`
  _       #   a literal `_`
  [^_]*   #   zero or more chars other than `_`
)         # end group 1
_         # a literal `_`
.*        # consume the rest of the string
$         # the end of the input

which is replaced with:

\\1       # whatever is matched in group 1

If there are fewer than 2 underscores, the pattern does not match and the string is returned unchanged.
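Here is a quick way to check that behaviour on several inputs at once. This is a sketch with made-up test strings (`x` and `before_second` are illustrative names). Because the pattern is anchored with `^` and `$`, it can match at most once per string, so `sub()` gives the same result as `gsub()` here, and both work element-wise on a character vector.

```r
# Illustrative inputs (not from the question) with 2, 3, 1 and 0 underscores
x <- c("DNS000001320_309.0/121.0_t0", "a_b_c_d", "a_b", "abc")

# The anchored pattern can match at most once per string, so sub() is enough
before_second <- sub("^([^_]*_[^_]*)_.*$", "\\1", x)

# values: "DNS000001320_309.0/121.0", "a_b", "a_b", "abc"
# (the last two have fewer than 2 underscores and are left unchanged)
print(before_second)
```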


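I think this might do the task. The regex matches the last `_` and everything after it, so deleting that match leaves everything before the last occurrence of `_` (which is the same as "before the second occurrence" only when the string contains exactly two underscores):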

_([^_]*)$

E.g.:

> sub('_([^_]*)$', '', "DNS000001320_309.0/121.0_t0")
[1] "DNS000001320_309.0/121.0"
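To see where "before the last `_`" and "before the second `_`" part ways, here is a small comparison. The second string, `a_b_c_d`, is a hypothetical input with three underscores, used only for illustration.

```r
x <- c("DNS000001320_309.0/121.0_t0", "a_b_c_d")

# Removes the last "_" and everything after it
before_last <- sub("_([^_]*)$", "", x)
# values: "DNS000001320_309.0/121.0", "a_b_c"

# Keeps everything before the second "_" (pattern from the first answer)
before_second <- sub("^([^_]*_[^_]*)_.*$", "\\1", x)
# values: "DNS000001320_309.0/121.0", "a_b"
```

Both give the same answer for the string in the question, but only the anchored pattern stops at the second underscore in general.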


Personally, I hate regex; luckily, there's a way to do this without writing one, just by splitting the string:

> s <- "DNS000001320_309.0/121.0_t0"
> paste(strsplit(s, "_")[[1]][1:2], collapse = "_")
[1] "DNS000001320_309.0/121.0"

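Of course, this assumes there will always be at least 2 underscores in your string. With fewer, `[1:2]` picks up an `NA` (for example, `"abc"` becomes `"abc_NA"`), and `[[1]]` only looks at the first string's pieces, so be careful if you vectorize this and some strings have fewer underscores.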
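If some strings may have fewer than two underscores, one way to keep the split-based idea while avoiding the `NA` problem is a small helper. This is a sketch, not part of the original answer: `before_nth()` and its `n` and `sep` arguments are hypothetical names. It splits with `fixed = TRUE` so the separator is treated as a literal string, works over a whole character vector, and leaves elements with fewer than `n` separators unchanged (the same behaviour as the regex answer above).

```r
# Hypothetical helper (not from the original answer): return everything
# before the n-th occurrence of `sep` in each element of `x`.
before_nth <- function(x, n = 2, sep = "_") {
  # Split on the literal separator, not a regex
  parts <- strsplit(x, sep, fixed = TRUE)
  # Count separators per string (strsplit drops a trailing empty piece,
  # so the number of pieces is not a reliable count)
  n_sep <- (nchar(x) - nchar(gsub(sep, "", x, fixed = TRUE))) / nchar(sep)
  out <- x
  for (i in seq_along(x)) {
    # Only rewrite strings with at least n separators; isTRUE() also skips NA
    if (isTRUE(n_sep[i] >= n)) {
      out[i] <- paste(parts[[i]][seq_len(n)], collapse = sep)
    }
  }
  out
}

x <- c("DNS000001320_309.0/121.0_t0", "a_b", "abc")
before_nth(x)
# values: "DNS000001320_309.0/121.0", "a_b", "abc"
```

Counting separators directly matters for inputs such as `"a_b_"`, which `strsplit()` splits into only two pieces even though it contains two underscores.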


Not pretty, but this will do the trick:

mystr <- "DNS000001320_309.0/121.0_t0"

mytok <- paste(strsplit(mystr, "_")[[1]][1:2], collapse = "_")
