
Regex with Tab delimited text containing \x09

Developer https://www.devze.com 2022-12-09 13:39 Source: web

I've got a tough one.

I've got tab-delimited text to match with a regex.

My regex looks like:

^([\w ]+)\t(\d*)\t(\d+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)$

and an example source text is (tabs converted to \t for clarity):

JJ\t345\t0\tTest\tSome test text\tmore text: pcre:"/\x20\x62\x3b\x0a\x09\x61\x2e\x53\x74\x61\x72/"\tNone

However, the problem is that in my source text, the 6th field contains a regex string. Therefore, it can contain \x09, which naturally blows up the regex since it's seen as a tab as well.

Is there any way to tell the regex engine, "Match on \t but not on the text \x09"? My guess is no, since they're the same thing.

If not, is there any character that could be safely used for delimiting text that contains a regex string?


I would recommend encoding all of the characters in the pcre string prior to running the regular expression against it.
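One possible reading of "encoding" here is Base64, whose output alphabet contains no tab. The sketch below is only an illustration under that assumption: the class name PcreFieldCodec and its two helpers are hypothetical, and any other reversible encoding that never emits a tab would serve equally well.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class PcreFieldCodec {
    // Hypothetical helper: Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=',
    // so the encoded field can never contain a tab.
    static String encode(String pcre) {
        return Base64.getEncoder().encodeToString(pcre.getBytes(StandardCharsets.UTF_8));
    }

    // Reverses encode() once the line has been split into fields.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String pcre = "/a\tb\\x09c/";            // contains a real tab and the text \x09
        String safe = encode(pcre);
        System.out.println(safe.indexOf('\t'));  // -1: no tab in the encoded form
        System.out.println(decode(safe).equals(pcre));
    }
}
```

The pcre part would be encoded when the line is written and decoded after the regex has split the line, so the delimiter itself can stay a tab.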


Seems like a problem with the test case. A regex might have tabs in it, but your sample above doesn't. Your string in Java would look like:

String testString = "JJ\t345\t0\tTest\tSome test text\tmore text: pcre:\"/\\x20\\x62\\x3b\\x0a\\x09\\x61\\x2e\\x53\\x74\\x61\\x72/\"\tNone";

If you look at this string in the debugger you'll have \x09 as 4 characters instead of as 1 (the tab).
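To check this without a debugger, here is a minimal sketch that runs the regex from the question against that sample line. The class name TabFieldDemo and the sixthField helper are made up for illustration; the pattern and the sample text come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TabFieldDemo {
    // The pattern from the question, written as a Java string literal.
    static final Pattern LINE = Pattern.compile(
        "^([\\w ]+)\\t(\\d*)\\t(\\d+)\\t([^\\t]+)\\t([^\\t]+)\\t([^\\t]+)\\t([^\\t]+)$");

    // Sample line: "\t" is a real tab; "\\x09" is the four characters \ x 0 9.
    static final String SAMPLE =
        "JJ\t345\t0\tTest\tSome test text\t"
        + "more text: pcre:\"/\\x20\\x62\\x3b\\x0a\\x09\\x61\\x2e\\x53\\x74\\x61\\x72/\"\tNone";

    // Returns the 6th field, or null if the line does not match the pattern.
    static String sixthField(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? m.group(6) : null;
    }

    public static void main(String[] args) {
        System.out.println("\\x09".length());    // 4
        System.out.println(sixthField(SAMPLE));  // the pcre field, with \x09 left as text
    }
}
```

The line matches because the regex engine only ever sees the literal text \x09 inside the 6th field, never a tab character. The match would fail only if the source data held an actual tab byte in that field.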
