Regular Expression for String representing DNA code

Developer https://www.devze.com 2023-03-06 01:15 Source: Internet
Hello, I am trying to use regular expressions in a Java program. I would like the regex to identify a String of unknown length whose characters are only 'C', 'A', 'G' or 'T'. Thanks for your help.


Easy, just use a character class:

[CAGT]+

Or, if the entire string must consist only of the characters C, A, G and T for it to match:

^[CAGT]+$
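
For the Java program in the question, the difference between the two patterns could be sketched roughly like this. The class and method names (DnaRegexDemo, containsDna, isDna) are made up for illustration. The sketch assumes Matcher.find() for the unanchored pattern and Matcher.matches() for the anchored one; matches() already requires the whole input to match, so the ^ and $ are kept only to mirror the pattern above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not from the original answer.
class DnaRegexDemo {
    // Unanchored: succeeds if the input contains at least one run of C, A, G or T.
    static final Pattern CONTAINS_DNA = Pattern.compile("[CAGT]+");
    // Anchored: succeeds only if the whole input consists of C, A, G or T.
    static final Pattern ONLY_DNA = Pattern.compile("^[CAGT]+$");

    static boolean containsDna(String s) {
        // find() looks for the pattern anywhere in the input.
        return CONTAINS_DNA.matcher(s).find();
    }

    static boolean isDna(String s) {
        // matches() requires the entire input to match the pattern.
        return ONLY_DNA.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsDna("xxGATTACAxx")); // true
        System.out.println(isDna("xxGATTACAxx"));       // false
        System.out.println(isDna("GATTACA"));           // true
    }
}
```

Because of the +, an empty string is rejected by both methods.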


Adding to the above:

^[CAGTcagt]+$

This ensures that both lowercase and uppercase characters are detected.
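
In Java, the same effect can probably also be had by keeping the uppercase-only class and compiling it with the Pattern.CASE_INSENSITIVE flag. This is a minimal sketch of that alternative, not what the answer itself proposes; the class and method names are invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, shown only to illustrate the flag.
class DnaCaseInsensitiveDemo {
    // Same character class as ^[CAGT]+$, but matched without regard to ASCII case.
    static final Pattern DNA_ANY_CASE =
            Pattern.compile("^[CAGT]+$", Pattern.CASE_INSENSITIVE);

    static boolean isDnaAnyCase(String s) {
        // matches() requires the entire input to match the pattern.
        return DNA_ANY_CASE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDnaAnyCase("gattaca")); // true
        System.out.println(isDnaAnyCase("GattaCa")); // true
        System.out.println(isDnaAnyCase("gattaxa")); // false
    }
}
```

Listing both cases explicitly in the class, as in ^[CAGTcagt]+$, needs no flag and behaves the same way for these eight letters.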


I disagree with the most-voted answer. With [ACGT]+, a large string will lead to a lot of memory usage. So I would use a negated regex instead and check that the string does not contain any character outside [ACGT]:

str !~ [^ACGTacgt]
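
The !~ operator above is Perl-style notation and does not exist in Java. A rough Java equivalent of the negated check might look like the following; the class and method names are assumptions made for the example, and the sketch only shows the logic, without measuring memory use.

```java
import java.util.regex.Pattern;

// Hypothetical helper class illustrating the negated check.
class DnaNegatedCheckDemo {
    // Matches any single character that is NOT one of A, C, G, T (either case).
    static final Pattern INVALID_CHAR = Pattern.compile("[^ACGTacgt]");

    static boolean hasOnlyDnaChars(String s) {
        // The string is accepted when no invalid character can be found.
        return !INVALID_CHAR.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasOnlyDnaChars("ACGTacgt")); // true
        System.out.println(hasOnlyDnaChars("ACGU"));     // false
        System.out.println(hasOnlyDnaChars(""));         // true
    }
}
```

Note one difference from ^[CAGT]+$: the negated check accepts the empty string, since an empty string contains no invalid character. Add an explicit !s.isEmpty() test if empty input should be rejected.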