Need help with a regular expression for a nine-character alphanumeric string bounded by at least one whitespace character

Developer https://www.devze.com 2023-03-02 22:12 Source: web
I'm trying to match a CUSIP number. I have the following, but it is missing some edge cases.

\s[A-Za-z0-9]{9}\s

I need to exclude strings that contain a space in the middle, and I need to match CUSIPs that are bordered by other text. My strings are generally surrounded by tabs, but as little as one space character may separate the CUSIP from other text. Thanks in advance; I'm pretty green with regex. P.S. I'm working in .NET.

Example

"[TAB]123456789[TAB]" should be matched (I'm getting this now)

"sometext[TAB]123456789[TAB]sometext" should be matched (this is not currently being returned)

"some text" should not be returned (I am currently getting this kind of match)


The other answers are wrong: they do not take PPNs into account, and they allow the check digit to be a letter. Here is a better solution.

Based on this document and this document, CUSIPs follow these rules:

  • Length is 9 characters.
  • Characters 1, 2, 3 are digits.
  • Characters 4, 5, 6, 7, 8 are either letters or digits.
  • Characters 6, 7, 8 can also be *, @, #.
  • Character 9 is a check digit.

With this in mind, the following regex should provide a tight match:

^[0-9]{3}[a-zA-Z0-9]{2}[a-zA-Z0-9*@#]{3}[0-9]$

You can play around with it here.

Note that this is as tight as the pattern can get without going into so much detail that the expression turns into a monster. To fully validate a CUSIP, I suggest you also use the check digit algorithm, which you can find here.
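As a rough illustration of the check digit step, here is a minimal sketch of the commonly published CUSIP algorithm (modulus 10, doubling every second character). The function names `ComputeCheckDigit` and `HasValidCheckDigit` are hypothetical, and the mapping of `*`, `@`, `#` to 36, 37, 38 is assumed from the usual description of the algorithm. It is written as a C# 9+ top-level program and should be checked against the official specification before you rely on it.

```csharp
using System;

// Computes the check digit (0-9) from the first eight characters.
// Returns -1 if one of them is not a legal CUSIP character.
// The caller must pass a string with at least eight characters.
static int ComputeCheckDigit(string cusip)
{
    int sum = 0;
    for (int i = 0; i < 8; i++)
    {
        char c = char.ToUpperInvariant(cusip[i]);
        int v;
        if (c >= '0' && c <= '9') v = c - '0';            // digits keep their value
        else if (c >= 'A' && c <= 'Z') v = c - 'A' + 10;  // A=10 ... Z=35
        else if (c == '*') v = 36;
        else if (c == '@') v = 37;
        else if (c == '#') v = 38;
        else return -1;

        if (i % 2 == 1) v *= 2;   // double every second character (2nd, 4th, ...)
        sum += v / 10 + v % 10;   // add the decimal digits of v
    }
    return (10 - sum % 10) % 10;
}

// True when the string is nine characters long, ends in a digit,
// and that digit equals the computed check digit.
static bool HasValidCheckDigit(string cusip)
{
    if (cusip == null || cusip.Length != 9) return false;
    char last = cusip[8];
    if (last < '0' || last > '9') return false;
    return ComputeCheckDigit(cusip) == last - '0';
}

Console.WriteLine(HasValidCheckDigit("037833100")); // True
Console.WriteLine(HasValidCheckDigit("68389X106")); // False (the check digit should be 5)
```

Running the regex first and the check digit second keeps the pattern readable while still rejecting most random nine-character strings.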


According to this page, not just any 9-character alphanumeric string is a valid CUSIP. The first three characters can only be digits, and the ninth is a check digit. So if you want to distinguish CUSIPs from other 9-character strings, I believe this should work better:

\s[0-9]{3}[a-zA-Z0-9]{6}\s

or, if you also want to match strings that are bordered by the beginning or end of input:

(^|\s)[0-9]{3}[a-zA-Z0-9]{6}(\s|$)

or, if you also want to match strings that are bordered by punctuation (such as "(100ABCDEF)"):

(^|[^a-zA-Z0-9])[0-9]{3}[a-zA-Z0-9]{6}([^a-zA-Z0-9]|$)

I believe that should be a 99% solution, but if you want to be really robust you might also want to use the 9th (check digit) character to verify that the strings are valid.
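One caveat with the boundary patterns above is that they consume the separator characters, so when two CUSIPs are separated by a single space, the second one is skipped. A possible alternative in .NET, sketched below, is to express the boundaries as lookarounds, which check the neighbouring characters without including them in the match. The helper name `FindCusipCandidates` is hypothetical, and the sketch is written as a C# 9+ top-level program.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Returns every 9-character candidate (three digits, then six alphanumerics)
// that is not directly preceded or followed by another alphanumeric character.
static List<string> FindCusipCandidates(string text)
{
    const string pattern =
        @"(?<![A-Za-z0-9])[0-9]{3}[A-Za-z0-9]{6}(?![A-Za-z0-9])";

    var found = new List<string>();
    foreach (Match m in Regex.Matches(text, pattern))
    {
        found.Add(m.Value); // lookarounds are zero-width, so Value is the CUSIP only
    }
    return found;
}

Console.WriteLine(string.Join(",", FindCusipCandidates("sometext\t123456789\tsometext")));
// 123456789
Console.WriteLine(string.Join(",", FindCusipCandidates("(100ABCDEF) 123456789 987654321")));
// 100ABCDEF,123456789,987654321
Console.WriteLine(FindCusipCandidates("some text").Count);
// 0
```

Because the match itself contains no whitespace or punctuation, there is no need for a capture group to pull the CUSIP out of the result.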


using System;
using System.Text.RegularExpressions;

string haystack = "some 123456789 text";      // single-space separators
string haystack2 = "some\t123456789\ttext";   // tab separators

// The comment is correct: your pattern was correct originally.
// This is the same pattern, slightly dressed up with a named group.
string pattern = @"(\s+)(?<cusip>[A-Za-z0-9]{9})(\s+)";

Match m = Regex.Match(haystack, pattern);
Console.WriteLine("Match for cusip surrounded by spaces:" + m.Groups["cusip"]);
// Output: Match for cusip surrounded by spaces:123456789

Match m2 = Regex.Match(haystack2, pattern);
Console.WriteLine("Match for cusip surrounded by tabs:" + m2.Groups["cusip"]);
// Output: Match for cusip surrounded by tabs:123456789


    // Returns true when sCusip is empty or has the basic CUSIP shape:
    // three digits followed by six alphanumeric characters.
    // The check digit itself is not verified here.
    public bool CusipValidation(string sCusip)
    {
        const string cusipPattern = @"^[0-9]{3}[a-zA-Z0-9]{6}$";

        // An empty string is accepted, as in the original logic.
        return sCusip == string.Empty
            || System.Text.RegularExpressions.Regex.IsMatch(sCusip, cusipPattern);
    }