
Java email extraction regular expression?

开发者 https://www.devze.com 2022-12-20 03:30 Source: web
I would like a regular expression that will extract email addresses from a String (using Java regular expressions).


One that really works.


Here's a regular expression that really works. I spent an hour searching the web and testing different approaches, and most of them didn't work, even though Google ranked those pages at the top.

I want to share with you a working regular expression:

[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})

Here's the original link: http://www.mkyong.com/regular-expressions/how-to-validate-email-address-with-regular-expression/


I had to add dashes to the domain character classes so that hyphenated domain names are allowed. So the final result in Java:

final String MAIL_REGEX = "([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})";
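
Since the question asks about extracting addresses rather than validating a single one, here is a minimal sketch of how the expression above might be used with `Matcher.find()` to pull every match out of a longer string. The class name `EmailExtractor`, the method name `extract`, and the sample text are illustrative assumptions, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; only the regular expression comes from the answer above.
public class EmailExtractor {
    // Same expression as MAIL_REGEX above, compiled once and reused.
    private static final Pattern MAIL_PATTERN = Pattern.compile(
        "([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})");

    // Returns every substring of text that the pattern matches, in order of appearance.
    public static List<String> extract(String text) {
        List<String> emails = new ArrayList<>();
        Matcher matcher = MAIL_PATTERN.matcher(text);
        while (matcher.find()) {
            emails.add(matcher.group()); // group() is the whole current match
        }
        return emails;
    }

    public static void main(String[] args) {
        String text = "Write to john.doe@example.com or jane-roe@mail.example.org today.";
        System.out.println(extract(text));
        // prints [john.doe@example.com, jane-roe@mail.example.org]
    }
}
```

Note that `find()` scans for matches anywhere in the input, whereas `matches()` would require the whole string to be a single address.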


Install this regex tester plugin in Eclipse, and you'll have a whale of a time testing regexes:
http://brosinski.com/regex/.

Points to note:
In the plugin, use only one backslash to escape a character. But when you transcribe the regex into a Java/C# string literal, you have to double each backslash, because two levels of escaping apply: first the string literal's own escape mechanism, and then the regex engine's escape mechanism.
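
As a small illustration of the double escaping, the sketch below shows that the Java literal `"\\."` is really the two-character regex `\.`, which matches only a literal dot. The class name `EscapeDemo` and its constants are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class illustrating string-literal escaping versus regex escaping.
public class EscapeDemo {
    // In Java source, "\\." is a two-character string: a backslash followed by a dot.
    // The regex engine reads those two characters as "a literal dot".
    public static final String ESCAPED_DOT = "\\.";
    // A bare "." is the regex wildcard and matches any single character.
    public static final String ANY_CHAR = ".";

    public static void main(String[] args) {
        System.out.println(ESCAPED_DOT.length());            // 2
        System.out.println(Pattern.matches("a\\.b", "a.b")); // true
        System.out.println(Pattern.matches("a\\.b", "axb")); // false
        System.out.println(Pattern.matches("a.b", "axb"));   // true
    }
}
```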

Surround the sections of the regex whose text you wish to capture with parentheses (round brackets). You can then use the group functions in Java or C# regex to read the values of those sections.

([_A-Za-z0-9-]+)(\.[_A-Za-z0-9-]+)@([A-Za-z0-9]+)(\.[A-Za-z0-9]+)

For example, using the above regex, the following string

abc.efg@asdf.cde

yields

start=0, end=16
Group(0) = abc.efg@asdf.cde
Group(1) = abc
Group(2) = .efg
Group(3) = asdf
Group(4) = .cde

Group 0 is always the whole of the matched text.

If you do not enclose any section in parentheses, you can still detect a match and read the whole match (group 0), but you cannot capture its individual parts.
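
The group listing above can be reproduced in Java with a sketch along these lines. The class name `GroupDemo` and the method `groups` are assumptions made for the example; the expression is the four-group one shown earlier, with its backslashes doubled for a Java string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regular expression comes from the text above.
public class GroupDemo {
    private static final Pattern PATTERN = Pattern.compile(
        "([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)@([A-Za-z0-9]+)(\\.[A-Za-z0-9]+)");

    // Returns group 0 through group 4 of the first match, or an empty array when nothing matches.
    public static String[] groups(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("abc.efg@asdf.cde");
        for (int i = 0; i < g.length; i++) {
            System.out.println("Group(" + i + ") = " + g[i]);
        }
    }
}
```

`Matcher.start()` and `Matcher.end()` give the offsets reported as start and end in the listing.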

It might be less confusing to create a few regexes rather than one long catch-all regex, since you can test them programmatically one by one and then decide which ones should be consolidated. This helps especially when you come across a new email pattern that you had never considered before.
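
One possible way to organise several expressions is sketched below: each candidate is tested in turn and the names of the patterns that accept it are reported. The class name `PatternSet`, the method `matchingPatterns`, and the labels `plain-domain` and `dashed-domain` are hypothetical; the two expressions are the ones quoted earlier in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: keep several small patterns and test them one by one.
public class PatternSet {
    // LinkedHashMap keeps the patterns in the order they were registered.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        // Original expression: no dashes allowed in the domain part.
        PATTERNS.put("plain-domain", Pattern.compile(
            "[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})"));
        // Variant with dashes allowed in the domain part.
        PATTERNS.put("dashed-domain", Pattern.compile(
            "([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})"));
    }

    // Returns the names of all patterns that match the whole candidate string.
    public static List<String> matchingPatterns(String candidate) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            if (entry.getValue().matcher(candidate).matches()) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(matchingPatterns("user@example.com")); // [plain-domain, dashed-domain]
        System.out.println(matchingPatterns("user@my-host.com")); // [dashed-domain]
    }
}
```

Seeing which patterns accept a newly discovered address format makes it easier to decide whether to extend an existing expression or add another one.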


A little late, but OK.

Here is what I use. Just paste it into the Firebug console and run it. Then look on the web page for a textarea (most likely at the bottom of the page); it will contain a comma-separated list of all email addresses found in A tags.

    // Load jQuery, then run the extraction once the script has finished loading.
    var jquery = document.createElement('script');
    jquery.setAttribute('src', 'http://code.jquery.com/jquery-1.10.1.min.js');
    jquery.onload = function () {
        // Textarea that will receive the comma-separated addresses.
        var list = document.createElement('textarea');
        list.setAttribute('id', 'emaillist');
        document.body.appendChild(list);

        var lijst = "";
        // Collect the href of every link whose href contains "@".
        $('a[href*="@"]').each(function (idx, el) {
            lijst += $(el).attr("href").replace("mailto:", "") + ",";
        });
        $("#emaillist").val(lijst);
    };
    document.body.appendChild(jquery);


On Android, the built-in email address pattern (android.util.Patterns.EMAIL_ADDRESS) works well. Note that it is part of the Android SDK, not of standard Java:

    // Requires Android's android.util.Patterns, plus java.util.ArrayList,
    // java.util.List and java.util.regex.Matcher.
    public static List<String> getEmails(@NonNull String input) {
        List<String> emails = new ArrayList<>();
        Matcher matcher = Patterns.EMAIL_ADDRESS.matcher(input);
        while (matcher.find()) {
            // group() returns the text of the current match.
            emails.add(matcher.group());
        }
        return emails;
    }