How to extract a valid email address from a larger string in Scala

Developer https://www.devze.com 2022-12-30 17:53 Source: Internet
My Scala version is 2.7.7.

I'm trying to extract an email address from a larger string. The string itself follows no particular format. Here is the code I've got:

import scala.util.matching.Regex
import scala.util.matching._
val Reg = """\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b""".r
"yo my name is joe : joe@gmail.com" match {
    case Reg(e) => println("match: " + e)
    case _ => println("fail")
}

The regex passes in RegExBuilder but does not match in Scala. If there is another way to do this without a regex, that would be fine too. Thanks!


As Alan Moore pointed out, you need to add (?i) to the beginning of the pattern to make it case-insensitive. Also note that using the Regex directly as an extractor in a match matches against the whole string. If you want to find an address within a larger string, you can call findFirstIn() or one of the similar methods of Regex.

val reg = """(?i)\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b""".r
reg findFirstIn "yo my name is joe : joe@gmail.com"  match {
    case Some(email) => println("match: " + email)
    case None => println("fail")
}
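findFirstIn stops at the first address. If the text may contain several, findAllIn, one of the similar Regex methods, returns an iterator over every non-overlapping match. A minimal sketch, assuming a current Scala 2.13+ standard library (not checked against 2.7.7); the object and method names are made up for illustration:

```scala
import scala.util.matching.Regex

object FindAllEmails {
  // Same pattern as above; (?i) lets the [A-Z] classes match lower case too.
  val EmailPattern: Regex = """(?i)\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b""".r

  // Every substring that matches, in order of appearance (empty if none).
  def allEmails(text: String): List[String] =
    EmailPattern.findAllIn(text).toList

  def main(args: Array[String]): Unit = {
    val text = "contact joe@gmail.com or Ann.Lee@Example.ORG, not joe@localhost"
    println(allEmails(text)) // prints List(joe@gmail.com, Ann.Lee@Example.ORG)
  }
}
```

"joe@localhost" is skipped because the pattern requires a dot followed by a 2 to 4 letter suffix in the domain.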


It looks like you're trying to do a case-insensitive search, but you aren't specifying that anywhere. Try adding (?i) to the beginning of the regex:

"""(?i)\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b""".r
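With (?i) alone, the original match still prints "fail" for this input: a Regex used as an extractor must match the entire string, and `case Reg(e)` binds one variable per capturing group, while the original pattern has no group. If you prefer to keep the match/case style, one option is to pad the pattern with .*? and .* and wrap the address in parentheses. A sketch, assuming Scala 2.13+; the names are illustrative, and the added (?s) flag (so . also matches line breaks) is an assumption about multi-line input:

```scala
import scala.util.matching.Regex

object ExtractorStyle {
  // (?i): case-insensitive; (?s): let . also match line breaks.
  // The leading .*? and trailing .* let the pattern cover the whole input,
  // which a Regex extractor in a case clause requires.
  // The parentheses form capture group 1, which binds to `e` below.
  val Reg: Regex =
    """(?is).*?\b([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b.*""".r

  def describe(text: String): String = text match {
    case Reg(e) => "match: " + e
    case _      => "fail"
  }

  def main(args: Array[String]): Unit = {
    println(describe("yo my name is joe : joe@gmail.com")) // match: joe@gmail.com
    println(describe("no address here"))                    // fail
  }
}
```

On Scala 2.10 and later, calling `.unanchored` on the Regex is an alternative to the .*? and .* padding.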


Well, the ways to do it other than REs are probably a lot messier. The next step up would probably be a combinator parser. Hand-written string-dissection code would be even more general and almost certainly a whole lot more painful. In part, the suitable tactic depends on how complete (and how strict or lenient) your recognizer needs to be. For example, the common form Rudolf Reindeer <rudy.caribou@north_pole.rth> is not accepted by your RE (even after the case-sensitivity is relaxed). Full-blown RFC 2822 address parsing is rather challenging for an RE-based approach.
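For the regex-free route the question asked about, here is a minimal sketch of hand-written string dissection, assuming a Scala 2.13+ standard library. It scans characters to split on whitespace and a hypothetical set of punctuation, then keeps tokens that pass a deliberately crude shape check. The names NoRegexEmails, looksLikeEmail and extractEmails are made up; this is a heuristic, not RFC 2822 parsing.

```scala
object NoRegexEmails {
  // Hypothetical separator set chosen for illustration (whitespace is handled
  // separately). Angle brackets are included so "Name <user@host.tld>"
  // yields the bare address.
  private val separators: Set[Char] = Set(',', ';', ':', '<', '>', '(', ')', '"')

  // Split text into tokens by scanning one character at a time (no regex).
  private def tokens(text: String): List[String] = {
    val out = scala.collection.mutable.ListBuffer.empty[String]
    val current = new StringBuilder
    for (c <- text) {
      if (c.isWhitespace || separators.contains(c)) {
        if (current.nonEmpty) { out += current.toString; current.clear() }
      } else current += c
    }
    if (current.nonEmpty) out += current.toString
    out.toList
  }

  // Crude heuristic, far laxer than RFC 2822: exactly one '@', a non-empty
  // local part, and a domain containing a dot that is neither first nor last.
  def looksLikeEmail(token: String): Boolean = {
    val at = token.indexOf('@')
    if (at <= 0 || at != token.lastIndexOf('@')) false
    else {
      val domain = token.substring(at + 1)
      val dot = domain.lastIndexOf('.')
      dot > 0 && dot < domain.length - 1
    }
  }

  def extractEmails(text: String): List[String] = {
    // Trim sentence-final dots such as the one in "joe@gmail.com."
    val trimmed = tokens(text).map(t => t.reverse.dropWhile(_ == '.').reverse)
    trimmed.filter(looksLikeEmail)
  }

  def main(args: Array[String]): Unit = {
    val text = "Rudolf Reindeer <rudy.caribou@north_pole.rth>, cc joe@gmail.com. Not: @foo, bar@baz"
    println(extractEmails(text)) // List(rudy.caribou@north_pole.rth, joe@gmail.com)
  }
}
```

Unlike the RE, this accepts the Rudolf Reindeer form, but it is also more lenient in other ways (for instance it would accept a one-letter suffix such as a@b.c), which illustrates how the right tactic depends on how strict the recognizer needs to be.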
