Ruby/Rails Parsing Emails

开发者 (https://www.devze.com), 2023-02-15 05:30. Source: web
I'm currently using the following to parse emails:

  def parse_emails(emails)
    valid_emails, invalid_emails = [], []
    unless emails.nil?
      emails.split(/, ?/).each do |full_email|
        unless full_email.blank?
          if full_email.index(/\<.+\>/)
            email = full_email.match(/\<.*\>/)[0].gsub(/[\<\>]/, "").strip
          else
            email = full_email.strip
          end
          email = email.delete("<").delete(">")
          email_address = EmailVeracity::Address.new(email)
          if email_address.valid?
            valid_emails << email 
          else
            invalid_emails << email
          end
        end
      end                    
    end
    return valid_emails, invalid_emails
  end

The problem I'm having is given an email like:

Bob Smith <bob@smith.com>

The code above deletes "Bob Smith" and returns only bob@smith.com.

But what I want is a hash of FNAME, LNAME, and EMAIL, where fname and lname are optional but email is not.

What type of Ruby object would I use for that, and how would I create such a record in the code above?

Thanks


I've written it so that it also works for an entry like: John Bob Smith Doe <bob@smith.com>

It would return:

{:email => "bob@smith.com", :fname => "John", :lname => "Bob Smith Doe" }

def parse_emails(emails)
  valid_emails, invalid_emails = [], []
  unless emails.nil?
    emails.split(/, ?/).each do |full_email|
      unless full_email.blank?
        if (index = full_email.index(/<.+>/))
          email = full_email.match(/<.*>/)[0].gsub(/[<>]/, "").strip
          # exclusive range: [0..index-1] would return the whole string when index is 0
          name  = full_email[0...index].split(" ")
          fname = name.first
          lname = name.drop(1).join(" ")
        else
          email = full_email.strip
          # no angle brackets: the whole string is treated as the address and
          # fname/lname stay nil; decide here how to handle a bare name instead
        end
        email = email.delete("<").delete(">")
        email_address = EmailVeracity::Address.new(email)

        if email_address.valid?
          valid_emails << { :email => email, :lname => lname, :fname => fname} 
        else
          invalid_emails << { :email => email, :lname => lname, :fname => fname}
        end
      end
    end                    
  end
  return valid_emails, invalid_emails 
end
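
On the "what type of Ruby object" part of the question: a plain hash as above works, and a Struct is one possible alternative if you want named accessors. The following is only a sketch under some assumptions: `Contact` and `parse_contact` are hypothetical names, and `URI::MailTo::EMAIL_REGEXP` from the standard library stands in for `EmailVeracity::Address#valid?`, so the validity rules will not be identical to the gem's.

```ruby
require "uri"

# Hypothetical value object; the fields mirror the question's FNAME, LNAME, EMAIL.
Contact = Struct.new(:fname, :lname, :email)

# Parses one entry such as "Bob Smith <bob@smith.com>" or "bob@smith.com".
# Returns a Contact, or nil when the address does not look valid.
def parse_contact(entry)
  if (m = entry.match(/\A(.*?)<([^<>]+)>\s*\z/))
    words = m[1].split          # everything before the angle brackets
    email = m[2].strip          # the text inside the angle brackets
  else
    words = []
    email = entry.strip         # no brackets: treat the whole entry as the address
  end

  # Assumption: stdlib regexp used as a stand-in for EmailVeracity's check.
  return nil unless email.match?(URI::MailTo::EMAIL_REGEXP)

  rest  = words.drop(1)
  lname = rest.empty? ? nil : rest.join(" ")
  Contact.new(words.first, lname, email)
end

contact = parse_contact("John Bob Smith Doe <bob@smith.com>")
puts contact.to_h.inspect
```

`Struct#to_h` gives back a hash with `:fname`, `:lname`, and `:email` keys, so the result can still be passed to code that expects a hash.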


Here's a slightly different approach that works better for me. It grabs the name whether it is before or after the email address and whether or not the email address is in angle brackets.

I don't try to parse the first name out from the last name, since that is too problematic (e.g. "Mary Ann Smith" or "Dr. Mary Smith"), but I do eliminate duplicate email addresses.

def parse_list(list)
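  ## case-insensitive match; note that {2,4} caps the TLD at four letters,
  ## so a longer TLD is matched only up to its fourth letter
  r = /[a-z0-9._%+\-]+@[a-z0-9.\-]+\.[a-z]{2,4}/i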
  valid_items, invalid_items = {}, []

  ## split the list on commas and/or newlines
  list_items = list.split(/[,\n]+/)

  list_items.each do |item|
    if m = r.match(item)
      ## get the email address
      email = m[0]
      ## get everything before the email address
      before_str = item[0, m.begin(0)]
      ## get everything after the email address
      after_str = item[m.end(0), item.length]
      ## store the email as a valid_items hash key (eliminating dups);
      ## the value is whatever precedes the email, if that contains any
      ## word characters, with angle brackets, double quotes, and
      ## leading/trailing space stripped
      if /\w/ =~ before_str
        valid_items[email] = before_str.gsub(/[\<\>\"]+/, '').strip
      ## if nothing precedes the email, use whatever follows it instead,
      ## stripped the same way
      elsif /\w/ =~ after_str
        valid_items[email] = after_str.gsub(/[\<\>\"]+/, '').strip
      ## if nothing after the email either,
      ## make the value of that key an empty string
      else
        valid_items[email] = ''
      end
    else
      invalid_items << item.strip if item.strip.length > 0
    end
  end

  [valid_items, invalid_items]
end

It returns a hash with valid email addresses as keys and the associated names as values. Any invalid items are returned in the invalid_items array.
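
One caveat with using the address as the hash key: string keys are case-sensitive, so "Bob@Smith.com" and "bob@smith.com" would be kept as two entries. If you want them collapsed, something along these lines might help. It is a rough sketch, not part of the code above: `dedupe_by_email` is a hypothetical helper, it assumes addresses can be compared case-insensitively (common in practice, although the local part is technically allowed to be case-sensitive), and unlike the code above it keeps the first non-empty name rather than the last one seen.

```ruby
# Hypothetical helper: takes [email, name] pairs and returns a hash keyed by
# the downcased address, so case variants collapse into a single entry.
def dedupe_by_email(pairs)
  pairs.each_with_object({}) do |(email, name), seen|
    key = email.strip.downcase
    # Keep the first non-empty name recorded for each address
    seen[key] = name if !seen.key?(key) || seen[key].empty?
  end
end

pairs = [
  ["bob@smith.com", "Bob Smith"],
  ["Bob@Smith.com", ""],
  ["ann@example.com", ""]
]
puts dedupe_by_email(pairs).inspect
```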

See http://www.regular-expressions.info/email.html for an interesting discussion of email regexes.
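
As an illustration of the kind of trade-off discussed there, the `{2,4}` bound on the top-level domain in the pattern above means a longer TLD is not rejected but silently cut short, because the pattern is unanchored. The snippet below shows this; the `relaxed` variant with `{2,}` is just one possible adjustment, not a recommendation from the linked article.

```ruby
# Same pattern as in parse_list, written as a literal with the /i flag.
pattern = /[a-z0-9._%+\-]+@[a-z0-9.\-]+\.[a-z]{2,4}/i
# Assumed variant: no upper bound on the TLD length.
relaxed = /[a-z0-9._%+\-]+@[a-z0-9.\-]+\.[a-z]{2,}/i

item = "Ann Lee <ann@example.museum>"

truncated = pattern.match(item)[0]  # stops after four TLD letters
complete  = relaxed.match(item)[0]  # takes the whole TLD

puts truncated
puts complete
```

With the original pattern the leftover "um" ends up in the text after the match, which parse_list only looks at when nothing precedes the address.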

In case it's useful to someone, I made a little gem out of this: https://github.com/victorgrey/email_addresses_parser


You can use the rfc822 gem. It contains a regular expression for finding email addresses that conform to the RFC. You can easily extend it with parts for finding the first and last name.


Along the lines of mspanc's answer, you can use the mail gem to do the basic email address parsing work for you, as answered here: https://stackoverflow.com/a/12187502/1019504
