Rails: What's a good way to validate links (URLs)?

Developer (https://www.devze.com), 2023-03-30 10:28. Source: Internet
I was wondering how best to validate URLs in Rails. I was thinking of using a regular expression, but I am not sure whether that is best practice.

And, if I were to use a regex, could someone suggest one to me? I am still new to Regex.


Validating a URL is a tricky job. It's also a very broad request.

What do you want to do, exactly? Do you want to validate the format of the URL, the existence, or what? There are several possibilities, depending on what you want to do.

A regular expression can validate the format of the URL. But even a complex regular expression cannot ensure you are dealing with a valid URL.

For instance, if you take a simple regular expression, it will probably reject the following host

http://invalid##host.com

but it will allow

http://invalid-host.foo

This is a valid host, but not a valid domain if you consider the existing TLDs. Indeed, that approach works if you want to validate the hostname rather than the domain, because the following is a valid hostname

http://host.foo

as is the following one

http://localhost
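To make the point concrete, here is a minimal sketch of such a "simple" format-only regular expression. The pattern and the constant name SIMPLE_URL are illustrative assumptions, not taken from any library; it only checks the shape of the hostname and knows nothing about real TLDs.

```ruby
# Hypothetical format-only pattern: scheme, dot-separated host labels, optional path.
SIMPLE_URL = %r{\Ahttps?://[a-z\d]([a-z\d\-]*[a-z\d])?(\.[a-z\d]([a-z\d\-]*[a-z\d])?)*(/\S*)?\z}i

[
  'http://invalid##host.com', # rejected: '#' is not allowed in a host label
  'http://invalid-host.foo',  # accepted: well-formed, whatever the TLD is
  'http://host.foo',          # accepted
  'http://localhost'          # accepted: a single label is a valid hostname
].each do |candidate|
  puts "#{candidate} => #{SIMPLE_URL.match?(candidate)}"
end
```

A pattern like this can tell you that a string looks like a URL, but not that the domain actually exists.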

Now, let me give you some solutions.

If you want to validate a domain, then you need to forget about regular expressions. The best solution available at the moment is the Public Suffix List, a list maintained by Mozilla. I created a Ruby library to parse and validate domains against the Public Suffix List, and it's called PublicSuffix.

If you want to validate the format of a URI/URL, you might be tempted to use a regular expression. Instead of searching for one, use Ruby's built-in URI.parse method.

require 'uri'

def valid_url?(url)
  # Parse first, then check for a host; parsing raises on malformed input
  uri = URI.parse(url)
  !uri.host.nil?
rescue URI::InvalidURIError
  false
end

You can even decide to make it more restrictive. For instance, if you want the URL to be an HTTP/HTTPS URL, then you can make the validation more accurate.

require 'uri'

def valid_url?(url)
  uri = URI.parse(url)
  uri.is_a?(URI::HTTP) && !uri.host.nil?
rescue URI::InvalidURIError
  false
end
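As a quick illustration of how the HTTP/HTTPS check behaves, here is a small runnable sketch. The method name http_url? and the sample inputs are my own choices for the demo; the logic is the same URI.parse check as above.

```ruby
require 'uri'

# Same check as above, under a hypothetical name so it can run standalone.
def http_url?(url)
  uri = URI.parse(url)
  # URI::HTTPS is a subclass of URI::HTTP, so both schemes pass
  uri.is_a?(URI::HTTP) && !uri.host.nil?
rescue URI::InvalidURIError
  false
end

samples = [
  'http://example.com',
  'https://example.com/path?q=1',
  'http://localhost',
  'ftp://example.com',        # parses, but is not HTTP(S)
  'example.com',              # no scheme, parsed as a generic URI
  'http://www.crowdint. com'  # the space makes the URI malformed
]

samples.each { |s| puts "#{s.inspect} => #{http_url?(s)}" }
```

Note that a string without a scheme is not treated as an HTTP URL by this check, so you may want to normalize user input before validating it.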

Of course, there are tons of improvements you can apply to this method, including checking for a path or a scheme.

Last but not least, you can also package this code into a validator:

class HttpUrlValidator < ActiveModel::EachValidator

  def self.compliant?(value)
    uri = URI.parse(value)
    uri.is_a?(URI::HTTP) && !uri.host.nil?
  rescue URI::InvalidURIError
    false
  end

  def validate_each(record, attribute, value)
    unless value.present? && self.class.compliant?(value)
      record.errors.add(attribute, "is not a valid HTTP URL")
    end
  end

end

# in the model
validates :example_attribute, http_url: true


I use a one liner inside my models:

validates :url, format: URI::DEFAULT_PARSER.make_regexp(%w[http https])

I think it is good enough and simple to use. Moreover, it should in theory be equivalent to Simone's method, as it uses the very same regexp internally.
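One thing worth checking with this one-liner: the regexp returned by make_regexp is not anchored, so it matches a URL anywhere inside a longer string. The sketch below shows the difference; wrapping the pattern in \A and \z is my own suggestion, not part of the original answer.

```ruby
require 'uri'

pattern  = URI::DEFAULT_PARSER.make_regexp(%w[http https])
# Assumption for illustration: anchor the pattern so the whole value must be a URL.
anchored = /\A#{pattern}\z/

text = 'see http://example.com for details'

puts pattern.match?(text)                    # matches the URL inside the sentence
puts anchored.match?(text)                   # the whole string is not a URL
puts anchored.match?('http://example.com')   # a bare URL still passes
```

In a model, the anchored variant could be passed to the format validator in the same way as the unanchored one.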


Following Simone's idea, you can easily create your own validator.

class UrlValidator < ActiveModel::EachValidator
  def validate_each(record, attribute, value)
    return if value.blank?

    valid = begin
      URI.parse(value).is_a?(URI::HTTP)
    rescue URI::InvalidURIError
      false
    end

    # errors.add works across Rails versions; appending to errors[attribute] is deprecated since Rails 6.1
    record.errors.add(attribute, options[:message] || "is not a URL") unless valid
  end
end

and then use

validates :url, :presence => true, :url => true

in your model.


There is also the validate_url gem (which is just a nice wrapper around the Addressable::URI.parse solution).

Just add

gem 'validate_url'

to your Gemfile, and then in models you can

validates :click_through_url, url: true


This question has already been answered, but what the heck, here is the solution I'm using.

The regexp works fine with all the URLs I've come across. The setter method handles the case where no protocol is given (it assumes http://).

Finally, we try to fetch the page. Maybe I should accept redirects and not only HTTP 200 OK.

# app/models/my_model.rb
validates :website, :allow_blank => true, :uri => { :format => /(^$)|(^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$)/ix }

def website= url_str
  unless url_str.blank?
    unless url_str.split(':')[0] == 'http' || url_str.split(':')[0] == 'https'
        url_str = "http://" + url_str
    end
  end  
  write_attribute :website, url_str
end
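The scheme-defaulting logic in the setter can also be pulled out into a plain helper, which makes it easy to try outside a model. This is only a sketch: the method name with_default_scheme and its default: keyword are hypothetical, and unlike the setter above it compares the scheme case-insensitively.

```ruby
# Hypothetical helper: prepend a default scheme when the input has none.
def with_default_scheme(url_str, default: 'http')
  str = url_str.to_s.strip
  return str if str.empty?                 # leave blank values untouched
  return str if str.match?(%r{\Ahttps?://}i)

  "#{default}://#{str}"
end

puts with_default_scheme('example.com')
puts with_default_scheme('https://example.com')
puts with_default_scheme('example.com', default: 'https')
```

Inside the model, the setter would then reduce to writing the helper's result to the attribute.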

and...

# app/validators/uri_validator.rb
require 'net/http'

# Thanks Ilya! http://www.igvita.com/2006/09/07/validating-url-in-ruby-on-rails/
# Original credits: http://blog.inquirylabs.com/2006/04/13/simple-uri-validation/
# HTTP Codes: http://www.ruby-doc.org/stdlib/libdoc/net/http/rdoc/classes/Net/HTTPResponse.html

class UriValidator < ActiveModel::EachValidator
  def validate_each(object, attribute, value)
    raise(ArgumentError, "A regular expression must be supplied as the :format option of the options hash") unless options[:format].nil? or options[:format].is_a?(Regexp)
    configuration = { :message => I18n.t('errors.events.invalid_url'), :format => URI::regexp(%w(http https)) }
    configuration.update(options)

    if value =~ configuration[:format]
      begin # check header response
        case Net::HTTP.get_response(URI.parse(value))
          when Net::HTTPSuccess then true
          else object.errors.add(attribute, configuration[:message]) and false
        end
      rescue # Recover on DNS failures..
        object.errors.add(attribute, configuration[:message]) and false
      end
    else
      object.errors.add(attribute, configuration[:message]) and false
    end
  end
end
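On the open question of accepting redirects: a possible variation, sketched below, treats both 2xx and 3xx responses as "reachable". The method name reachable? and the fetcher: parameter are assumptions of mine; the fetcher is injectable only so the logic can be exercised without a network connection.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: true when the URL answers with a success or a redirect.
# `fetcher` defaults to a real HTTP GET; pass a lambda to stub it out.
def reachable?(url, fetcher: ->(uri) { Net::HTTP.get_response(uri) })
  uri = URI.parse(url)
  return false unless uri.is_a?(URI::HTTP) && uri.host

  response = fetcher.call(uri)
  response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
rescue URI::InvalidURIError, SocketError, SystemCallError, Timeout::Error
  # Malformed URL, DNS failure, refused connection or timeout
  false
end

# Stubbed usage, no network needed:
found = ->(_uri) { Net::HTTPFound.new('1.1', '302', 'Found') }
puts reachable?('http://example.com', fetcher: found)
```

Keep in mind that a network call inside a validation slows down every save and can fail for reasons unrelated to the URL itself.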


The solution that worked for me was:

validates_format_of :url, :with => /\A(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\.-]*)*\/?\Z/i

I did try some of the examples posted here, but I need to support URLs like the ones listed below.

Notice the use of \A and \Z: if you use ^ and $ instead, Rails validators reject the regexp with a security warning about multiline anchors.

 Valid ones:
 'www.crowdint.com'
 'crowdint.com'
 'http://crowdint.com'
 'http://www.crowdint.com'

 Invalid ones:
  'http://www.crowdint. com'
  'http://fake'
  'http:fake'


You can also try the valid_url gem, which allows URLs without the scheme and checks the domain zone and IP hostnames.

Add it to your Gemfile:

gem 'valid_url'

And then in model:

class WebSite < ActiveRecord::Base
  validates :url, :url => true
end


Just my 2 cents:

before_validation :format_website
validate :website_validator

private

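def format_website
  # Skip blank values so a nil website does not raise NoMethodError
  return if website.blank?

  self.website = "http://#{website}" unless website[/^https?/]
end

def website_validator
  # errors.add works across Rails versions; appending to errors[:website] is deprecated since Rails 6.1
  errors.add(:website, I18n.t("activerecord.errors.messages.invalid")) unless website_valid?
end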

def website_valid?
  !!website.match(/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-=\?]*)*\/?$/)
end

EDIT: changed regex to match parameter urls.


I ran into the same problem lately (I needed to validate urls in a Rails app) but I had to cope with the additional requirement of unicode urls (e.g. http://кц.рф)...

I researched a couple of solutions and came across the following:

  • The first and most suggested thing is using URI.parse. Check the answer by Simone Carletti for details. This works ok, but not for unicode urls.
  • The second method I saw was the one by Ilya Grigorik: http://www.igvita.com/2006/09/07/validating-url-in-ruby-on-rails/ Basically, he tries to make a request to the url; if it works, it is valid...
  • The third method I found (and the one I prefer) is an approach similar to URI.parse but using the addressable gem instead of the URI stdlib. This approach is detailed here: http://rawsyntax.com/blog/url-validation-in-rails-3-and-ruby-in-general/
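The limitation mentioned in the first bullet is easy to reproduce with the standard library alone. In this small sketch the helper name stdlib_parses? is hypothetical; it only reports whether URI.parse accepts the string.

```ruby
require 'uri'

# Hypothetical helper: does the stdlib parser accept this string at all?
def stdlib_parses?(url)
  URI.parse(url)
  true
rescue URI::InvalidURIError
  false
end

puts stdlib_parses?('http://example.com')
puts stdlib_parses?('http://кц.рф')   # non-ASCII host, rejected by URI.parse
```

The addressable gem, as described in the linked article, is the route this answer prefers for such URLs; it is not shown here because it is an external dependency.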


Here is an updated version of the validator posted by David James. It has been published by Benjamin Fleischer. Meanwhile, I pushed an updated fork which can be found here.

require 'addressable/uri'

# Source: http://gist.github.com/bf4/5320847
# Accepts options[:message] and options[:allowed_protocols]
# spec/validators/uri_validator_spec.rb
class UriValidator < ActiveModel::EachValidator

  def validate_each(record, attribute, value)
    uri = parse_uri(value)
    # errors.add works across Rails versions; appending to errors[attribute] is deprecated since Rails 6.1
    if !uri
      record.errors.add(attribute, generic_failure_message)
    elsif !allowed_protocols.include?(uri.scheme)
      record.errors.add(attribute, "must begin with #{allowed_protocols_humanized}")
    end
  end

private

  def generic_failure_message
    options[:message] || "is an invalid URL"
  end

  def allowed_protocols_humanized
    allowed_protocols.to_sentence(:two_words_connector => ' or ')
  end

  def allowed_protocols
    @allowed_protocols ||= [(options[:allowed_protocols] || ['http', 'https'])].flatten
  end

  def parse_uri(value)
    uri = Addressable::URI.parse(value)
    uri.scheme && uri.host && uri
  rescue URI::InvalidURIError, Addressable::URI::InvalidURIError, TypeError
  end

end

...

require 'spec_helper'

# Source: http://gist.github.com/bf4/5320847
# spec/validators/uri_validator_spec.rb
describe UriValidator do
  subject do
    Class.new do
      include ActiveModel::Validations
      attr_accessor :url
      validates :url, uri: true
    end.new
  end

  it "should be valid for a valid http url" do
    subject.url = 'http://www.google.com'
    subject.valid?
    subject.errors.full_messages.should == []
  end

  ['http://google', 'http://.com', 'http://ftp://ftp.google.com', 'http://ssh://google.com'].each do |invalid_url|
    it "#{invalid_url.inspect} is (surprisingly) accepted as a valid http url" do
      subject.url = invalid_url
      subject.valid?
      subject.errors.full_messages.should == []
    end
  end

  ['http:/www.google.com','<>hi'].each do |invalid_url|
    it "#{invalid_url.inspect} is an invalid url" do
      subject.url = invalid_url
      subject.valid?
      subject.errors.should have_key(:url)
      subject.errors[:url].should include("is an invalid URL")
    end
  end

  ['www.google.com','google.com'].each do |invalid_url|
    it "#{invalid_url.inspect} is an invalid url" do
      subject.url = invalid_url
      subject.valid?
      subject.errors.should have_key(:url)
      subject.errors[:url].should include("is an invalid URL")
    end
  end

  ['ftp://ftp.google.com','ssh://google.com'].each do |invalid_url|
    it "#{invalid_url.inspect} is an invalid url" do
      subject.url = invalid_url
      subject.valid?
      subject.errors.should have_key(:url)
      subject.errors[:url].should include("must begin with http or https")
    end
  end
end

Please note that there are still strange HTTP URIs that are parsed as valid addresses.

http://google  
http://.com  
http://ftp://ftp.google.com  
http://ssh://google.com

Here is an issue for the addressable gem that covers these examples.


I use a slight variation on lafeber's solution above. It disallows consecutive dots in the hostname (for instance, in www.many...dots.com):

%r"\A(https?://)?[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]{2,6}(/.*)?\Z"i

URI.parse seems to mandate scheme prefixing, which in some cases is not what you may want (e.g. if you want to allow your users to quickly spell URLs in forms such as twitter.com/username)
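A short sketch of how this pattern behaves on a few inputs, including a scheme-less one. The constant name HOSTISH_URL and the sample strings are mine; the regexp is the one from this answer.

```ruby
# The regexp from this answer, bound to a hypothetical constant name.
HOSTISH_URL = %r"\A(https?://)?[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]{2,6}(/.*)?\Z"i

[
  'twitter.com/username',   # no scheme, accepted
  'http://example.com',     # accepted
  'www.many...dots.com',    # consecutive dots, rejected
  'http://fake'             # no dot-separated TLD, rejected
].each do |candidate|
  puts "#{candidate} => #{HOSTISH_URL.match?(candidate)}"
end
```

Because the TLD part is limited to 2 to 6 letters, longer TLDs would be rejected by this pattern as written.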


I have been using the 'activevalidators' gem and it works pretty well (not just for URL validation).

You can find it here.

It's all documented, but basically, once the gem is added, you'll want to add the following few lines in an initializer, say /config/environments/initializers/active_validators_activation.rb:

# Activate all the validators
ActiveValidators.activate(:all)

(Note: you can replace :all with :url or :whatever if you just want to validate specific types of values.)

Then, back in your model, add something like this:

class Url < ActiveRecord::Base
   validates :url, :presence => true, :url => true
end

Now restart the server and that should be it.


If you want simple validation and a custom error message:

  validates :some_field_expecting_url_value,
            format: {
              with: URI::DEFAULT_PARSER.make_regexp(%w[http https]),
              message: 'is not a valid URL'
            }


I like to monkeypatch the URI module to add a valid? method.

Inside config/initializers/uri.rb:

module URI
  def self.valid?(url)
    uri = URI.parse(url)
    uri.is_a?(URI::HTTP) && !uri.host.nil?
  rescue URI::InvalidURIError
    false
  end
end


You can validate multiple URL fields using something like:

validates_format_of [:field1, :field2], with: URI.regexp(['http', 'https']), allow_nil: true


https://github.com/perfectline/validates_url is a nice and simple gem that will do pretty much everything for you


Recently I had this same issue and found a workaround for validating URLs.

require 'net/http'

validates_format_of :url, :with => URI::DEFAULT_PARSER.make_regexp(%w(http https))
validate :validate_url

def validate_url
  return if url.blank?

  source = URI.parse(url)
  # Any response counts as "exists"; a DNS failure raises SocketError
  Net::HTTP.get_response(source)
rescue URI::InvalidURIError, SocketError
  errors.add(:url, 'is invalid')
end

Parsing the URL (the first part of the validate_url method) is enough to validate its format. The second part makes sure the URL exists by sending a request.


And as a module:

module UrlValidator
  extend ActiveSupport::Concern
  included do
    validates :url, presence: true, uniqueness: true
    validate :url_format
  end

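  def url_format
    # URI() raises ArgumentError for nil, so skip blank values (presence is validated separately)
    return if url.blank?

    errors.add(:url, "Invalid url") unless URI(url).is_a?(URI::HTTP)
  rescue URI::InvalidURIError
    errors.add(:url, "Invalid url")
  end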
end

Then just include UrlValidator in any model whose URLs you want to validate. I'm including this simply as another option.


URL validation cannot be handled by a regular expression alone, because the number of websites keeps growing and new domain naming schemes keep appearing.

In my case, I simply wrote a custom validator that checks for a successful response.

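require 'net/http'

class UrlValidator < ActiveModel::Validator
  def validate(record)
    url = URI.parse(record.path)
    # get_response returns a Net::HTTPResponse; Net::HTTP.get would return only the body string
    response = Net::HTTP.get_response(url)
    return if response.is_a?(Net::HTTPSuccess)

    record.errors.add(:path, 'Web address is invalid')
  rescue StandardError
    # Malformed URLs, DNS failures and connection errors all end up here
    record.errors.add(:path, 'Web address is invalid')
  end
end

I am validating the path attribute of my model by using record.path. I am also adding the error to the respective attribute name by using record.errors.add(:path, ...).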

You can simply replace this with any attribute name.

Then I simply call the custom validator in my model.

class Url < ApplicationRecord

  # validations
  validates_presence_of :path
  validates_with UrlValidator

end


You could use a regex for this. This one works well for me:

(^|[\s.:;?\-\]<\(])(ftp|https?:\/\/[-\w;\/?:@&=+$\|\_.!~*\|'()\[\]%#,]+[\w\/#](\(\))?)(?=$|[\s',\|\(\).:;?\-\[\]>\)])


URI::regexp(%w[http https]) is obsolete and should not be used.

Instead, use URI::DEFAULT_PARSER.make_regexp(%w[http https])


Keep it simple:

validates :url, format: %r{\Ahttps?://.+\z}


If you want to accept only HTTPS URLs, you can use:

require "uri"

class HttpsUrlValidator < ActiveModel::EachValidator
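  def validate_each(record, attribute, value)
    # errors.add works across Rails versions; appending to errors[attribute] is deprecated since Rails 6.1
    unless valid_url?(value)
      record.errors.add(attribute, "is not a valid URL")
    end
  end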

  private

  def valid_url?(url)
    uri = URI.parse(url)
    uri.is_a?(URI::HTTPS) && !uri.host.nil?
  rescue URI::InvalidURIError
    false
  end
end

Usage in the model like this:

validates :website_url, presence: true, https_url: true