Here is what I'm trying to achieve. Say we have two example URLs:
url1 = "http://emy.dod.com/kaskaa/dkaiad/amaa//////////"
url2 = "http://www.example.com/"
How can I extract the stripped-down URLs?
url1 = "http://emy.dod.com/kaskaa/dkaiad/amaa"
url2 = "http://www.example.com"
URI.parse in Ruby sanitizes certain types of malformed URLs but is ineffective in this case.
If we use a regex, then /^(.*)\/$/ removes a single slash / from url1 and is ineffective for url2.
Is anybody aware of how to handle this type of URL parsing?
The point here is that I don't want my system to treat http://www.example.com/ and http://www.example.com as two different URLs. The same goes for http://emy.dod.com/kaskaa/dkaiad/amaa//// and http://emy.dod.com/kaskaa/dkaiad/amaa/.
If you just need to remove all slashes from the end of the URL string, you can try the following regex:
"http://emy.dod.com/kaskaa/dkaiad/amaa//////////".sub(/(\/)+$/,'')
"http://www.example.com/".sub(/(\/)+$/,'')
/(\/)+$/ finds one or more slashes at the end of the string. We then replace this match with an empty string.
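As a minimal sketch, the same substitution could be wrapped in a small helper so that both spellings of a URL compare as equal. The method name `strip_trailing_slashes` is hypothetical, and the sketch uses `\z` (end of string) rather than `$` (end of line), which is an assumption about what is wanted for single-line URLs:

```ruby
# Hypothetical helper: removes every trailing slash from a URL string.
# \z anchors at the very end of the string, whereas $ also matches before a newline.
def strip_trailing_slashes(url)
  url.sub(%r{/+\z}, '')
end

url1 = "http://emy.dod.com/kaskaa/dkaiad/amaa//////////"
url2 = "http://www.example.com/"

puts strip_trailing_slashes(url1) # "http://emy.dod.com/kaskaa/dkaiad/amaa"
puts strip_trailing_slashes(url2) # "http://www.example.com"

# Both spellings now normalize to the same string.
puts strip_trailing_slashes("http://www.example.com") == strip_trailing_slashes(url2) # true
```

Normalizing each URL this way before storing or comparing it should keep http://www.example.com/ and http://www.example.com from being treated as different entries.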
Hope this helps.
This thread is a bit old and the top answer is quite good, but I suggest another way to do this:
/^(.*?)\/*$/
You could see it in action here: https://regex101.com/r/vC6yX1/2
The magic here is *?, which does a lazy match. So the entire expression could be translated as:
Match as few characters as possible and capture them, then match as many slashes as possible at the end.
In plainer English, this removes all trailing slashes: the first capture group holds the URL without them.
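A minimal Ruby sketch of the lazy-match idea, assuming the trailing slashes are absorbed by a greedy `/*` after the lazy group. The method name `strip_with_lazy_match` is hypothetical; `String#[]` with a regex and a group index returns that capture group:

```ruby
# Hypothetical helper: the lazy group (.*?) captures as little as possible,
# so the greedy /* that follows it takes every trailing slash.
def strip_with_lazy_match(url)
  url[%r{\A(.*?)/*\z}, 1]
end

puts strip_with_lazy_match("http://emy.dod.com/kaskaa/dkaiad/amaa//////////")
# "http://emy.dod.com/kaskaa/dkaiad/amaa"
puts strip_with_lazy_match("http://www.example.com/")
# "http://www.example.com"
```

A URL that has no trailing slash comes back unchanged, because `/*` is allowed to match zero slashes.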
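# Returns the match that ends at the last non-slash character, i.e. path
# without its trailing slashes (nil if path has no non-slash character).
def without_trailing_slash(path)
  path[%r{.*[^/]}]
end

path = "http://emy.dod.com/kaskaa/dkaiad/amaa//////////"
puts without_trailing_slash(path) # "http://emy.dod.com/kaskaa/dkaiad/amaa"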