How can I extract whatever follows the last slash in a URL in Python? For example, these URLs should return the following:
URL: http://www.test.com/TEST1
returns: TEST1
URL: http://www.test.com/page/TEST2
returns: TEST2
URL: http://www.test.com/page/page/12345
returns: 12345
I've tried urlparse, but that gives me the full path filename, such as page/page/12345.
You don't need anything fancy. The built-in string methods are enough to split your URL into the 'filename' part and the rest:
url.rsplit('/', 1)
So you can get the part you're interested in simply with:
url.rsplit('/', 1)[-1]
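To make the behaviour of rsplit concrete, here is a minimal sketch. The helper name last_segment is hypothetical (it is not from the answer above); it just wraps the one-liner and shows what happens when the string contains no slash at all.

```python
def last_segment(url: str) -> str:
    """Return the text after the final '/' (hypothetical helper name)."""
    # maxsplit=1 splits only once, starting from the right-hand side
    return url.rsplit('/', 1)[-1]


# The intermediate result is a two-element list: everything before the
# last slash, and everything after it.
print('http://www.test.com/page/TEST2'.rsplit('/', 1))
# ['http://www.test.com/page', 'TEST2']

print(last_segment('http://www.test.com/page/page/12345'))  # 12345

# With no slash, rsplit returns a one-element list, so [-1] is still safe
# (whereas [1] would raise IndexError).
print(last_segment('no-slash-here'))  # no-slash-here
```

Indexing with [-1] rather than [1] is a small design choice: it avoids an IndexError on input that has no slash.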
One more (idio(ma)tic) way:
URL.split("/")[-1]
rsplit
should be up to the task:
In [1]: 'http://www.test.com/page/TEST2'.rsplit('/', 1)[1]
Out[1]: 'TEST2'
urlparse is fine to use if you want to (say, to get rid of any query string parameters).
import urllib.parse
urls = [
'http://www.test.com/TEST1',
'http://www.test.com/page/TEST2',
'http://www.test.com/page/page/12345',
'http://www.test.com/page/page/12345?abc=123'
]
for i in urls:
    url_parts = urllib.parse.urlparse(i)
    path_parts = url_parts[2].rpartition('/')
    print('URL: {}\nreturns: {}\n'.format(i, path_parts[2]))
Output:
URL: http://www.test.com/TEST1
returns: TEST1
URL: http://www.test.com/page/TEST2
returns: TEST2
URL: http://www.test.com/page/page/12345
returns: 12345
URL: http://www.test.com/page/page/12345?abc=123
returns: 12345
You can do it like this:
import os
head, tail = os.path.split(url)
where tail will be your file name.
os.path.basename(os.path.normpath('/folderA/folderB/folderC/folderD/'))
>>> folderD
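One caveat with the path-module approach is that it treats the whole URL as a path, so a query string stays attached to the last component. The sketch below is an assumption-laden illustration rather than part of the answers above: it uses posixpath (the forward-slash implementation behind os.path on Unix-like systems) and a hypothetical helper named basename_of_url that parses the URL first.

```python
import posixpath
from urllib.parse import urlparse

url = 'http://www.test.com/page/page/12345?abc=123'

# Splitting the raw URL keeps the query string in the tail.
head, tail = posixpath.split(url)
print(tail)  # 12345?abc=123


def basename_of_url(url: str) -> str:
    """Hypothetical helper: last path component, ignoring query/fragment."""
    path = urlparse(url).path
    # normpath removes a trailing slash, so '/a/b/' becomes '/a/b'
    return posixpath.basename(posixpath.normpath(path))


print(basename_of_url(url))  # 12345
print(basename_of_url('http://www.test.com/folderC/folderD/'))  # folderD
```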
Here's a more general, regex way of doing this:
re.sub(r'^.+/([^/]+)$', r'\1', url)
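Note that re.sub returns the input unchanged when the pattern does not match, for example when the URL ends with a slash. A possible variant, sketched here with the hypothetical names _LAST_SEGMENT and last_segment_re, uses re.search so that the no-match case is explicit:

```python
import re

# Hypothetical pattern: a slash followed by one or more non-slash
# characters that run to the end of the string.
_LAST_SEGMENT = re.compile(r'/([^/]+)$')


def last_segment_re(url: str) -> str:
    """Return the trailing segment, or '' if the URL ends with a slash."""
    match = _LAST_SEGMENT.search(url)
    return match.group(1) if match else ''


print(last_segment_re('http://www.test.com/page/TEST2'))   # TEST2
print(repr(last_segment_re('http://www.test.com/page/')))  # ''

# For comparison, the re.sub version hands back the whole URL here:
print(re.sub(r'^.+/([^/]+)$', r'\1', 'http://www.test.com/page/'))
# http://www.test.com/page/
```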
First extract the path element from the URL:
from urllib.parse import urlparse
parsed= urlparse('https://www.dummy.example/this/is/PATH?q=/a/b&r=5#asx')
and then you can extract the last segment with string functions:
parsed.path.rpartition('/')[2]
(in this example, the result is 'PATH')
Use urlparse
to get just the path and then split the path you get from it on /
characters:
from urllib.parse import urlparse
my_url = "http://example.com/some/path/last?somequery=param"
last_path_fragment = urlparse(my_url).path.split('/')[-1] # returns 'last'
Note: if your url ends with a /
character, the above will return ''
(i.e. the empty string). If you want to handle that case differently, you need to strip the last trailing /
character before you split the path:
my_url = "http://example.com/last/"
# handle URL ending in `/` by removing it.
last_path_fragment = urlparse(my_url).path.rstrip('/').split('/')[-1] # returns 'last'
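The two cases can be folded into one small function. This is only a sketch: the name last_path_segment is hypothetical, and it relies on str.rstrip taking a single argument (the set of characters to strip), which means it removes every trailing slash, not just one.

```python
from urllib.parse import urlparse


def last_path_segment(url: str) -> str:
    """Hypothetical helper: last non-empty segment of the URL's path."""
    path = urlparse(url).path          # drops the query string and fragment
    # rstrip('/') removes all trailing slashes; rsplit then takes the tail
    return path.rstrip('/').rsplit('/', 1)[-1]


print(last_path_segment('http://example.com/some/path/last?somequery=param'))  # last
print(last_path_segment('http://example.com/last/'))                           # last
print(repr(last_path_segment('http://example.com/')))                          # ''
```

A URL with no path segments at all still returns the empty string, so callers may want to check for that.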
The following solution, which uses pathlib
to parse the path obtained from urllib.parse,
makes it possible to get the last part even when a trailing slash is present:
import urllib.parse
from pathlib import Path
urls = [
"http://www.test.invalid/demo",
"http://www.test.invalid/parent/child",
"http://www.test.invalid/terminal-slash/",
"http://www.test.invalid/query-params?abc=123&works=yes",
"http://www.test.invalid/fragment#70446893",
"http://www.test.invalid/has/all/?abc=123&works=yes#70446893",
]
for url in urls:
    url_path = Path(urllib.parse.urlparse(url).path)
    last_part = url_path.name  # use .stem to cut file extensions
    print(f"{last_part=}")
yields:
last_part='demo'
last_part='child'
last_part='terminal-slash'
last_part='query-params'
last_part='fragment'
last_part='all'
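As a follow-up to the .stem remark in the code comment, here is a short sketch of the difference between .name and .stem. It swaps in PurePosixPath instead of Path, on the assumption that URL paths should always be parsed with forward-slash rules regardless of the operating system; the function name url_name_and_stem and the report.pdf URL are made up for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def url_name_and_stem(url: str) -> tuple:
    """Hypothetical helper: (last component, last component sans extension)."""
    path = PurePosixPath(urlparse(url).path)
    return path.name, path.stem


print(url_name_and_stem('http://www.test.invalid/files/report.pdf?x=1'))
# ('report.pdf', 'report')
print(url_name_and_stem('http://www.test.invalid/terminal-slash/'))
# ('terminal-slash', 'terminal-slash')
```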
extracted_url = url[url.rfind("/")+1:]
Split the url and pop the last element
url.split('/').pop()
In JavaScript, pop() likewise removes and returns the last element of an array:
const plants = ['broccoli', 'cauliflower', 'cabbage', 'kale', 'tomato'];
console.log(plants.pop());
// expected output: "tomato"
console.log(plants);
// expected output: Array ["broccoli", "cauliflower", "cabbage", "kale"]
If you want only the path, without the query parameters or hash (in JavaScript):
new URL(document.URL).pathname.split('/').reverse()[0];
# index 4 is hard-coded, so this only works for URLs with exactly this depth
url = 'http://www.test.com/page/TEST2'.split('/')[4]
print(url)
Output: TEST2