Python regex to find any link that contains the text 'abc123'

开发者 https://www.devze.com 2023-01-10 09:40 Source: web
I am using Beautiful Soup to find all a tags by their href attribute.

links = myhtml.findAll('a', href=re.compile('????'))

I need to find all links that have 'abc123' in the href text.

I need help with the regex; see the ???? in my code snippet.


If 'abc123' is literally what you want to search for, anywhere in the href, then re.compile('abc123') as suggested by other answers is correct. If the actual string you want to match contains punctuation, e.g. 'abc123.com', then use instead

re.compile(re.escape('abc123.com'))

The re.escape part "escapes" any punctuation so that it is taken literally, just as alphanumerics are. Without it, the regex engine gives some punctuation a special meaning. For example, the dot ('.') in the pattern above would match any single character (other than a newline, by default), so re.compile('abc123.com') would also match 'abc123zcom' and many similar strings.
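As a minimal sketch of that difference, the snippet below runs the escaped and unescaped patterns over a few plain strings using only the standard re module. The sample hrefs and the variable names are made up for illustration and are not from the question.

```python
import re

# Hypothetical sample hrefs, for illustration only.
hrefs = [
    "http://abc123.com/page",
    "http://abc123zcom/page",
    "http://example.org/",
]

# Unescaped: the dot matches any single character.
unescaped = re.compile("abc123.com")
# Escaped: the dot must be a literal dot.
escaped = re.compile(re.escape("abc123.com"))

loose_matches = [h for h in hrefs if unescaped.search(h)]
strict_matches = [h for h in hrefs if escaped.search(h)]

print(loose_matches)   # includes the 'abc123zcom' href as well
print(strict_matches)  # only the href with a literal 'abc123.com'
```

The same escaped pattern object could then be passed as the href argument in the findAll call from the question.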


"abc123" should give you what you want.

If that doesn't work, then Beautiful Soup is probably using re.match, in which case you would want ".*abc123.*".
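The distinction this answer relies on can be checked with the re module alone: search looks for the pattern anywhere in the string, while match only succeeds at the start. The following is a rough sketch with a made-up href; it says nothing about which of the two Beautiful Soup calls internally.

```python
import re

# Hypothetical href, for illustration only.
href = "http://example.com/abc123/index.html"

plain = re.compile("abc123")
padded = re.compile(".*abc123.*")

# search() scans the whole string for the pattern.
search_hit = plain.search(href) is not None
# match() is anchored at the start, so the plain pattern fails here.
match_hit = plain.match(href) is not None
# Padding with .* lets match() succeed from the start of the string.
padded_match_hit = padded.match(href) is not None

print(search_hit, match_hit, padded_match_hit)
```

The padded pattern works with either method, so it is a safe fallback if the plain one returns no links.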


If you want all the links whose href contains 'abc123', you can simply write:

links = myhtml.findAll('a', href=re.compile('abc123'))