Python finding substring between certain characters using regex and replace()

Developer https://www.devze.com 2023-02-03 04:01 Source: web
Suppose I have a string with lots of random stuff in it like the following:

strJunk ="asdf2adsf29Value=five&lakl23ljk43asdldl"

And I'm interested in obtaining the substring sitting between 'Value=' and '&', which in this example would be 'five'.

I can use a regex like the following:

 >>> import re
 >>> match = re.search(r'Value=?([^&>]+)', strJunk)
 >>> print(match.group(0))
 Value=five
 >>> print(match.group(1))
 five

How come match.group(0) is the whole thing 'Value=five' and group(1) is just 'five'? And is there a way for me to just get 'five' as the only result? (This question stems from me only having a tenuous grasp of regex)

I am also going to have to make a substitution in this string, such as the following:

 val1 = match.group(1)
 strJunk.replace(val1, "six", 1)    

Which yields:

 'asdf2adsf29Value=six&lakl23ljk43asdldl'

Considering that I plan on performing the above two tasks (finding the string between 'Value=' and '&', as well as replacing that value) over and over, I was wondering if there are any other more efficient ways of looking for the substring and replacing it in the original string. I'm fine sticking with what I've got but I just want to make sure that I'm not taking up more time than I have to be if better methods are out there.


Named groups make it easier to get the group contents afterwards. Compiling your regex once and reusing the compiled object saves the pattern lookup that module-level calls such as re.search perform on every call (the re module caches recently compiled patterns, so the gain is real but usually modest). You can use positive lookbehind and lookahead assertions so that the match covers only the value, which makes this regex suitable for the substitution you want to do.

>>> value_regex = re.compile("(?<=Value=)(?P<value>.*?)(?=&)")
>>> match = value_regex.search(strJunk)
>>> match.group('value')
'five'
>>> value_regex.sub("six", strJunk)
'asdf2adsf29Value=six&lakl23ljk43asdldl'
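
To address the group(0) versus group(1) part of the question: group(0) is always the entire text the pattern matched, while group(1), group(2), and so on are the parenthesized sub-matches. Here is a minimal sketch of an alternative that uses plain capture groups instead of lookarounds. The names VALUE_RE, get_value and set_value are made up for illustration, and the pattern assumes the value runs up to the next '&' or the end of the string.

```python
import re

# Hypothetical helper names; compiled once and reused for every call.
# Group 1 captures the 'Value=' prefix, group 2 captures the value itself.
VALUE_RE = re.compile(r"(Value=)([^&]*)")

def get_value(text):
    """Return the text between 'Value=' and the next '&', or None if absent."""
    match = VALUE_RE.search(text)
    return match.group(2) if match else None

def set_value(text, new_value):
    """Replace the first value, re-emitting the 'Value=' prefix from group 1."""
    # A function replacement avoids backslash escapes in new_value being interpreted.
    return VALUE_RE.sub(lambda m: m.group(1) + new_value, text, count=1)

strJunk = "asdf2adsf29Value=five&lakl23ljk43asdldl"
match = VALUE_RE.search(strJunk)
print(match.group(0))              # Value=five (the whole match)
print(match.group(2))              # five (second parenthesized group)
print(set_value(strJunk, "six"))   # asdf2adsf29Value=six&lakl23ljk43asdldl
```

Unlike the lookahead version, this one does not require a trailing '&', and count=1 limits the substitution to the first occurrence, like the replace() call in the question.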


I'm not exactly sure if you're parsing URLs, in which case you should definitely be using the urlparse module (renamed urllib.parse in Python 3).
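
If the data really is a URL query string, something along these lines might work. This is only a sketch for Python 3's urllib.parse, and the URL is a made-up example, not the string from the question (which is not a well-formed URL).

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode

# Hypothetical URL, used only to illustrate the approach.
url = "http://example.com/page?Value=five&other=1"

parts = urlsplit(url)
query = parse_qs(parts.query)      # maps each key to a list of values
current = query["Value"][0]        # 'five'

query["Value"] = ["six"]           # swap in the new value
new_query = urlencode(query, doseq=True)
new_url = urlunsplit(parts._replace(query=new_query))

print(current)   # five
print(new_url)   # http://example.com/page?Value=six&other=1
```

This also takes care of percent-encoding, which a hand-written regex would not.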

However, given that this is not your question, splitting on several delimiters at once with a regular expression is fast in Python, so you should be able to do what you want as follows (this assumes the string contains exactly one '=' and one '&'):

import re

strJunk ="asdf2adsf29Value=five&lakl23ljk43asdldl"
split_result = re.split(r'[&=]', strJunk)  # ['asdf2adsf29Value', 'five', 'lakl23ljk43asdldl']
split_result[1] = 'six'
print("{0}={1}&{2}".format(*split_result))  # asdf2adsf29Value=six&lakl23ljk43asdldl

Hope this helps!

EDIT:

If you will split multiple times, you can use re.compile() to compile the regular expression. So you'll have:

import re
rx_split_on_delimiters = re.compile(r'[&=]')  # store this somewhere

strJunk ="asdf2adsf29Value=five&lakl23ljk43asdldl"
split_result = rx_split_on_delimiters.split(strJunk)
split_result[1] = 'six'
print("{0}={1}&{2}".format(*split_result))  # asdf2adsf29Value=six&lakl23ljk43asdldl


How come match.group(0) is the whole thing 'Value=five' and group(1) is just 'five'? And is there a way for me to just get 'five' as the only result? (This question stems from me only having a tenuous grasp of regex)

I think a lookbehind assertion can help you here.

>>> match = re.search(r'(?<=Value=)([^&>]+)', strJunk)
>>> match.group(0)
'five'

But a lookbehind assertion only accepts a fixed-width pattern, so the optional '=' from your original regex is not allowed inside it:

>>> match = re.search(r'(?<=Value=?)([^&>]+)', strJunk)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/re.py", line 142, in search
    return _compile(pattern, flags).search(string)
  File "/usr/lib/python2.6/re.py", line 245, in _compile
    raise error, v # invalid expression
sre_constants.error: look-behind requires fixed-width pattern

I can't think of a way to do this without regex. Your way of doing this should be faster than a lookbehind assertion.
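
For what it's worth, plain string methods might also do the job. The following is a rough sketch using str.partition; the function name replace_value and its parameters are invented for this example, and it only handles the first occurrence of the start marker.

```python
def replace_value(text, new_value, start="Value=", end="&"):
    """Return (new_text, old_value); old_value is None if start is not found."""
    head, marker, rest = text.partition(start)
    if not marker:
        # The start marker is absent, so leave the text untouched.
        return text, None
    # If end is absent, old_value is everything after the start marker.
    old_value, delimiter, tail = rest.partition(end)
    return head + marker + new_value + delimiter + tail, old_value

strJunk = "asdf2adsf29Value=five&lakl23ljk43asdldl"
new_text, old_value = replace_value(strJunk, "six")
print(old_value)  # five
print(new_text)   # asdf2adsf29Value=six&lakl23ljk43asdldl
```

Whether this is faster than the regex depends on the input, so it is worth timing both on real data before switching.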
