I have a list of about 100 elements that is actually an email, with each line as an element. The list length varies slightly because lines containing a \n are split into separate elements, so I can't simply slice using fixed values. I essentially need a variable start and stop phrase. It needs to be a partial search as well, because one of my start phrases might actually be Total Cost: $13.43, so I would just use Total Cost: as the phrase. Same thing with the end phrase. I also do not wish to include the start/stop phrases in the returned list. In summary:
>>> email = ['apples','bananas','cats','dogs','elephants','fish','gee']
>>> start = 'ban'
>>> stop = 'ele'
# the magic here
>>> print new_email
['cats', 'dogs']
NOTES
- While the email's formatting is not perfect, it is fairly consistent, so there is only a slim chance a start/stop phrase will occur more than once.
- There are also no blank elements.
SOLUTION
Just for funzies and thanks to everybody's help here is my final code:
def get_elements_positions(stringList=list(), startPhrase=None, stopPhrase=None):
    elementPositionStart, elementPositionStop = 0, -1
    if startPhrase:
        elementPositionStart = next((i for i, j in enumerate(stringList) if j.startswith(startPhrase)), 0)
    if stopPhrase:
        elementPositionStop = next((i for i, j in enumerate(stringList) if j.startswith(stopPhrase)), -1)
    if elementPositionStart + 1 == elementPositionStop - 1:
        return elementPositionStart + 1
    else:
        return [elementPositionStart, elementPositionStop]
It returns a list with the starting and ending element position and defaults to 0 and -1 if the respective value cannot be found. (0 being the first element and -1 being the last).
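One thing to watch if the returned positions are fed straight into a slice such as stringList[start+1:stop]: a default start of 0 skips the first element, and a default stop of -1 drops the last one. Below is a minimal sketch of one way around that, using -1 and None as the "not found" values so that a missing phrase means "from the beginning" or "to the end". The function name elements_between and its parameter names are made up for illustration and are not part of the code above.

```python
def elements_between(lines, start_phrase, stop_phrase):
    """Return the items strictly between the first line that starts with
    start_phrase and the next line that starts with stop_phrase."""
    # Index of the start line, or -1 so that the slice begins at element 0.
    start = next((i for i, line in enumerate(lines)
                  if line.startswith(start_phrase)), -1)
    # Index of the first stop line after the start line, or None so that
    # the slice runs to the end of the list.
    stop = next((i for i, line in enumerate(lines)
                 if i > start and line.startswith(stop_phrase)), None)
    return lines[start + 1:stop]


email = ['apples', 'bananas', 'cats', 'dogs', 'elephants', 'fish', 'gee']
between = elements_between(email, 'ban', 'ele')
```

With the sample list, between holds the two elements after 'bananas' and before 'elephants'. Whether a missing phrase should mean "everything" or "nothing" depends on the email, so treat the fallbacks as an assumption.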
SOLUTION-B
I made a small change: if the start and stop positions leave just one element between them, the function now returns that element's position as an integer instead of a list, which you still get for multi-line results.
Thanks again!
>>> email = ['apples','bananas','cats','dogs','elephants','fish','gee']
>>> start, stop = 'ban', 'ele'
>>> ind_s = next(i for i, j in enumerate(email) if j.startswith(start))
>>> ind_e = next(i for i, j in enumerate(email) if j.startswith(stop) and i > ind_s)
>>> email[ind_s+1:ind_e]
['cats', 'dogs']
To handle the case where an element might not be in the list:
>>> def get_ind(prefix, prev=-1):
...     it = (i for i, j in enumerate(email) if i > prev and j.startswith(prefix))
...     return next(it, None)
...
>>> start = get_ind('ban')
>>> start = -1 if start is None else start
>>> stop = get_ind('ele', start)
>>> email[start+1:stop]
['cats', 'dogs']
An itertools-based approach:
import itertools
email = ['apples','bananas','cats','dogs','elephants','fish','gee']
start, stop = 'ban', 'ele'
findstart = itertools.dropwhile(lambda item: not item.startswith(start), email)
findstop = itertools.takewhile(lambda item: not item.startswith(stop), findstart)
print list(findstop)[1:]
# ['cats', 'dogs']
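If building the intermediate list just to slice off the start line is a concern, the same dropwhile/takewhile idea can probably stay lazy by skipping the first item with itertools.islice. This is only a sketch: the function name between_lazy is invented here, and it avoids print so it runs on both Python 2 and Python 3.

```python
from itertools import dropwhile, islice, takewhile


def between_lazy(lines, start, stop):
    """Lazily yield the items after the first line starting with `start`
    and before the next line starting with `stop`."""
    # Skip everything before the first line that starts with `start`.
    from_start = dropwhile(lambda item: not item.startswith(start), lines)
    # Drop the start line itself without materialising a list.
    after_start = islice(from_start, 1, None)
    # Keep yielding until a line starts with `stop`.
    return takewhile(lambda item: not item.startswith(stop), after_start)


email = ['apples', 'bananas', 'cats', 'dogs', 'elephants', 'fish', 'gee']
result = list(between_lazy(email, 'ban', 'ele'))
```

Note that if the start phrase never appears, this version yields nothing; if the stop phrase never appears, it yields everything after the start line.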
Here you go:
>>> email = ['apples','bananas','cats','dogs','elephants','fish','gee']
>>> start = 'ban'
>>> stop = 'ele'
>>> out = []
>>> appending = False
>>> for item in email:
...     if appending:
...         if stop in item:
...             out.append(item)
...             break
...         else:
...             out.append(item)
...     elif start in item:
...         out.append(item)
...         appending = True
...
>>> out.pop(0)
'bananas'
>>> out.pop()
'elephants'
>>> print out
['cats', 'dogs']
I think my version is much more readable than the other answers and doesn't require any imports =)
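For what it's worth, the same loop can be written so the start and stop lines are never appended, which avoids the two pop calls (and the error pop raises on an empty list when the start phrase is missing). A rough sketch, with between_loop as a made-up function name; like the loop above it matches with in, so the phrase may appear anywhere in the line:

```python
def between_loop(lines, start, stop):
    """Collect the items after the first line containing `start` and
    before the next line containing `stop`."""
    out = []
    appending = False
    for item in lines:
        if appending:
            if stop in item:
                break  # stop line reached; leave it out
            out.append(item)
        elif start in item:
            appending = True  # start line found; collect from the next item
    return out


email = ['apples', 'bananas', 'cats', 'dogs', 'elephants', 'fish', 'gee']
out = between_loop(email, 'ban', 'ele')
```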