开发者

How to split the file content by space and end-of-line character?

开发者 https://www.devze.com 2022-12-11 11:20 出处:网络
When I do the following list comprehension I end up with nested lists: channel_values = [x for x in [ y.split(\' \') for y in

When I do the following list comprehension I end up with nested lists:

channel_values = [x for x in [ y.split(' ') for y in
    open(channel_output_file).readlines() ] if x and not x == '\n']

Basically I have a file composed of this:

7656 7653 7649 7646 7643 7640 7637 7634 7631 7627 7624 7621 7618 7615
8626 8623 8620 8617 8614 8610 8607 8604 8600 8597 8594 8597 8594 4444
<snip several thousand lines>

Where each line of this file is terminated by a new line.

Basically I need to add each number (they are all separated by a single space) int开发者_StackOverflow社区o a list.

Is there a better way to do this via list comprehension?


You don't need list comprehensions for this:

channel_values = open(channel_output_file).read().split()


Just do this:

channel_values = open(channel_output_file).read().split()

split() will split according to whitespace that includes ' ' '\t' and '\n'. It will split all the values into one list.

If you want integer values you can do:

channel_values = map(int, open(channel_output_file).read().split())

or with list comprehensions:

channel_values = [int(x) for x in open(channel_output_file).read().split()]


Also, the reason the original list comprehension had nested lists is because you added an extra level of list comprehension with the inner set of square brackets. You meant this:

channel_values = [x for x in y.split(' ') for y in
    open(channel_output_file) if x and not x == '\n']

The other answers are still better ways to write the code, but that was the cause of the problem.


Well another problem is that you're leaving the file open. Note that open is an alias for file.

try this:

f = file(channel_output_file)
channel_values = f.read().split()
f.close()

Note they'll be string values so if you want integer ones change the second line to

channel_values = [int(x) for x in f.read().split()]

int(x) will throw a ValueError if you have a non integer value in the file.


If you don't care about dangling file references, and you really must have a list read into memory all at once, the one-liner mentioned in other answers does work:

channel_values = open(channel_output_path).read().split()

In production code, I would probably use a generator, why read all those lines if you don't need them?

def generate_values_for_filename(filename):
    with open(filename) as f:
        for line in f:
            for value in line.split():
                yield value

You can always make a list later if you really need to do something other than iterate over values:

channel_values = list(generate_values_for_filename(channel_output_path))


Is there a better way to do this via list comprehension?

Sort of..

Instead of reading each line as an array, with the .readlines() methods, you can just use .read():

channel_values = [x for x in open(channel_output_file).readlines().split(' ')
if x not in [' ', '\n']]

If you need to do anything more complicated, particularly if it involves multiple list-comprehensions, you're almost always better of expanding it into a regular for loop.

out = []
for y in open(channel_output_file).readlines():
    for x in y.split(' '):
        if x not in [' ', '\n']:
            out.append(x)

Or using a for loop and a list-comprehension:

out = []
for y in open(channel_output_file).readlines():
    out.extend(
        [x for x in y.split(' ')
        if x != ' ' and x != '\n'])

Basically, if you can't do something simply with a list comprehension (or need to nest them), list-comprehensions are probably not the best solution.

0

精彩评论

暂无评论...
验证码 换一张
取 消