Filename tricks in Python

Developer https://www.devze.com 2023-04-05 18:04 Source: Internet
I have a folder containing my books, which are in various formats (.pdf, .djvu, .dvi), and the filenames all follow the format:

[Name of the Book] - [Author].[an identifier indicating whether it is searchable or not].[filetype]

I want to make a list of my books that is of the format (x,y,z,t) where x is the name of the book, y is the author, etc. My problem is that when I do:

for file in os.listdir('/home/username/Books'):

file is a string, thus immutable, so I cannot change it.


Strings are immutable, but that doesn't mean you can't create the tuple you need from the string.

Something like this should work:

def file_to_tuple(file):
    title_author, searchable, ext = file.rsplit('.', 2)
    title, author = title_author.rsplit(' - ', 1)
    return (title, author, searchable, ext)

You can then use this in a variety of ways to convert your file list to a list of tuples. Here are a couple of options (in Python 3, map() returns an iterator, so it is wrapped in list()):

book_list = list(map(file_to_tuple, os.listdir('/home/username/Books')))

book_list = [file_to_tuple(f) for f in os.listdir('/home/username/Books')]

str.rsplit() with the maxsplit argument is used so that parsing does not fail for titles that contain a period or a dash, or for author names that contain a period. For example:

>>> file_to_tuple('Narnia - The Silver Chair - C.S. Lewis.1.pdf')
('Narnia - The Silver Chair', 'C.S. Lewis', '1', 'pdf')
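To see why the direction of the split matters, here is a small comparison of split() and rsplit() with the same maxsplit on that filename. It is only an illustration; the variable names are made up for the example.

```python
name = 'Narnia - The Silver Chair - C.S. Lewis.1.pdf'

# split() counts from the left, so the periods in "C.S." are used up first
left = name.split('.', 2)

# rsplit() counts from the right, so only the last two periods are used
right = name.rsplit('.', 2)

print(left)   # ['Narnia - The Silver Chair - C', 'S', ' Lewis.1.pdf']
print(right)  # ['Narnia - The Silver Chair - C.S. Lewis', '1', 'pdf']
```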


Use str.split() to break the filename into the required parts.


You don't want to change the string, so the fact that it is immutable is irrelevant. You can still make new items from it.

Here's a small function (tested, even ;) that does what you want:

def book_tuple(info):
    book_author, searchable, ext = info.rsplit('.', 2)
    book, author = book_author.rsplit(' - ', 1)
    return book, author, searchable, ext

book_list = []
for filename in os.listdir('/home/username/Books'):
    book_list.append(book_tuple(filename))

The first split uses .rsplit() with 2 so that it splits at most two times (in case there are periods in the title or author name) and starts from the end (again, in case there are periods in the title or author name). The second split does the same, with a max split of 1 (for the same reasons).
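If the folder might also hold files that do not follow the naming pattern, the tuple unpacking raises ValueError for them. A minimal sketch of one way to skip such files, assuming you simply want to ignore them (book_tuple_or_none and the sample names are hypothetical, not part of the answer above):

```python
def book_tuple_or_none(filename):
    """Return (book, author, searchable, ext), or None if the name does not fit."""
    try:
        book_author, searchable, ext = filename.rsplit('.', 2)
        book, author = book_author.rsplit(' - ', 1)
    except ValueError:
        # Unpacking failed: too few periods, or no ' - ' separator
        return None
    return book, author, searchable, ext

# Sample names stand in for the result of os.listdir()
names = ['Narnia - The Silver Chair - C.S. Lewis.1.pdf', 'notes.txt', 'README']
books = [t for t in map(book_tuple_or_none, names) if t is not None]
print(books)
```

Whether silently skipping is the right choice depends on the folder; logging the rejected names may be more useful in practice.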


That's not a problem, since you don't want to change it. You want to extract pieces of it into new strings.

One simple way might be something like this:

top = file.split(" - ")
name = top[0]
fields = top[1].split(".")
author = fields[0]
searchable = fields[1]
filetype = fields[2]

my_books.append((name, author, searchable, filetype))

This just builds a flat list in my_books, but you could of course do something more clever. Note that it assumes the title contains no " - " and the author contains no period.
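One possible "more clever" variant is to give the fields names with collections.namedtuple, so each entry is still a tuple but can be read by attribute. This is only a sketch; Book, to_book and the field names are assumptions chosen for the example.

```python
from collections import namedtuple

# Hypothetical record type; a namedtuple is still an ordinary tuple
Book = namedtuple('Book', ['name', 'author', 'searchable', 'filetype'])

def to_book(filename):
    # Split from the right so periods or dashes in the title are preserved
    name_author, searchable, filetype = filename.rsplit('.', 2)
    name, author = name_author.rsplit(' - ', 1)
    return Book(name, author, searchable, filetype)

book = to_book('Narnia - The Silver Chair - C.S. Lewis.1.pdf')
print(book.author)    # C.S. Lewis
print(book.filetype)  # pdf
```

Because Book is a tuple subclass, the result can still be unpacked as (x, y, z, t).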


file is a string, thus immutable, so I cannot change it.

So? What do you want to change?

You want to parse it into pieces on different punctuation marks.

You want to create new strings from an existing string. Nothing "changes".

You have split() and partition(), both of which will get much of your job done.

Immutability of a string is totally irrelevant.
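As a rough illustration of the partition() route, str.rpartition() splits at the last occurrence of a separator and returns a 3-tuple of new strings. The sample filename and variable names below are assumptions for the example.

```python
filename = 'Narnia - The Silver Chair - C.S. Lewis.1.pdf'

# Each rpartition() call returns (before, separator, after) as new strings
rest, _, filetype = filename.rpartition('.')
rest, _, searchable = rest.rpartition('.')
name, _, author = rest.rpartition(' - ')

print((name, author, searchable, filetype))
# ('Narnia - The Silver Chair', 'C.S. Lewis', '1', 'pdf')

# The original string is untouched
print(filename)
```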
