Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 12 years ago.
Improve this questionI just wrote my first Python program and it works! It is a program that will rename subtitle files to their matching video files so that media players will pick up the subtitles (e.g. it will rename "The_Office_3x01.str" to "The.Office.S03E01.str" if a file named "The.Office.S03E01.avi" was present in the directory).
I'd love for someone to comment/criticize my code and help me improve my coding style and make it more Python-y. You can find the code at http://subtitle-renamer.googlecode.com/hg/renombrador.py
As I said, it is my very first program in Python, so feel free to comment on anything like:
- Style (indentation, variable names, conventions, etc.)
- Design
- Python features I should be using that I'm not
- My use of the libraries
Thanks!
1) A hash-bang line only works if it is the first line in the file.
2) Function and variable names are usually underscore_separated rather than camelCase.
3) Docstrings usually use triple-quotes, even for single-line docstrings.
4) Your coding style is very functional-flavored, with many lambdas, maps, reduces, etc, and one instance of a four-line triply-nested list comprehension. I find it difficult to understand that style, and would definitely unroll some of those.
5) As an example of the problem that (4) can cause, you have a few places where you evaluate the same expression more than once because of the list comprehension structure:
match = [regex.match(str) for regex in episodeRegExes if regex.match(str)]
and then
def getEpisodeTuple(iterable):
episodeTuple = [episodeChunk(chunk)
for chunk in iterable
if episodeChunk(chunk)]
if episodeTuple:
assert len(episodeTuple) == 1
return episodeTuple[0]
else:
return None
Notice that the list comprehension here evaluates episodeChunk(chunk) twice, and each of those performs the regex match twice, so in your successful case, you are matching your regex four times.
In this last code, (a) you've called a list a tuple, and (b) you build a list, then assert it only has one element, and return the element. It would be simpler like this:
def getEpisodeTuple(iterable):
for chunk in iterable:
echunk = episodeChunk(chunk)
if echunk:
return echunk
6) Learn more about the standard library. For example, this code:
def splitWithAny(string, delimiters):
"Splits the string with any of the strings of delimiters"
return reduce(
lambda iterable, delim: reduce(
lambda lst,chunk: lst + chunk.split(delim),
iterable,
[]),
delimiters,
[string])
def splitName(fileName):
"Splits the fileName into smaller and hopefully significant chunks"
delimiters = [" ", ".", "_", "-"]
return filter(None, splitWithAny(fileName, delimiters))
I think (I'm not sure, because of the reduce(lambda(reduce(lambda)))) stuff..) can be simplified to:
def splitName(fileName):
return re.split("[ ._-]+", fileName)
if you check it against pep8 you'll find a number of style suggestions (there are automated pep8 checkers available).
also, you can use something like pyLint can be very helpful. The quality of my Python code has improved a lot by using them :) (I have both set up on hot-keys in my editor of choice, and find them very convenient to use regularly).
Couple comments:
Agreed with Ned's remark on docstrings.
If you're using Regex in one place why not also use it to take care of the string splitting?
This is confusing:
return [ (getEpisodeTuple(chunks), [chunk for chunk in chunks if not episodeChunk(chunk)]) for chunks in [splitName(string) for string in stringiterable]]
as you're splitting the logic of the one liner into multiple parts but not in a clear way. If you think separating them is the right thing to do, then really do separate them. You're not stopping the inner lists from being generated by not breaking this up into multiple lines. Remember, loooong one liners are just as difficult if not more to decipher than loooooong methods.
Look how much clarify can be added by transforming it into:
splitNames = [splitName(string) for string in stringiterable]
chunkChunks = lambda chunks: [chunk for chunk in chunks if not episodeChunk(chunk)]
return [(getEpisodeTuple(chunks),chunkChunks(chunks) for chunks in splitNames]
#!/usr/bin/env python
should be at the first line- Use [docstrings][1] for multiline comments, and function descriptions.
- Include a general description along with the code so we don't have to guess what it is for.
- Avoid one line functions, be more descriptive with the structure.
精彩评论