I have a directory with several levels of sub-directories. All the files in the directories are HTML files (approx. 500 in total), and I'd like to go through each file to see if it contains a "sub_middle_1col" division. I found a great tutorial at palewire.com and have used that as my base. The two difficulties I am having are 1) the code broke when it hit a sub-directory (thinking it was a file), and 2) it would not traverse sub-directories; that is, it only looks at files that are not in any sub-directory. I may have solved the first problem by adding a line (noted below), but I can't figure out how to integrate other solutions I've seen (e.g., os.walk) into the code in order to solve the second problem. Any ideas? Thanks in advance for any advice.
import os
path = "./Industries"
my_library = os.listdir(path)
out = open("out.txt", "w")
for page in my_library:
    file = os.path.join(path, page)
    if os.path.isfile(file) and file.endswith('.html'): #I ADDED THIS LINE
        text = open(file, "r")
        hit_count = 0
        for line in text:
            if 'sub_middle_1col' in line:
                hit_count = hit_count + 1
        print >> out, page + " => " + str(hit_count)
        print page + " => " + str(hit_count)
        text.close()
Well, you can try:
import os
path = "./Industries"  # the directory from the question
for root, dirs, files in os.walk(path):
    for fname in files:
        if fname.endswith('.html'):
            fq = os.path.join(root, fname)  # full path to the file
            for line in open(fq):
                if 'sub_middle_1col' in line:
                    ...
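To show how the os.walk loop above could be combined with the per-file hit count and the out.txt report from the question, here is a minimal sketch. It is written for Python 3 (the question's code uses Python 2 print statements), and the function names count_hits and write_report, as well as the errors="replace" setting, are my own assumptions rather than anything from the tutorial.

```python
import os


def count_hits(root_dir, needle="sub_middle_1col"):
    """Map each .html file under root_dir to the number of lines containing needle."""
    results = {}
    # os.walk visits root_dir and every sub-directory below it
    for root, dirs, files in os.walk(root_dir):
        for fname in files:
            if not fname.endswith(".html"):
                continue
            full_path = os.path.join(root, fname)
            # errors="replace" is an assumption: it keeps odd bytes from aborting the scan
            with open(full_path, "r", errors="replace") as text:
                hit_count = sum(1 for line in text if needle in line)
            # Key by the path relative to root_dir so same-named files stay distinct
            results[os.path.relpath(full_path, root_dir)] = hit_count
    return results


def write_report(results, out_path="out.txt"):
    """Write 'page => count' lines to out_path and echo them to the console."""
    with open(out_path, "w") as out:
        for page in sorted(results):
            report_line = page + " => " + str(results[page])
            out.write(report_line + "\n")
            print(report_line)
```

Calling write_report(count_hits("./Industries")) should then give the same kind of report as the original script, but covering files at every level. Since os.walk only puts files in the files list, the isfile check from the question is no longer needed.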
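You could also use find() or regular expressions (the re module) to check for the 'sub_middle_1col' string. For a plain fixed substring the in operator is usually already fast, so these mainly help if you need the match position or a more flexible pattern.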
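As a rough illustration of those two alternatives, here is a small Python 3 sketch. The helper names count_occurrences and contains_division are hypothetical, and I make no claim about which variant is faster for your files.

```python
import re

# Compiled once and reused; the pattern is just the literal division name
PATTERN = re.compile(r"sub_middle_1col")


def count_occurrences(html_text):
    """Count every occurrence of the name, including repeats on a single line."""
    return len(PATTERN.findall(html_text))


def contains_division(html_text):
    """Return True if the name appears at least once (str.find returns -1 when absent)."""
    return html_text.find("sub_middle_1col") != -1
```

Note that count_occurrences counts matches rather than matching lines, so it can report a higher number than the line-by-line loop when the name appears twice on one line.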