How can I determine if it's ok to change this word... python script

Developer https://www.devze.com 2023-01-21 22:44 Source: Internet

The goal is to read through HTML files and change all instances of MyWord to Myword, except that the word must NOT be changed if it is found inside or as part of a path, file name or script:

href="..."
src="..."
url(...)
class="..."
id="..."
script inline or linked (file name) --> <script ...></script>
styles inline or linked (file name) --> <link ...>   <style></style>

Now the question of all questions: how do you determine if the instance of the word is in a position where it's ok to change it? (or, how do you determine if the word is inside of one of the above listed locations and should not be changed?)

Here is my code. It can be changed to read line by line, etc., but I just cannot think of how to define and enforce a rule that matches the list above:

#!/usr/bin/python

import os
import time
from stat import *

def fileExtension(s):
   i = s.rfind('.')
   if i == -1:
      return ''
   tmp = '|' + s[i+1:] + '|'
   return tmp

def changeFiles():
   # get all files in current directory with desired extension
   # (names without an extension are skipped: fileExtension returns ''
   # for them, and find('') would otherwise report a match)
   files = [f for f in os.listdir('.')
            if fileExtension(f) and extStr.find(fileExtension(f)) != -1]

   for f in files:
      if os.path.isdir(f):
         continue

      st = os.stat(f)
      atime = st[ST_ATIME] # org access time
      mtime = st[ST_MTIME] # org modification time

      fw = open(f, 'r+')
      tmp = fw.read().replace(oldStr, newStr)
      fw.seek(0)
      fw.write(tmp)
      fw.truncate() # drop leftover bytes if the new text is shorter
      fw.close()

      # put file timestamp back to org timestamp
      os.utime(f,(atime,mtime))

   # if we want to check subdirectories (done once, after the files loop)
   if checkSubDirs :
      dirs = [d for d in os.listdir('.') if os.path.isdir(d)]

      for d in dirs :
         os.chdir(d)
         changeFiles()
         os.chdir('..')

# ==============================================================================
# ==================================== MAIN ====================================

oldStr = 'MyWord'
newStr = 'Myword'
extStr = '|html|htm|'
checkSubDirs = True

changeFiles()

Does anybody know how? Any suggestions? Any help is appreciated. I have been beating my brain for 2 days now and just cannot think of anything.

lxml helps with this kind of task.

html = """
<html>
<body>
    <h1>MyWord</h1>
    <a href="http://MyWord">MyWord</a>
    <img src="images/MyWord.png"/>
    <div class="MyWord">
        <p>MyWord!</p>
        MyWord
    </div>
    MyWord
</body><!-- MyWord -->
</html>
"""

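import re
import lxml.etree as etree

tree = etree.fromstring(html)
# iter() visits every node, including comments, which is why the
# comment text changes too. Attribute values are never touched.
for elem in tree.iter():
    # text: content before the first child element
    if elem.text:
        elem.text = re.sub(r'MyWord', 'Myword', elem.text)
    # tail: content after this element's closing tag
    if elem.tail:
        elem.tail = re.sub(r'MyWord', 'Myword', elem.tail)

print(etree.tostring(tree, encoding='unicode'))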

The above prints this:

<html>
<body>
    <h1>Myword</h1>
    <a href="http://MyWord">Myword</a>
    <img src="images/MyWord.png"/>
    <div class="MyWord">
        <p>Myword!</p>
        Myword
    </div>
    Myword
</body><!-- Myword -->
</html>

Note: You'll need to make the above code a little more complex if you also need special processing for the contents of script tags, such as the following:

<script>
    var title = "MyWord"; // this should change to "Myword"
    var hoverImage = "images/MyWord-hover.png"; // this should not change
</script>
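One conservative way to handle the script case is to leave script and style contents alone entirely. The sketch below is an assumption-laden alternative, not the lxml approach above: it uses the standard-library `html.parser.HTMLParser` (Python 3) to re-emit the page and replace only in text nodes outside `<script>` and `<style>`. The names `TextOnlyReplacer` and `replace_visible_text` are made up for illustration. It does not distinguish the title string from the image path inside a script, and it rewrites end tags in lowercase.

```python
from html.parser import HTMLParser


class TextOnlyReplacer(HTMLParser):
    """Re-emit HTML, replacing a word only in text outside script/style."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self, old, new):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.old, self.new = old, new
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # Copy the tag exactly as written, so href/src/class/id stay untouched.
        self.out.append(self.get_starttag_text())
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)
        if tag in self.SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only text outside script/style is eligible for replacement.
        if self.skip_depth == 0:
            data = data.replace(self.old, self.new)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        # Comments are copied unchanged in this sketch.
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def replace_visible_text(html, old, new):
    parser = TextOnlyReplacer(old, new)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling `replace_visible_text(page, 'MyWord', 'Myword')` on a page string would then leave attributes, comments, scripts and styles as they were. Changing `"MyWord"` inside a script while keeping `"images/MyWord-hover.png"` would need extra rules for the script text itself.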

Use a regex. Here is an example that you can start with; I hope it helps:

import re

html = """
    <href="MyWord" />
    MyWord
"""

re.sub(r'(?<!href=")MyWord', 'Myword', html)
# output: '\n    <href="MyWord" />\n    Myword\n'

Ref: http://docs.python.org/library/re.html
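The lookbehind only protects a MyWord that comes immediately after `href="`, so a value like `href="http://MyWord"` would still be changed. A slightly broader regex sketch, offered as one possible variation rather than a complete solution, splits the text on anything that looks like a tag and replaces only in the pieces between tags. The helper name `replace_outside_tags` is hypothetical. It assumes attribute values contain no `>` character, and it still changes text inside inline `<script>` and `<style>` blocks.

```python
import re

# Anything from '<' to the next '>' is treated as a tag (a simplification).
TAG = re.compile(r'(<[^>]*>)')


def replace_outside_tags(html, old, new):
    # With one capturing group, re.split returns text and tags alternately:
    # even indexes hold text between tags, odd indexes hold the tags.
    parts = TAG.split(html)
    for i in range(0, len(parts), 2):
        parts[i] = parts[i].replace(old, new)
    return "".join(parts)
```

For example, `replace_outside_tags('<a href="http://MyWord">MyWord</a>', 'MyWord', 'Myword')` keeps the `href` value and changes only the link text.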
