Generate HTML from plain text with formatting markers in Python 3

Developer https://www.devze.com 2023-03-19 16:17 Source: web
I have written a set of Python 3 scripts to take a formatted text file and move the data into a SQLite database. The data in the database is then used as a part of a PHP application. The data in my text file has formatting markers for bold and italics, but not in anything intelligible to a browser. The formatting scheme is like this:

fi:xxxx        (italics on the word xxxx (turned off at the word break))
fi:{xxx…xxx}   (italics on the word or phrase in the curly brackets {})
fb:xxxx        (bold on the word xxxx (turned off at the word break))
fb:{xxx}       (bold on the word or phrase in the brackets {})
fv:xxxx        (bold on the word xxxx (turned off at the word break))
fv:{xxx…xxx}   (bold on the word or phrase in the brackets {})
fn:{xxx…xxx}   (no formatting)

I would like to convert each line of source text into two lines: (1) a line containing the string with HTML tags in place of the source formatting, and (2) another line containing the string stripped of all formatting markers. I need a formatted and a stripped line for each source line, even if no formatting markers are used on that line. In the source data, multiple formatting markers of different (or the same) sort may show up in a single line, but every marker ends before the line does.


To format the bracketed sections, you could do something like this:

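tags = {"fb": ("<b>", "</b>"), "fv": ("<b>", "</b>"), "fi": ("<i>", "</i>"), "fn": ("", "")}
index = text.find(":{")
while index > -1:
    marker, end = text[index-2:index], text.find("}", index)
    if marker in tags and end > -1:
        opening, closing = tags[marker]
        # Replace the marker and its braces with the matching tags.
        text = text[:index-2] + opening + text[index+2:end] + closing + text[end+1:]
        index = text.find(":{", index-2)
    else:
        index = text.find(":{", index+2)  # not a known marker, skip it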

This will convert "other fb:{bold text} text" to "other <b>bold text</b> text".

Then you could convert the space-separated sections:

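tags = {"fi:": "i", "fb:": "b", "fv:": "b"}  # marker -> HTML tag
array = text.split(" ")
for i, word in enumerate(array):
    tag = tags.get(word[:3])
    if tag and len(word) > 3:
        # Write back into the list; rebinding word alone changes nothing.
        array[i] = "<" + tag + ">" + word[3:] + "</" + tag + ">"
text = " ".join(array)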
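
As an alternative to the two passes above, both marker forms could probably be handled in a single pass with a regular expression. The sketch below is not the approach described above but a swapped-in technique; the names `TAGS`, `MARKER` and `to_html` are made up for illustration, and it assumes that a single-word marker ends at the next whitespace and that bracketed markers are never nested.

```python
import re

# Assumed mapping, taken from the scheme in the question.
# "fn" means "no formatting", so it maps to no tag at all.
TAGS = {"fi": "i", "fb": "b", "fv": "b", "fn": None}

# Matches either f?:{phrase} or f?:word (word = run of non-space characters).
MARKER = re.compile(r"\b(fi|fb|fv|fn):(?:\{([^}]*)\}|([^\s{}]+))")

def to_html(text):
    """Return text with formatting markers replaced by HTML tags."""
    def repl(match):
        tag = TAGS[match.group(1)]
        # Group 2 is the bracketed phrase, group 3 the single word.
        inner = match.group(2) if match.group(2) is not None else match.group(3)
        return "<{0}>{1}</{0}>".format(tag, inner) if tag else inner
    return MARKER.sub(repl, text)

print(to_html("other fb:{bold text} text"))  # other <b>bold text</b> text
```

The `\b` in the pattern keeps words that merely end in a marker-like sequence, such as "wifi:", from being treated as markers. Punctuation attached to a single word (for example "fi:word,") ends up inside the tag under the whitespace assumption.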

If you want plain text, just replace the tags such as "<b>" and "</b>" with the empty string "".
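
A minimal sketch of that stripping step, assuming the converted line contains only the `<b>` and `<i>` tags produced above and no other markup (the helper name `strip_tags` is hypothetical):

```python
import re

# Only the opening and closing b/i tags are removed; any other
# angle-bracketed text in the line is left untouched.
TAG = re.compile(r"</?[bi]>")

def strip_tags(html_line):
    """Return the line with <b>, </b>, <i> and </i> removed."""
    return TAG.sub("", html_line)

print(strip_tags("other <b>bold text</b> text"))  # other bold text text
```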

If the formatting doesn't span multiple lines you will probably get better performance reading and converting line by line with:

def convert(text):
    # Change text here.
    return text

# The with block closes both files, even if convert() raises.
with open("file.txt", "r") as inFile, open("file.out", "w") as outFile:
    for line in inFile:
        outFile.write(convert(line))
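
Since the question asks for two output lines per source line, the loop could be extended roughly as follows. This is only a sketch: `write_pairs` is a hypothetical helper, the two converters are passed in as functions, and the demo uses in-memory files with trivial placeholder converters rather than the real marker logic.

```python
import io

def write_pairs(in_file, out_file, to_html, to_plain):
    """For each source line, write the HTML line and then the stripped line."""
    count = 0
    for line in in_file:
        text = line.rstrip("\n")  # keep the newline out of the conversion
        html = to_html(text)
        out_file.write(html + "\n")
        out_file.write(to_plain(html) + "\n")
        count += 1
    return count

# Demo with in-memory files; the lambdas are placeholders, not real converters.
source = io.StringIO("first line\nsecond line\n")
result = io.StringIO()
write_pairs(
    source,
    result,
    lambda s: "<i>" + s + "</i>",
    lambda s: s.replace("<i>", "").replace("</i>", ""),
)
print(result.getvalue())
```

Every source line yields exactly two output lines, including lines without markers, where both lines come out identical once real converters are plugged in.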