
How to Create and Write a UTF-16 Text File Using AppleScript?

Developer https://www.devze.com 2023-02-10 08:08 Source: Internet

I'm writing an AppleScript to parse an iOS localization file (/en.lproj/Localizable.strings), translate the values, and write the translation (/fr.lproj/Localizable.strings) to disk in UTF-16 (Unicode) encoding.

For some reason, the generated file has an extra space between every letter. After some digging, I found the cause of the problem in Learn AppleScript: The Comprehensive Guide to Scripting.

"If you accidently read a UTF-16 file as MacRoman, the resulting value may look at first glance like an ordinary string, especially if it contains English text. You'll quickly discover that something is very wrong when you try to use it, however: a common symptom is that each visible character in your "string" seems to have an invisible character in front of it. For example, reading a UTF-16 encoded text file containing the phrase "Hello World!" as a string produces a string like " H e l l o W o r l d ! ", where each " " is really an invisible ASCII 0 character."

So for example my English localization string file has:

"Yes" = "Yes";

And the generated French localization string file has:

 " Y e s "  =  " O u i " ;

Here is my createFile method:

on createFile(fileFolder, fileName)
    tell application "Finder"
        if (exists file fileName of folder fileFolder) then
            set the fileAccess to open for access file fileName of folder fileFolder with write permission
            set eof of fileAccess to 0
            write ((ASCII character 254) & (ASCII character 255)) to fileAccess starting at 0
            --write «data rdatFEFF» to fileAccess starting at 0
            close access the fileAccess
        else
            set the filePath to make new file at fileFolder with properties {name:fileName}
            set the fileAccess to open for access file fileName of folder fileFolder with write permission
            write ((ASCII character 254) & (ASCII character 255)) to fileAccess starting at 0
            --write «data rdatFEFF» to fileAccess starting at 0
            close access the fileAccess
        end if
        return file fileName of folder fileFolder as text
    end tell
end createFile

And here is my writeFile method:

on writeFile(filePath, newLine)
    tell application "Finder"
        try
            set targetFileAccess to open for access file filePath with write permission
            write newLine to targetFileAccess as Unicode text starting at eof
            close access the targetFileAccess
            return true
        on error
            try
                close access file filePath
            end try
            return false
        end try
    end tell
end writeFile

Any idea what I'm doing wrong?


Here are the handlers I use to read and write UTF-16. You don't need a separate "create file" handler: the write handler creates the file if it doesn't exist. Set the "appendText" parameter to true or false. False overwrites the file; true adds the new text to the end of the text already in the file. I hope this helps.

on writeTo_UTF16(targetFile, theText, appendText)
    try
        set targetFile to targetFile as text
        -- Check for the file first: open for access with write permission creates it if missing
        tell application "Finder" to set fileExists to exists file targetFile
        set openFile to open for access file targetFile with write permission
        if appendText is false or fileExists is false then
            set eof of openFile to 0
            write (ASCII character 254) & (ASCII character 255) to openFile starting at eof -- UTF-16 BOM
        end if
        write theText to openFile starting at eof as Unicode text
        close access openFile
        return true
    on error theError
        try
            close access file targetFile
        end try
        return theError
    end try
end writeTo_UTF16

on readFrom_UTF16(targetFile)
    try
        set targetFile to targetFile as text
        targetFile as alias -- if file doesn't exist then you get an error
        set openFile to open for access file targetFile
        set theText to read openFile as Unicode text
        close access openFile
        return theText
    on error
        try
            close access file targetFile
        end try
        return false
    end try
end readFrom_UTF16


If you're getting actual spaces between every character, you've probably got the '(characters i thru j of someText) as string' anti-pattern in your code [1]. That will split a string into a list of characters, then coerce it back into a string with your current text item delimiter inserted between each character. The correct (i.e. fast and safe) way to get a sub-string is this: 'text i thru j of someText' (p179-181).
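
A minimal sketch of the difference, since the delimiter behaviour is easy to miss. The sample string and the variable names joinedText and subText are made up for illustration, and the delimiter is set to a space only to make the effect visible:

```applescript
set someText to "Hello World!"

-- Anti-pattern: "characters i thru j" builds a list of one-character
-- strings; coercing that list to a string joins the items with the
-- current text item delimiters.
set AppleScript's text item delimiters to " "
set joinedText to (characters 1 thru 5 of someText) as string
--> "H e l l o"

-- Preferred: ask for the substring directly. No list is built, so the
-- delimiters play no part.
set subText to text 1 thru 5 of someText
--> "Hello"

-- Restore the default delimiter so later code is not affected
set AppleScript's text item delimiters to ""

{joinedText, subText}
```

If the delimiters were left at a non-empty value by earlier code, the first form silently inserts that value between every character, which matches the "actual spaces" symptom.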

OTOH, if you are getting invisible characters between each character [2], then yes, that'll be an encoding issue, typically reading a UTF16-encoded file using MacRoman or other single-byte encoding. If your file has a valid Byte Order Mark then any Unicode-savvy text editor should read it using the correct encoding.
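
The encoding case can be reproduced with a short round trip. This is only a sketch: the file name is hypothetical, and it assumes (as the handlers above do by pairing the FE FF mark with "as Unicode text") that the text is written as big-endian UTF-16:

```applescript
-- Hypothetical file name in the temporary items folder
set targetPath to (path to temporary items as text) & "utf16-demo.strings"

set fileRef to open for access file targetPath with write permission
try
	set eof of fileRef to 0 -- start from an empty file
	write «data rdatFEFF» to fileRef -- raw bytes FE FF (big-endian BOM)
	write ("\"Yes\" = \"Oui\";" & linefeed) to fileRef as Unicode text
	close access fileRef
on error errMsg number errNum
	close access fileRef
	error errMsg number errNum
end try

-- Decoded as UTF-16: the text should come back as written
set goodText to read file targetPath as Unicode text

-- No "as" clause: the same bytes are decoded with the primary
-- single-byte encoding, so each byte becomes its own character
set badText to read file targetPath

{length of goodText, length of badText}
```

Comparing the two lengths is a quick way to tell a misread file from a miswritten one: if only the second read is inflated, the file on disk is fine and the reader is using the wrong encoding.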


[1] p179 states that this idiom is unsafe, but forgets to provide a practical demonstration of the problems it causes. [3]

[2] IIRC the example on p501 was meant to use rectangle symbols to represent invisible characters, i.e. "⃞H⃞e⃞l⃞l⃞o" not " H e l l o", but didn't come out quite that way so might be misread as meaning visible spaces. [3]

[3] Feel free to submit errata to Apress.
