开发者

Save text file UTF-8 encoded with VBA

开发者 https://www.devze.com 2022-12-24 01:57 出处:网络
how can I write UTF-8 encoded strings to a textfile from vba, like Dim fnum As Integer fnum = Fr开发者_JAVA百科eeFile

how can I write UTF-8 encoded strings to a textfile from vba, like

Dim fnum As Integer
fnum = Fr开发者_JAVA百科eeFile
Open "myfile.txt" For Output As fnum
Print #fnum, "special characters: äöüß" 'latin-1 or something by default
Close fnum

Is there some setting on Application level?


I found the answer on the web:

Dim fsT As Object
Set fsT = CreateObject("ADODB.Stream")
fsT.Type = 2 'Specify stream type - we want To save text/string data.
fsT.Charset = "utf-8" 'Specify charset For the source text data.
fsT.Open 'Open the stream And write binary data To the object
fsT.WriteText "special characters: äöüß"
fsT.SaveToFile sFileName, 2 'Save binary data To disk

Certainly not as I expected...


You can use CreateTextFile or OpenTextFile method, both have an attribute "unicode" useful for encoding settings.

object.CreateTextFile(filename[, overwrite[, unicode]])        
object.OpenTextFile(filename[, iomode[, create[, format]]])

Example: Overwrite:

CreateTextFile:
 fileName = "filename"
 Set fso = CreateObject("Scripting.FileSystemObject")
 Set out = fso.CreateTextFile(fileName, True, True)
 out.WriteLine ("Hello world!")
 ...
 out.close

Example: Append:

 OpenTextFile Set fso = CreateObject("Scripting.FileSystemObject")
 Set out = fso.OpenTextFile("filename", ForAppending, True, 1)
 out.Write "Hello world!"
 ...
 out.Close

See more on MSDN docs


This writes a Byte Order Mark at the start of the file, which is unnecessary in a UTF-8 file and some applications (in my case, SAP) don't like it. Solution here: Can I export excel data with UTF-8 without BOM?


Here is another way to do this - using the API function WideCharToMultiByte:

Option Explicit

Private Declare Function WideCharToMultiByte Lib "kernel32.dll" ( _
  ByVal CodePage As Long, _
  ByVal dwFlags As Long, _
  ByVal lpWideCharStr As Long, _
  ByVal cchWideChar As Long, _
  ByVal lpMultiByteStr As Long, _
  ByVal cbMultiByte As Long, _
  ByVal lpDefaultChar As Long, _
  ByVal lpUsedDefaultChar As Long) As Long

Private Sub getUtf8(ByRef s As String, ByRef b() As Byte)
Const CP_UTF8 As Long = 65001
Dim len_s As Long
Dim ptr_s As Long
Dim size As Long
  Erase b
  len_s = Len(s)
  If len_s = 0 Then _
    Err.Raise 30030, , "Len(WideChars) = 0"
  ptr_s = StrPtr(s)
  size = WideCharToMultiByte(CP_UTF8, 0, ptr_s, len_s, 0, 0, 0, 0)
  If size = 0 Then _
    Err.Raise 30030, , "WideCharToMultiByte() = 0"
  ReDim b(0 To size - 1)
  If WideCharToMultiByte(CP_UTF8, 0, ptr_s, len_s, VarPtr(b(0)), size, 0, 0) = 0 Then _
    Err.Raise 30030, , "WideCharToMultiByte(" & Format$(size) & ") = 0"
End Sub

Public Sub writeUtf()
Dim file As Integer
Dim s As String
Dim b() As Byte
  s = "äöüßµ@€|~{}[]²³\ .." & _
    " OMEGA" & ChrW$(937) & ", SIGMA" & ChrW$(931) & _
    ", alpha" & ChrW$(945) & ", beta" & ChrW$(946) & ", pi" & ChrW$(960) & vbCrLf
  file = FreeFile
  Open "C:\Temp\TestUtf8.txt" For Binary Access Write Lock Read Write As #file
  getUtf8 s, b
  Put #file, , b
  Close #file
End Sub


I looked into the answer from Máťa whose name hints at encoding qualifications and experience. The VBA docs say CreateTextFile(filename, [overwrite [, unicode]]) creates a file "as a Unicode or ASCII file. The value is True if the file is created as a Unicode file; False if it's created as an ASCII file. If omitted, an ASCII file is assumed." It's fine that a file stores unicode characters, but in what encoding? Unencoded unicode can't be represented in a file.

The VBA doc page for OpenTextFile(filename[, iomode[, create[, format]]]) offers a third option for the format:

  • TriStateDefault 2 "opens the file using the system default."
  • TriStateTrue 1 "opens the file as Unicode."
  • TriStateFalse 0 "opens the file as ASCII."

Máťa passes -1 for this argument.

Judging from VB.NET documentation (not VBA but I think reflects realities about how underlying Windows OS represents unicode strings and echoes up into MS Office, I don't know) the system default is an encoding using 1 byte/unicode character using an ANSI code page for the locale. UnicodeEncoding is UTF-16. The docs also describe UTF-8 is also a "Unicode encoding," which makes sense to me. But I don't yet know how to specify UTF-8 for VBA output nor be confident that the data I write to disk with the OpenTextFile(,,,1) is UTF-16 encoded. Tamalek's post is helpful.


I didn't want to change all my code just to support several UTF8 strings so i let my code do it's thing, and after the file was saved (in ANSI code as it is the default of excel) i then convert the file to UTF-8 using this code:

Sub convertTxttoUTF(sInFilePath As String, sOutFilePath As String)
    Dim objFS  As Object
    Dim iFile       As Double
    Dim sFileData   As String
    
    'Init
    iFile = FreeFile
    Open sInFilePath For Input As #iFile
        sFileData = Input$(LOF(iFile), iFile)
        sFileData = sFileData & vbCrLf
    Close iFile
    
    'Open & Write
    Set objFS = CreateObject("ADODB.Stream")
    objFS.Charset = "utf-8"
    objFS.Open
    objFS.WriteText sFileData
    
    'Save & Close
    objFS.SaveToFile sOutFilePath, 2   '2: Create Or Update
    objFS.Close
    
    'Completed
    Application.StatusBar = "Completed"
End Sub

and i use this sub like this (this is an example):

Call convertTxttoUTF("c:\my.json", "c:\my-UTF8.json")

i found this code here: VBA to Change File Encoding ANSI to UTF8 – Text to Unicode

and since this is written with BOM marker, in order to remove the bom i changed the Sub to this:

Sub convertTxttoUTF(sInFilePath As String, sOutFilePath As String)
    Dim objStreamUTF8  As Object
    Dim objStreamUTF8NoBOM  As Object
    Dim iFile       As Double
    Dim sFileData   As String
    
    Const adSaveCreateOverWrite = 2
    Const adTypeBinary = 1
    Const adTypeText = 2
    
    'Init
    iFile = FreeFile
    Open sInFilePath For Input As #iFile
        sFileData = Input(LOF(iFile), iFile)
    Close iFile
    
    'Open files
    Set objStreamUTF8 = CreateObject("ADODB.Stream")
    Set objStreamUTF8NoBOM = CreateObject("ADODB.Stream")
           
    ' wrute the fules       
    With objStreamUTF8
      .Charset = "UTF-8"
      .Open
      .WriteText sFileData
      .Position = 0
      .SaveToFile sOutFilePath, adSaveCreateOverWrite
      .Type = adTypeText
      .Position = 3
    End With
    
    With objStreamUTF8NoBOM
      .Type = adTypeBinary
      .Open
      objStreamUTF8.CopyTo objStreamUTF8NoBOM
      .SaveToFile sOutFilePath, 2
    End With
    
    ' close the files
    objStreamUTF8.Close
    objStreamUTF8NoBOM.Close
End Sub

i used this answer to solve the BOM unknown character at the beginning of the file


The traditional way to transform a string to a UTF-8 string is as follows:

StrConv("hello world",vbFromUnicode)

So put simply:

Dim fnum As Integer
fnum = FreeFile
Open "myfile.txt" For Output As fnum
Print #fnum, StrConv("special characters: äöüß", vbFromUnicode)
Close fnum

No special COM objects required

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号