I'm searching (without success) for a script that would work as a batch file and let me prepend a BOM to a UTF-8 text file if it doesn't have one.
Neither the language it is written in (Perl, Python, C, bash) nor the OS it works on matters to me. I have access to a wide range of computers.
I've found a lot of scripts to do the reverse (strip the BOM), which sounds kind of silly to me, as many Windows programs will have trouble reading UTF-8 text files if they don't have a BOM.
Did I miss the obvious?
Thanks!
The easiest way I found for this is:
#!/usr/bin/env bash
#Add BOM to the new file
printf '\xEF\xBB\xBF' > with_bom.txt
# Append the content of the source file to the new file
cat source_file.txt >> with_bom.txt
I know it uses an external program (cat), but it does the job easily in bash.
Tested on OS X, but it should work on Linux as well.
NOTE: it assumes that the file doesn't already have a BOM (!)
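Since the snippet above assumes the source file has no BOM yet, it can help to check first. Below is a minimal sketch of such a check in Python; the function name `has_utf8_bom` is made up for illustration and is not part of the answer above.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # the three bytes that printf '\xEF\xBB\xBF' emits


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 BOM bytes."""
    with open(path, "rb") as f:
        # read(3) returns fewer than three bytes for very short files,
        # in which case the comparison is simply False.
        return f.read(3) == UTF8_BOM
```

You would only run the printf/cat step when `has_utf8_bom("source_file.txt")` is False.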
I wrote this addbom.sh using the 'file' command and ICU's 'uconv' command.
#!/bin/sh
if [ $# -eq 0 ]
then
echo usage $0 files ...
exit 1
fi
for file in "$@"
do
echo "# Processing: $file" 1>&2
if [ ! -f "$file" ]
then
echo Not a file: "$file" 1>&2
exit 1
fi
TYPE=`file - < "$file" | cut -d: -f2`
if echo "$TYPE" | grep -q '(with BOM)'
then
echo "# $file already has BOM, skipping." 1>&2
else
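# Use { } rather than ( ) for the error branch so that exit 1 ends the script, not just a subshell.
( mv "${file}" "${file}"~ && uconv -f utf-8 -t utf-8 --add-signature < "${file}~" > "${file}" ) || { echo "Error processing $file" 1>&2 ; exit 1; }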
fi
done
Edit: Added quotes around the mv arguments. Thanks @DirkR, and glad this script has been so helpful!
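If neither `file` nor `uconv` is available, roughly the same behaviour can be sketched in Python with the standard `utf-8-sig` codec, which writes a BOM when encoding. This is only an illustration under assumptions: the function name `add_bom_in_place` and the `backup_suffix` parameter are hypothetical, and the whole file is read into memory, so it suits ordinary text files rather than huge ones.

```python
import os


def add_bom_in_place(path, backup_suffix="~"):
    """Rewrite `path` as UTF-8 with a BOM, keeping the original as a backup.

    Returns False and leaves the file alone if it already starts with a BOM.
    Raises UnicodeDecodeError if the content is not valid UTF-8.
    """
    with open(path, "rb") as f:
        if f.read(3) == b"\xef\xbb\xbf":
            return False
    # newline="" disables newline translation, so line endings are preserved.
    with open(path, "r", encoding="utf-8", newline="") as f:
        text = f.read()
    # Mirror the shell script: move the original aside as "<name>~" first.
    os.replace(path, path + backup_suffix)
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)
    return True
```

Decoding before rewriting means invalid UTF-8 is rejected before the original is moved, which is similar in spirit to letting uconv do the conversion.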
(Answer based on https://stackoverflow.com/a/9815107/1260896 by yingted)
To add BOMs to all the files that start with "foo-", you can use sed. sed has an option to make a backup.
sed -i '1s/^\(\xef\xbb\xbf\)\?/\xef\xbb\xbf/' foo-*
If you know for sure there is no BOM already, you can simplify the command:
sed -i '1s/^/\xef\xbb\xbf/' foo-*
Make sure UTF-8 is really what you need, because e.g. UTF-16 uses a different BOM (otherwise see "How can I re-add a unicode byte order marker in linux?").
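To illustrate how the BOM differs between encodings, here is a small Python sketch that reports which BOM, if any, a byte string starts with. The BOM constants come from the standard `codecs` module; the table `BOMS` and the function `detect_bom` are names invented for this example. Note that a BOM is only a hint: a file without one may still be UTF-8 or UTF-16.

```python
import codecs

# Longer signatures go first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the UTF-16 LE BOM (FF FE), so the order of the checks matters.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data):
    """Return the encoding implied by a leading BOM in `data`, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

Reading the first four bytes of a file and passing them to `detect_bom` is enough to tell whether the sed commands above are appropriate for it.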
As an improvement on Yaron U.'s solution, you can do it all on a single line:
printf '\xEF\xBB\xBF' | cat - source.txt > source-with-bom.txt
The cat - bit says to concatenate what's being piped in from the printf command to the front of source.txt. Tested on OS X and Ubuntu.
I find it pretty simple. Assuming the file is always UTF-8 (you're not detecting the encoding, you know the encoding):
Read the first three bytes. Compare them to the UTF-8 BOM sequence (Wikipedia says it's 0xEF, 0xBB, 0xBF). If they match, write them to the new file and then copy everything else from the original file to the new file. If they differ, first write the BOM, then write the three bytes you read, and only then copy everything else from the original file to the new file.
In C, fopen/fclose/fread/fwrite should be enough.
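The steps above can be sketched as follows. The answer suggests C; this version uses Python purely for illustration, and the function name `copy_with_bom` is an assumption of this example. It expects the source and destination to be different files.

```python
import shutil

UTF8_BOM = b"\xef\xbb\xbf"


def copy_with_bom(src_path, dst_path):
    """Copy src_path to dst_path so that the copy starts with a UTF-8 BOM."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        head = src.read(3)            # first three bytes (fewer if the file is short)
        if head != UTF8_BOM:
            dst.write(UTF8_BOM)       # no BOM yet, so emit one first
        dst.write(head)               # then the bytes already consumed
        shutil.copyfileobj(src, dst)  # finally, the rest of the file
```

Because the file is handled as raw bytes, the content is never decoded, so line endings and any non-ASCII characters pass through unchanged.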
Open the file in Notepad. Click Save As. Under Encoding, select "UTF-8(BOM)" (this is listed under plain "UTF-8").
I've created a script based on Steven R. Loomis's code. https://github.com/Vdragon/addUTF-8bomb
Check out https://github.com/Vdragon/C_CPP_project_template/blob/development/Tools/convertSourceCodeToUTF-8withBOM.bash.sh for an example of using this script.
In VBA (Access):
Dim name As String
Dim tmpName As String
tmpName = "tmp1.txt"
name = "final.txt"
Dim file As Object
Dim finalFile As Object
Set file = CreateObject("Scripting.FileSystemObject")
Set finalFile = file.CreateTextFile(name)
'Add BOM
finalFile.Write Chr(239)
finalFile.Write Chr(187)
finalFile.Write Chr(191)
'transfer text from tmp to final file:
Dim tmpFile As Object
Set tmpFile = file.OpenTextFile(tmpName, 1)
finalFile.Write tmpFile.ReadAll
finalFile.Close
tmpFile.Close
file.DeleteFile tmpName
Here is the batch file I use for this purpose in Windows. It should be saved with ANSI (Windows-1252) encoding so that the characters in the /p= part are written out as the three BOM bytes.
@echo off
if "%~1"=="" goto usage
if not exist "%~1" goto notfound
setlocal
set /p AREYOUSURE="Adding UTF-8 BOM to '%~1'. Are you sure (Y/[N])? "
if /i "%AREYOUSURE%" neq "Y" goto canceled
:: Main code is here. Create a temp file containing the BOM, then append the requested file contents, and finally overwrite the original file
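:: The three characters after /p= are the UTF-8 BOM bytes (EF BB BF) as they appear in Windows-1252
(echo|set /p=ï»¿)>"%~1.temp"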
type "%~1">>"%~1.temp"
move /y "%~1.temp" "%~1" >nul
@echo Added UTF-8 BOM to "%~1"
pause
exit /b 0
:usage
@echo Usage: %0 ^<FILE_NAME^>
goto end
:notfound
@echo File not found: "%~1"
goto end
:canceled
@echo Operation canceled.
goto end
:end
pause
exit /b 1
You can save the file as e.g. C:\addbom.bat and use the following .reg file to add it to the right-click context menu of all files:
Windows Registry Editor Version 5.00
[HKEY_CLASSES_ROOT\*\Shell\Add UTF-8 BOM]
[HKEY_CLASSES_ROOT\*\Shell\Add UTF-8 BOM\command]
@="C:\\addbom.bat \"%1\""