Batch file to remove duplicate strings (containing double quotes) and keep blank lines
Note: The Final Output must have original strings with Double Quotes and Blank lines.
I have been working on this for a long time and cannot find a solution; thanks in advance for your assistance. Whenever I get the duplicate removal working, something else breaks. I know it looks like I haven't done much work, but I have trimmed the script down for clarity.
@echo on
REM -- Prepare the Command Processor --
SETLOCAL ENABLEEXTENSIONS
SETLOCAL EnABLEDELAYEDEXPANSION
REM -- Prepare the Prompt for easy debugging -- restore with prompt=$p$g
prompt=$g
rem The finished program will remove duplicates lines
:START
set "_duplicates=TRUE"
set "_infile=copybuffer.txt"
set "_oldstr=the"
set "_newstr=and"
call :BATCHSUBSTITUTE %_infile% %_oldstr% %_newstr%
pause
goto :SHOWINTELL
goto :eof
:BATCHSUBSTITUTE
type nul> %TEMP%.\TEMP.DAT
if "%~2"=="" findstr "^::" "%~f0"&GOTO:EOF
for /f "tokens=1,* delims=]" %%A in ('"type %1|find /n /v """') do (
set "_line=%%B"
if defined _line (
if "%_duplicates%"=="TRUE" (
set "_unconverted=!_line!"
set "_converted=!_line:"=""!"
FIND "!_converted!" %TEMP%.\TEMP.DAT > nul
if errorlevel==1 (
>> %TEMP%.\TEMP.DAT echo !_unconverted!
)
)
) ELSE (
echo(>> %TEMP%.\TEMP.DAT
)
)
goto :eof
:SHOWINTELL
@echo A|move %TEMP%.\TEMP.DAT doubleFree.txt
start doubleFree.txt
goto :eof
Input: copybuffer.txt
this test 'data' may have a path C:\Users\Documents\30% full.txt
this test 'data' may have a path C:\Users\Documents\30% full.txt
this test 'data' may have duplicates
this test 'data' may have duplicates
this test 'data' may drive "YOU NUTS"
this test 'data' may drive "YOU NUTS"
this test 'data' may drive "YOU NUTS"
this test 'data' may drive "YOU NUTS"
this test 'data' may drive "YOU NUTS"
this test 'data' may drive "YOU NUTS"
this test 'data' may have Blank Lines
this test 'data' may have Blank Lines
this test 'data' may have "Double Quoted text" in the middle of the string
this test 'data' may have "Double Quoted text" in and middle of and string
this test 'data' may have "Trouble with the find" command
this test 'data' may have "Trouble with and find" command
this test 'data' may drive "YOU NUTS"
this test 'data' may drive "YOU NUTS"
Actual Output: doubleFree.txt (Note: last two lines are NOT duplicates)
this test 'data' may have a path C:\Users\Documents\30% full.txt
this test 'data' may have duplicates
this test 'data' may drive "YOU NUTS"
this test 'data' may have Blank Lines
this test 'data' may have "Double Quoted text" in the middle of the string
this test 'data' may have "Double Quoted text" in and middle of and string
this test 'data' may have "Trouble with the find" command
this test 'data' may have "Trouble with and find" command
The main problem seems to be the expansion of special characters, such as quotation marks.
You can avoid this by using delayed expansion, so that special characters are ignored. (This is not perfect, but close: the only remaining problems are with exclamation marks and carets.)
The next problem is searching for strings that contain quotation marks with the find command: you have to double them.
@echo off
REM -- Prepare the Command Processor --
SETLOCAL ENABLEEXTENSIONS
SETLOCAL EnABLEDELAYEDEXPANSION
REM -- Prepare the Prompt for easy debugging -- restore with prompt=$p$g
prompt=$g
rem The finished program will remove duplicates lines
:START
set "_duplicates=TRUE"
set "_infile=copybuffer.txt"
set "_oldstr=the"
set "_newstr=and"
call :BATCHSUBSTITUTE %_infile% %_oldstr% %_newstr%
pause
goto :SHOWINTELL
goto :eof
:BATCHSUBSTITUTE
type nul> %TEMP%.\TEMP.DAT
type nul> %TEMP%.\TEMP2.DAT
if "%~2"=="" findstr "^::" "%~f0"&GOTO:EOF
for /f "tokens=1,* delims=]" %%A in ('"type %1|find /n /v """') do (
set "_line=%%B"
if defined _line (
if "%_duplicates%"=="TRUE" (
set "_unconverted=!_line!"
set "_converted=!_line:"=""!"
FIND "!_converted!" %TEMP%.\TEMP.DAT > nul
if errorlevel==1 (
>> %TEMP%.\TEMP.DAT echo !_unconverted!
)
call set "_converted=%%_line:"=#%%"
)
) ELSE (
echo(>> %TEMP%.\TEMP.DAT
)
)
goto :eof
:SHOWINTELL
@echo A|move %TEMP%.\TEMP.DAT doubleFree.txt
start doubleFree.txt
goto :eof
Use a good tool for file processing. If you have the luxury of downloading software, you can try gawk for Windows.
C:\test> gawk "!a[$0]++ && $0~/\042|\047/|| !NF" file
this test 'data' may have a path C:\Users\Documents\30% full.txt
this test 'data' may have duplicates
this test 'data' may drive "YOU NUTS"
this test 'data' may have Blank Lines
this test 'data' may have "Double Quoted text" in the middle of the string
this test 'data' may have "Double Quoted text" in and middle of and string
this test 'data' may have "Trouble with the find" command
this test 'data' may have "Trouble with and find" command
this test 'data' may drive "YOU NUTS"
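The core of that one-liner is the `!a[$0]++` idiom, which is true only the first time a given line is seen. A minimal sketch of the idea follows. It is a simplified variant, not the exact command above: it drops the `\042|\047` quote test and simply keeps every blank line plus the first occurrence of every other line. The function name `dedupe`, the array name `seen`, and the sample lines are made up for illustration, and the single-quoted form assumes a POSIX-style shell; at an interactive cmd.exe prompt the program would be wrapped in double quotes, as in the command above.

```shell
# Hypothetical helper: print blank lines always, other lines only once.
dedupe() {
  # !NF         -> the line has no fields (empty or whitespace only): print it
  # !seen[$0]++ -> true only the first time this exact line appears
  awk '!NF || !seen[$0]++' "$@"
}

# Sample input: two duplicated lines with blank lines in between.
printf '%s\n' \
  'may drive "YOU NUTS"' \
  'may drive "YOU NUTS"' \
  '' \
  'has duplicates' \
  '' \
  'has duplicates' | dedupe
```

Because awk compares whole lines as plain strings, the double quotes need no escaping or doubling here, which is the part that is awkward with `find` in a batch file.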
If not, a native scripting language like VBScript is still better than batch for this kind of task.
strFile = WScript.Arguments(0)
Set objFS = CreateObject("Scripting.FileSystemObject")
Set d = CreateObject("Scripting.Dictionary")
Set objFile = objFS.OpenTextFile(strFile)
Do Until objFile.AtEndOfStream
    strLine = objFile.ReadLine
    If Len(strLine) = 0 Then
        ' Blank lines are always kept, as the question requires
        WScript.Echo ""
    ElseIf Not d.Exists(strLine) Then
        ' First time this line is seen: remember it and print it
        d.Add strLine, 1
        WScript.Echo strLine
    End If
Loop
objFile.Close
Usage:
C:\test>cscript //nologo myscript.vbs file