
BATCH FILE to remove duplicate strings containing Double Quotes; and keep blank lines

Developer https://www.devze.com 2023-01-19 05:11 Source: Internet

Note: The Final Output must have original strings with Double Quotes and Blank lines.

I have been working on this for a long time and I cannot find a solution; thanks in advance for your assistance. Whenever I get the duplicate removal working, something else breaks. I know it looks like I haven't done much work, but I have trimmed the script down for clarity.


@echo on
REM -- Prepare the Command Processor --
SETLOCAL ENABLEEXTENSIONS
SETLOCAL EnABLEDELAYEDEXPANSION

REM -- Prepare the Prompt for easy debugging -- restore with prompt=$p$g
prompt=$g

rem The finished program will remove duplicate lines

:START
set "_duplicates=TRUE"

set "_infile=copybuffer.txt"
set                        "_oldstr=the"
set                                    "_newstr=and"

call :BATCHSUBSTITUTE %_infile% %_oldstr% %_newstr% 
pause
goto :SHOWINTELL
goto :eof


:BATCHSUBSTITUTE

type nul> %TEMP%.\TEMP.DAT

if "%~2"=="" findstr "^::" "%~f0"&GOTO:EOF
for /f "tokens=1,* delims=]" %%A in ('"type %1|find /n /v """') do (
    set "_line=%%B"
    if defined _line (
        if "%_duplicates%"=="TRUE" (
            set "_unconverted=!_line!"
            set "_converted=!_line:"=""!"
            FIND "!_converted!" %TEMP%.\TEMP.DAT > nul
            if errorlevel==1 (
                >> %TEMP%.\TEMP.DAT echo !_unconverted!
            )
        ) 
    ) ELSE (
        echo(>> %TEMP%.\TEMP.DAT
    )
)
goto :eof


:SHOWINTELL
@echo A|move %TEMP%.\TEMP.DAT doubleFree.txt
start doubleFree.txt
goto :eof

Input: copybuffer.txt

this test 'data' may have a path C:\Users\Documents\30% full.txt 
this test 'data' may have a path C:\Users\Documents\30% full.txt 
this test 'data' may have duplicates 
this test 'data' may have duplicates 


this test 'data' may drive "YOU NUTS" 
this test 'data' may drive "YOU NUTS" 
this test 'data' may drive "YOU NUTS" 
this test 'data' may drive "YOU NUTS" 
this test 'data' may drive "YOU NUTS" 
this test 'data' may drive "YOU NUTS" 
this test 'data' may have Blank Lines 
this test 'data' may have Blank Lines 
this test 'data' may have "Double Quoted text" in the middle of the string 
this test 'data' may have "Double Quoted text" in and middle of and string 
this test 'data' may have "Trouble with the find" command 
this test 'data' may have "Trouble with and find" command 
this test 'data' may drive "YOU NUTS" 
this test 'data' may drive "YOU NUTS"

Actual Output: doubleFree.txt (Note: last two lines are NOT duplicates)

this test 'data' may have a path C:\Users\Documents\30% full.txt 
this test 'data' may have duplicates 


this test 'data' may drive "YOU NUTS" 
this test 'data' may have Blank Lines 
this test 'data' may have "Double Quoted text" in the middle of the string 
this test 'data' may have "Double Quoted text" in and middle of and string 
this test 'data' may have "Trouble with the find" command 
this test 'data' may have "Trouble with and find" command 


The main problem seems to be the expansion of special characters, such as quotation marks.

You can avoid this by using delayed expansion, which leaves special characters in the value untouched. (It is not perfect here, but close: only exclamation marks and carets still cause problems.)
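
To illustrate the point about delayed expansion, here is a minimal sketch. The variable `_sample` and its value are made up for this example and are not taken from the script. The value is read back with `!_sample!`, so the parser has already finished looking for operators by the time the quotes and the ampersand appear:

```batch
@echo off
setlocal EnableExtensions EnableDelayedExpansion

rem Hypothetical sample value with embedded quotes and an ampersand.
set "_sample=may drive "YOU NUTS" & so on"

rem Delayed expansion substitutes the value after the line has been parsed,
rem so the quotes and the ampersand are echoed as plain text.
echo !_sample!

endlocal
```

Expanding the same variable with percent signs would hand the ampersand back to the parser, which would treat it as a command separator. Literal exclamation marks and carets in the data remain unsafe with this approach.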

The next problem is searching for strings that contain quotation marks with the find command: you have to double every quotation mark inside the search string.

@echo off
REM -- Prepare the Command Processor --
SETLOCAL ENABLEEXTENSIONS
SETLOCAL EnABLEDELAYEDEXPANSION

REM -- Prepare the Prompt for easy debugging -- restore with prompt=$p$g
prompt=$g

rem The finished program will remove duplicate lines

:START
set "_duplicates=TRUE"

set "_infile=copybuffer.txt"
set                        "_oldstr=the"
set                                    "_newstr=and"

call :BATCHSUBSTITUTE %_infile% %_oldstr% %_newstr% 
pause
goto :SHOWINTELL
goto :eof


:BATCHSUBSTITUTE

type nul> %TEMP%.\TEMP.DAT
type nul> %TEMP%.\TEMP2.DAT

if "%~2"=="" findstr "^::" "%~f0"&GOTO:EOF
for /f "tokens=1,* delims=]" %%A in ('"type %1|find /n /v """') do (
    set "_line=%%B"
    if defined _line (
        if "%_duplicates%"=="TRUE" (
            set "_unconverted=!_line!"
            set "_converted=!_line:"=""!"

            FIND "!_converted!" %TEMP%.\TEMP.DAT > nul
            if errorlevel==1 (
                >> %TEMP%.\TEMP.DAT echo !_unconverted!
            )
            call set "_converted=%%_line:"=#%%"
        ) 
    ) ELSE (
        echo(>> %TEMP%.\TEMP.DAT
    )
)
goto :eof


:SHOWINTELL
@echo A|move %TEMP%.\TEMP.DAT doubleFree.txt
start doubleFree.txt
goto :eof
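
The quote-doubling step that the script above relies on can be tried in isolation. The following is only a sketch under stated assumptions: the scratch file `seen_demo.txt` and the variable `_seen` are hypothetical names introduced for the demonstration, and the sample line is copied from the test data.

```batch
@echo off
setlocal EnableExtensions EnableDelayedExpansion

rem Hypothetical scratch file that plays the role of the "already seen" list.
set "_seen=%TEMP%\seen_demo.txt"
> "%_seen%" echo this test 'data' may drive "YOU NUTS"

rem A candidate line that contains double quotes.
set "_line=this test 'data' may drive "YOU NUTS""

rem FIND expects every quote inside its quoted search string to be doubled.
set "_converted=!_line:"=""!"

rem FIND sets errorlevel 0 when the string is found and 1 when it is not.
find "!_converted!" "%_seen%" > nul
if errorlevel 1 (
    echo not seen yet: !_line!
) else (
    echo duplicate: !_line!
)

del "%_seen%"
endlocal
```

Under these assumptions the lookup should succeed and the script should take the duplicate branch. Keep in mind that `find` matches substrings, so a line that is only a fragment of an earlier line also counts as found.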


Use a good tool for file processing. If you have the luxury of downloading software, you can try gawk for Windows.

C:\test> gawk "!a[$0]++ && $0~/\042|\047/|| !NF" file
this test 'data' may have a path C:\Users\Documents\30% full.txt
this test 'data' may have duplicates


this test 'data' may drive "YOU NUTS"
this test 'data' may have Blank Lines
this test 'data' may have "Double Quoted text" in the middle of the string
this test 'data' may have "Double Quoted text" in and middle of and string
this test 'data' may have "Trouble with the find" command
this test 'data' may have "Trouble with and find" command
this test 'data' may drive "YOU NUTS"

If not, a native scripting language such as VBScript is still better than batch for this kind of task.

' Print each distinct line once, in input order, and keep every blank line.
strFile = WScript.Arguments(0)
Set objFS = CreateObject("Scripting.FileSystemObject")
Set d = CreateObject("Scripting.Dictionary")
Set objFile = objFS.OpenTextFile(strFile)
Do Until objFile.AtEndOfStream
    strLine = objFile.ReadLine
    If strLine = "" Then
        ' Blank lines are always passed through
        WScript.Echo strLine
    ElseIf Not d.Exists(strLine) Then
        ' First occurrence: remember the line and print it
        d.Add strLine, 1
        WScript.Echo strLine
    End If
Loop
objFile.Close

Usage:

C:\test>cscript //nologo myscript.vbs file
