Find a string and replace it more efficiently

Developer https://www.devze.com 2023-01-06 09:23 Source: Internet
Situation: I have an HTML file and I need to remove certain sections.

For example, the file contains this HTML: <div style="padding:10px;">First Name:</div><div style="padding:10px; background-color: gray">random information here</div><div style="padding:10px;">First Name:</div><div style="padding:10px; background-color: gray">random information here</div>

I need to remove all text that starts with <div style="padding:10px; background-color: gray"> and ends with </div>, so that the result would be:

<div style="padding:10px;">First Name:</div><div style="padding:10px;">First Name:</div>

I created two functions that do this, but I do not think they are efficient at all. I have a 40 MB file and the program takes about 2 hours to complete. Is there a more efficient way to do this? Is there a way to use regex?

See my code below:

Public Shared Function String_RemoveText(ByVal startAt As String, ByVal endAt As String, ByVal SourceString As String) As String
    Dim TotalCount As Integer = String_CountCharacters(SourceString, startAt)
    Dim CurrentCount As Integer = 0

RemoveNextString:

    ' Skip past the whole start marker; an offset of + 1 would leave most of startAt inside the removed core.
    Dim LeftRemoved As String = Mid(SourceString, InStr(SourceString, startAt) + Len(startAt))
    Dim RemoveCore As String = Left(LeftRemoved, InStr(LeftRemoved, endAt) - 1)
    Dim RemoveString As String = startAt & RemoveCore & endAt


    Do
        '    Application.DoEvents()
        SourceString = Replace(SourceString, RemoveString, "")
        If InStr(SourceString, startAt) < 1 Then Exit Do
        GoTo RemoveNextString
    Loop

    Return Replace(SourceString, RemoveString, "")

End Function

Public Shared Sub Files_ReplaceText(ByVal DirectoryPath As String, ByVal SourceFile As String, ByVal DestinationFile As String, ByVal sFind As String, ByVal sReplace As String, ByVal TrimContents As Boolean, ByVal RemoveCharacters As Boolean, ByVal rStart As String, ByVal rEnd As String)

    'CREATE NEW FILENAME
    Dim DateFileName As String = Date.Now.ToString.Replace(":", "_")
    DateFileName = DateFileName.Replace(" ", "_")
    DateFileName = DateFileName.Replace("/", "_")
    Dim FileExtension As String = ".txt"
    Dim NewFileName As String = DirectoryPath & DateFileName & FileExtension
    'CHECK IF FILENAME ALREADY EXISTS
    Dim counter As Integer = 0
    If IO.File.Exists(NewFileName) = True Then
        'CREATE NEW FILE NAME
        Do
            'Application.DoEvents()
            counter = counter + 1
            If IO.File.Exists(DirectoryPath & DateFileName & "_" & counter & FileExtension) = False Then
                NewFileName = DirectoryPath & DateFileName & "_" & counter & FileExtension
                Exit Do
            End If
        Loop
    End If
    'END NEW FILENAME

    'READ SOURCE FILE
    Dim sr As New StreamReader(DirectoryPath & SourceFile)
    Dim content As String = sr.ReadToEnd()
    sr.Close()

    'WRITE NEW FILE
    Dim sw As New StreamWriter(NewFileName)

    'REPLACE VALUES
    content = content.Replace(sFind, sReplace)

    'REMOVE STRINGS
    If RemoveCharacters = True Then content = String_RemoveText(rStart, rEnd, content)


    'TRIM
    If TrimContents = True Then content = Regex.Replace(content, "[\t]", "")

    'WRITE FILE
    sw.Write(content)

    'CLOSE FILE
    sw.Close()
End Sub

Example call (this also removes Chr(13) & Chr(10)): Files_ReplaceText(tPath.Text, tSource.Text, "", Chr(13) & Chr(10), "", True, True, tStart.Text, tEnd.Text)
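
A likely reason the routine above is slow: every pass calls Replace over the whole string, which rescans and copies all 40 MB once per removed section. The following is only a minimal single-pass sketch of the same start/end removal, not the poster's code. The module name TextRemoval and the function name RemoveBetween are hypothetical, and it assumes the markers are plain text and that a section ends at the first endAt after its startAt (so nested <div> elements are not handled).

```vbnet
Imports System
Imports System.Text

Public Module TextRemoval
    ' Hypothetical helper: removes every span that begins with startAt
    ' and ends with the next occurrence of endAt, in one pass over source.
    Public Function RemoveBetween(ByVal source As String, ByVal startAt As String, ByVal endAt As String) As String
        If String.IsNullOrEmpty(source) OrElse String.IsNullOrEmpty(startAt) OrElse String.IsNullOrEmpty(endAt) Then
            Return source
        End If

        Dim result As New StringBuilder(source.Length)
        Dim pos As Integer = 0

        Do
            ' Find the next start marker at or after the current position.
            Dim startIndex As Integer = source.IndexOf(startAt, pos, StringComparison.Ordinal)
            If startIndex < 0 Then Exit Do

            ' Find the first end marker after that start marker.
            Dim endIndex As Integer = source.IndexOf(endAt, startIndex + startAt.Length, StringComparison.Ordinal)
            If endIndex < 0 Then Exit Do ' unterminated section: keep the rest unchanged

            ' Keep the text before the section, then skip the section itself.
            result.Append(source, pos, startIndex - pos)
            pos = endIndex + endAt.Length
        Loop

        ' Copy whatever follows the last removed section.
        result.Append(source, pos, source.Length - pos)
        Return result.ToString()
    End Function
End Module
```

With the sample HTML from the question, calling RemoveBetween(content, "<div style=""padding:10px; background-color: gray"">", "</div>") leaves only the two "First Name:" divs. Each character of the input is copied at most once, so the cost grows with the file size rather than with file size times the number of sections.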


Do not use a RegEx to parse HTML - it is not a regular language. See here for some compelling demonstrations.

Use the HTML Agility Pack to parse the HTML and replace data.
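
As a rough illustration of that suggestion, here is a minimal sketch assuming the HtmlAgilityPack package is referenced in the project. The module name HtmlCleaner, the function name RemoveGrayDivs, and the XPath expression are assumptions chosen to match the sample markup in the question; adjust the XPath to whatever actually identifies the sections to drop.

```vbnet
Imports System.Linq
Imports HtmlAgilityPack

Public Module HtmlCleaner
    ' Hypothetical helper: removes every <div> whose style attribute
    ' contains "background-color: gray" and returns the remaining HTML.
    Public Function RemoveGrayDivs(ByVal html As String) As String
        ' Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing when no node matches the XPath.
        Dim nodes As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//div[contains(@style, 'background-color: gray')]")

        If nodes IsNot Nothing Then
            ' Iterate over a copy so that removing nodes does not disturb the enumeration.
            For Each node As HtmlNode In nodes.ToList()
                node.Remove()
            Next
        End If

        Return doc.DocumentNode.OuterHtml
    End Function
End Module
```

Because the document is parsed rather than searched as text, nested divs inside a removed section go away with their parent. The serialized output may differ from the input in small ways, such as attribute quoting.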
