Shortening a repeating sequence in a string

开发者 (Developer) https://www.devze.com 2023-01-05 09:20 Source: web
I have built a blog platform in VB.NET for a very young audience who, for some reason, like to express their commitment by repeating sequences of characters in their comments.

Examples:

Hi!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! <3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3 LOLOLOLOLOLOLOLOLOLOLOLOLLOLOLOLOLOLOLOLOLOLOLOLOL

..and so on.

I don't want to filter this out completely; however, I would like to shorten it to a maximum of 5 repeating characters or sequences in a row. I have no problem writing a function to handle a single repeating character, but what is the most effective way to filter out a repeating sequence as well?

This is what I used earlier for single repeating characters:

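' Collapses runs of the same character so that no character appears more
' than maxRepeat times in a row (the goal stated above is 5).
' Fixes vs. the earlier version: a Char can no longer be initialised from
' String.Empty, and the old "prevCount < 10" check let 11 copies through.
Private Shared Function RemoveSequence(ByVal str As String, Optional ByVal maxRepeat As Integer = 5) As String
    If String.IsNullOrEmpty(str) Then Return str

    Dim sb As New System.Text.StringBuilder(str.Length)
    Dim prev As Char = str(0)
    Dim runLength As Integer = 0   ' length of the current run of prev

    For Each c As Char In str
        If c = prev Then
            runLength += 1
        Else
            ' A new character starts a new run
            prev = c
            runLength = 1
        End If

        ' Copy the character only while the run is within the limit
        If runLength <= maxRepeat Then
            sb.Append(c)
        End If
    Next

    Return sb.ToString()
End Function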
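A minimal sketch of one way to extend this to multi-character sequences, assuming a regular-expression backreference is acceptable (this is a swapped-in technique, not the approach from the answer). The hypothetical CollapseRepeats function matches the shortest unit `(.+?)` followed by at least maxRepeat further copies of itself `\1{n,}`, and replaces the whole run with exactly maxRepeat copies of the unit.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module RepeatCollapser
    ' Hypothetical helper: collapses any character or character sequence that
    ' repeats more than maxRepeat times in a row down to exactly maxRepeat copies.
    Function CollapseRepeats(ByVal input As String, Optional ByVal maxRepeat As Integer = 5) As String
        If String.IsNullOrEmpty(input) OrElse maxRepeat < 1 Then Return input

        ' (.+?)     captures the shortest possible repeating unit
        ' \1{n,}    requires at least maxRepeat further copies of that unit,
        '           i.e. more than maxRepeat copies in total
        Dim pattern As String = "(.+?)\1{" & maxRepeat.ToString() & ",}"

        ' Replace each over-long run with maxRepeat copies of its unit.
        ' Singleline lets "." also match line breaks inside a comment.
        Return Regex.Replace(input, pattern,
            Function(m As Match) String.Concat(Enumerable.Repeat(m.Groups(1).Value, maxRepeat)),
            RegexOptions.Singleline)
    End Function

    Sub Main()
        Console.WriteLine(CollapseRepeats("Hi" & New String("!"c, 20)))
        Console.WriteLine(CollapseRepeats(String.Concat(Enumerable.Repeat("<3", 12))))
        Console.WriteLine(CollapseRepeats("LOLOLOLOLOLOLOLOLOLOL"))
    End Sub
End Module
```

Because `(.+?)` is retried at every position, the cost can grow quickly on long comments that contain no repeats; capping the unit length (for example `(.{1,10}?)`) is one way to bound it. A run that ends on a partial unit, such as the trailing "L" in "LOLOL…L", keeps that leftover character.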

Any help would be greatly appreciated.


You should be able to solve this by applying the 'Longest repeated substring problem' recursively. On the first pass you will get two matching substrings, and you will need to check whether they are contiguous. Then repeat the step for one of the substrings. Stop the algorithm if the strings are not contiguous, or if the string size becomes less than a certain number of characters. Finally, you should be able to keep the last match and discard the rest. You will need to dig around for an implementation :(
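The longest-repeated-substring approach is the general tool, but for short comments a brute-force scan that only looks for contiguous copies may be enough. The sketch that follows is an assumption and not an implementation of the suffix-tree algorithm: at each position the hypothetical CollapseContiguousRepeats tries unit lengths from 1 up to a hypothetical maxUnit, counts how many contiguous copies of that unit follow, and keeps at most maxRepeat of them.

```vbnet
Imports System
Imports System.Text

Module TandemRepeatScanner
    ' Hypothetical helper: at each position, look for a unit of length
    ' 1..maxUnit that is immediately repeated (the copies are contiguous)
    ' and keep at most maxRepeat copies of it.
    Function CollapseContiguousRepeats(ByVal input As String,
                                       Optional ByVal maxRepeat As Integer = 5,
                                       Optional ByVal maxUnit As Integer = 10) As String
        If String.IsNullOrEmpty(input) OrElse maxRepeat < 1 Then Return input

        Dim sb As New StringBuilder(input.Length)
        Dim i As Integer = 0
        While i < input.Length
            Dim handled As Boolean = False

            ' Try the shortest unit first, so "!!!!!!" is seen as "!" x6, not "!!" x3
            For unitLen As Integer = 1 To Math.Min(maxUnit, (input.Length - i) \ 2)
                ' Count how many contiguous copies of the unit start at i
                Dim copies As Integer = 1
                While i + (copies + 1) * unitLen <= input.Length AndAlso
                      String.CompareOrdinal(input, i, input, i + copies * unitLen, unitLen) = 0
                    copies += 1
                End While

                If copies > maxRepeat Then
                    ' Emit only maxRepeat copies and skip past the whole run
                    sb.Append(input, i, unitLen * maxRepeat)
                    i += unitLen * copies
                    handled = True
                    Exit For
                End If
            Next

            If Not handled Then
                ' No over-long run starts here; copy one character and move on
                sb.Append(input(i))
                i += 1
            End If
        End While

        Return sb.ToString()
    End Function

    Sub Main()
        Console.WriteLine(CollapseContiguousRepeats("Hi" & New String("!"c, 20)))
        Console.WriteLine(CollapseContiguousRepeats("LOLOLOLOLOLOLOLOLOLOL"))
    End Sub
End Module
```

Capping the unit length keeps the work roughly proportional to the comment length times maxUnit, which is the trade-off against the suffix-tree approach: it cannot find repeating units longer than maxUnit.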

Also have a look at this previously asked question: finding long repeated substrings in a massive string