How can you find repeating sequences of at least 30 digits?
Sample of the data
2.37585421412300683371298405466970387243735763097949886104783599088838268792710706150341685649202733485193621867881548974943052391799544419134396355353075170842824601366742596810933940774487471526195899772209567198177676537585421412300683371298405466970387243735763097949886104783599088838268792710706150341685649202733485193621867881548974943052391799544419134396355353075170842824601366742596810933940774487471526195899772209567198177676537585421412300683371298405466970387243735763097949886104783599088838268792710706150341685649202733485193621867881548974943052391799544419134396355353075170842824601366742596810933940774487471526195899772209567198177676537585421412300683371298405466970387243735763097949886104783599088838268792710706150341685649202733485193621867881548974943052391799544419134396355353075170842824601366742596810933940774487471526195899772209567198177676537585421412300683371298405466970387243735763097949886104783599088838268792710706150341685649202733485193621867881548974943052391799544419134396355353075170842824601366742596810933940774487471526195899772209567198177676537585421412300683371298405466970387243735763097949886104783599088838268792710706150341685649202733485193621867881548974943052391799544419134396355353075170842824601366742596810933940774487471526195899772209567198177676537585421412300683371298405466970387243735763097949886104783599088838268792710706150341685649202733485193621867881548974943052391799544419134396355353075170842824601366742596810933940774487471526195899772209567198177676537585421412300683371298405466970387243735763097949886104783599088838268792710706150341685649202733485193621867881548974943052391799544419134396355353075170842824601366742596810933940774487471526195899772209567198177676537585421412300683371298405466970387243735763097949886104783599088838268792710706150341685649202733485193621867881548974943052391799544419134396355353075170842824601开发者_如何学编程3667425968109339407744874715261958997722095671981776765375854214123006833
7129840547
My attempt in Vim
:g/\(\d\{4}\)\[^\1\]\1/
               |
               |----------- Problem here!
I do not know how to express the negation of the first captured group.
First of all, to find your repeating numbers, you can use this simple search:
/\(\d\{5\}\).\{-}\1
This search finds a 5-digit sequence that occurs again later on the same line. Unfortunately, Vim highlights everything from the start of the 5-digit sequence to the end of its repetition, including every digit in between, which makes it hard to see what the 5-digit sequence is. Also, because your number repeats so often, nearly the whole thing ends up highlighted.
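You will probably find it more useful to run :set incsearch and then type /\(\d\{5\}\).\{-}\1 or /\(\d\{5\}\)\ze.\{-}\1 without hitting Enter, so you can see what the digits are as you type. In the second form, \ze ends the match after the first five digits, so only those digits are highlighted.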
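To see why so much gets highlighted, here is a minimal sketch using Python's re module instead of Vim (my assumption for illustration; Vim's .\{-} corresponds to the lazy .*? here, and the sample string is made up). The whole match runs from the first copy to the end of the second, while the capture group holds only the five digits.

```python
import re

# Python spelling of Vim's /\(\d\{5\}\).\{-}\1 (lazy gap between the two copies).
pattern = re.compile(r"(\d{5}).*?\1")

sample = "9912345678123450"  # made-up data: "12345" occurs twice
m = pattern.search(sample)

whole_match = m.group(0)  # the full span a plain search would highlight
five_digits = m.group(1)  # just the repeated 5-digit sequence
print(whole_match, five_digits)
```

The difference between the two values is the reason the \ze variant is easier to read on screen.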
This command might be more useful to you:
:syn region repeatSection matchgroup=Search start=/\z(\d\{30}\)/ matchgroup=Error end=/\z1/ oneline
This will highlight a sequence of 30 digits with the Search highlight group (usually yellow) the first time it is seen and with the Error highlight group (usually red) when it is repeated. Note that this only works within a single line of text (multi-line isn't possible).
How about :g/\(\d\{30,\}\{2,\}\)/ ?
I'm not sure why you need the negation. /\(\d\{4\}\)\1/ will match a sequence of exactly four digits immediately followed by a copy of itself. You probably actually want something like /\(\d\{30,\}\)\1/ to get your "at least 30". This appears to work for me, unless I've misunderstood what you're trying to search for. Note that since the quantifier is greedy, you will get the longest possible repeated sequence.
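A quick way to check the greedy behaviour outside Vim is a small Python sketch (Python's re is my choice for illustration; the function name adjacent_repeat, the min_len parameter, and the test strings are all made up). It uses the same pattern shape with an adjustable minimum, so it can be tried on short inputs.

```python
import re

def adjacent_repeat(text, min_len=30):
    """Return a digit block that is immediately followed by a copy of itself, or None."""
    # Python spelling of Vim's /\(\d\{30,\}\)\1/ with the minimum made adjustable.
    m = re.search(r"(\d{%d,})\1" % min_len, text)
    return m.group(1) if m else None

# Made-up data and a small minimum so the behaviour is easy to follow.
found = adjacent_repeat("12345123459", min_len=3)
not_found = adjacent_repeat("123456", min_len=3)
print(found, not_found)
```

Because the quantifier is greedy, the engine first tries the longest block at the leftmost position and backs off until the block is followed by its own copy.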
If it helps you on the way, the appropriate way to make sure that the following characters aren't the same as what is stored in back-reference #1 would be (?!\1). Note that the (?!) (negative look-ahead) group is a zero-width assertion, i.e. it does not change the position of the cursor; it just checks whether the regex should fail or not.
Whether that is supported by the regex engine you're using, I don't know.
Update
I just had a quick sketch on paper, and something along these lines might work in PCRE... but I haven't tested it and can't right now, but maybe it'll give you some ideas:
(?=(\d{30}))\d(?=\d{29,}?\1)
To ensure that I understood you correctly, the purpose of the above regex would be to match any sequence of 30 digits that also exists later in the whole string being searched.
My thoughts for the above regex were these:
- First I want to match a sequence of 30 digits, but I don't want to consume them since I want to check 1 digit later (not 30) next time. Therefore I use a look-ahead with a capturing group that stores the next 30 digits.
- Then I consume one digit to ensure I don't match the 30 digits with themselves.
- Then I match at least 29 digits (which means I'll be starting on the digit just outside the current sequence of digits) with a non-greedy quantifier, so that it will try 29 first, then 30, then 31, etc.
- Then I match the 30 digits I'm currently testing. If they exist later in the sequence, the regular expression will succeed; otherwise, it will fail.
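The steps above can be tried out with a short Python sketch. Note the assumptions: it uses Python's re module rather than PCRE, the function name repeated_windows and the parameter n are made up, and the window size is adjustable so the test data can stay short.

```python
import re

def repeated_windows(text, n=30):
    """Yield (position, digits) for each n-digit window that occurs again later without overlap."""
    # Same shape as (?=(\d{30}))\d(?=\d{29,}?\1), with 30 and 29 replaced by n and n - 1.
    pattern = re.compile(r"(?=(\d{%d}))\d(?=\d{%d,}?\1)" % (n, n - 1))
    for m in pattern.finditer(text):
        # The match itself is a single digit; the window lives in group 1.
        yield m.start(), m.group(1)

# Made-up data and a small window so the result can be checked by eye.
hits = list(repeated_windows("1234567123489", n=4))
print(hits)
```

Each match consumes only one digit, so the scan moves forward one position at a time, as described in the first two steps. On very long inputs this may be slow, since every position can trigger a scan of the rest of the string.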
This command will match lines with 123451234 but not 111111111:
:g/\(\d\{4}\)\1\@!.\1/
\1\@!. uses a negative lookahead to say "make sure this position doesn't match (\@!) group 1 (\1), then consume a character (.)".
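For anyone who wants to check those two cases outside Vim, here is a minimal sketch in Python (my choice for illustration, not part of the original command), where (?!\1) plays the role of Vim's \1\@!.

```python
import re

# Python spelling of Vim's \(\d\{4}\)\1\@!.\1
pattern = re.compile(r"(\d{4})(?!\1).\1")

# "1234", then "5" (which does not start another "1234"), then "1234" again.
matches_mixed = pattern.search("123451234") is not None

# Nine ones: no start position satisfies the lookahead and still leaves room for the repeat.
matches_ones = pattern.search("111111111") is not None

print(matches_mixed, matches_ones)
```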