How to detect deviations in a sequence from a pattern?

Developer https://www.devze.com 2023-03-03 22:23 Source: Internet
I want to detect the passages where a sequence of actions deviates from a given pattern, and I can't figure out a clever solution, although the problem sounds really simple.

The objective of the pattern is to describe a normal sequence. More specifically: "What actions should or shouldn't be contained in an action sequence, and in which order?" I then want to match the action sequence against the pattern and detect deviations and their locations.

My first approach was to do this with regular expressions. Here are some examples:

Example 1:
Pattern: A.*BC
Sequence: AGDBC (matches)
Sequence: AGEDC (does not match)

Example 2:
Pattern: ABCD
Sequence: ABD (does not match)
Sequence: ABED (does not match)
Sequence: ABCED (does not match)

Example 3:
Pattern: ABCDEF
Sequence: ABXDXF (does not match)

With regular expressions it is simple to detect an error, but not where it occurs. My approach was to successively remove the last regex block until the remaining pattern can be found in the sequence. Then I know the last correct actions and have at least found the first deviation. But this doesn't seem like the best solution to me. Furthermore, I can't find all deviations this way.

Other solutions I have in mind are state machines or tools like ANTLR, but I don't know whether they can solve my problem. I want to detect errors of omission and commission and give a user the possibility to create their own pattern. Do you know a good way to do this?


While matching your input, a regular expression engine has information on the location of a mismatch -- it may however not provide it in a way where you can easily access it.

Consider e.g. a DFA implementing the expression. It fetches characters sequentially, matching each one against what it expects, and you are interested in the point in the sequence where there is no valid match.

Other implementations may go back and forth, and you would be interested in the highest index of any character that was fetched.

In Java, this could possibly be done by supplying a CharSequence implementation to

   java.util.regex.Pattern.matches(String regex, CharSequence input) 

where the accessor methods keep track of the maximum index.

I have not tried that, though. And it does not help you categorize an error, either.
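For what it's worth, here is a minimal sketch of that idea. The class name `TrackingCharSequence` and the helper `furthestIndexOnMismatch` are made up for illustration; only `Pattern.matches` and the `CharSequence` interface come from the JDK. It assumes the engine reads the input through `charAt`, which is an implementation detail rather than a documented guarantee.

```java
import java.util.OptionalInt;
import java.util.regex.Pattern;

/** Hypothetical wrapper: remembers the highest index read through charAt. */
final class TrackingCharSequence implements CharSequence {
    private final CharSequence delegate;
    private int maxIndex = -1; // -1 means no character has been read yet

    TrackingCharSequence(CharSequence delegate) {
        this.delegate = delegate;
    }

    @Override
    public char charAt(int index) {
        maxIndex = Math.max(maxIndex, index); // record every access
        return delegate.charAt(index);
    }

    @Override
    public int length() {
        return delegate.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return delegate.subSequence(start, end);
    }

    @Override
    public String toString() {
        return delegate.toString();
    }

    /** Empty if the input matched; otherwise the furthest index the engine read. */
    static OptionalInt furthestIndexOnMismatch(String regex, CharSequence input) {
        TrackingCharSequence tracked = new TrackingCharSequence(input);
        if (Pattern.matches(regex, tracked)) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(tracked.maxIndex);
    }

    public static void main(String[] args) {
        System.out.println(furthestIndexOnMismatch("ABCD", "ABD"));
        System.out.println(furthestIndexOnMismatch("A.*BC", "AGEDC"));
    }
}
```

For a purely literal pattern such as ABCD the furthest index tends to be the first deviating action. With a greedy `.*` the engine usually runs to the end of the input before backtracking, so the number says much less there.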


Have you looked at Markov chains? http://en.wikipedia.org/wiki/Markov_chain - it sounds like you want to detect unexpected transitions. Maybe also hidden Markov models: http://en.wikipedia.org/wiki/Hidden_Markov_Models
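A rough sketch of the "unexpected transitions" idea, stripped down to its simplest form: instead of transition probabilities it only records which adjacent pairs of actions were ever seen in a normal sequence, so it is not a full Markov chain. The class `TransitionChecker` and its method names are hypothetical, and actions are assumed to be single characters as in the question's examples.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: flags steps between actions that were never seen in normal sequences. */
final class TransitionChecker {
    private final Set<String> allowed = new HashSet<>();

    /** Records every adjacent pair of actions from a sequence known to be normal. */
    void learn(String normalSequence) {
        for (int i = 0; i + 1 < normalSequence.length(); i++) {
            allowed.add(normalSequence.substring(i, i + 2));
        }
    }

    /** Returns each index i where the step from action i-1 to action i was never recorded. */
    List<Integer> unexpectedTransitions(String sequence) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 1; i < sequence.length(); i++) {
            if (!allowed.contains(sequence.substring(i - 1, i + 1))) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        TransitionChecker checker = new TransitionChecker();
        checker.learn("ABCD");
        System.out.println(checker.unexpectedTransitions("ABD"));
        System.out.println(checker.unexpectedTransitions("ABCED"));
    }
}
```

Unlike the regex approach, this reports every suspicious position rather than only the first one, but it only looks one step back. An inserted action such as the E in ABCED is flagged twice, once on the way in and once on the way out.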


Find an open-source implementation of regexes and add a hook to return/set/print/save the index at which matching fails whenever a given comparison does not match. Alternatively, write your own RE engine (not for the faint of heart) so it'll do exactly what you want!
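Before patching an engine, it may be worth noting that Java's `java.util.regex.Matcher` already exposes one small hook of this kind: `hitEnd()` reports whether the last match attempt ran into the end of the input. The sketch below (the class `FirstDeviation` and its method are made-up names) feeds growing prefixes of the sequence to the matcher and stops at the first prefix that fails for a reason other than running out of input. It only finds the first deviation, and it rescans each prefix from the start, so treat it as an illustration rather than an efficient solution.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: locates the first action that makes a match impossible. */
final class FirstDeviation {
    /**
     * Returns -1 if the sequence matches, the index of the first action that
     * rules out any match, or sequence.length() if the sequence is only incomplete.
     */
    static int firstDeviation(String regex, String sequence) {
        Pattern pattern = Pattern.compile(regex);
        if (pattern.matcher(sequence).matches()) {
            return -1;
        }
        for (int end = 1; end <= sequence.length(); end++) {
            Matcher m = pattern.matcher(sequence.substring(0, end));
            // A prefix is still viable if it matches, or if it failed
            // only because the engine ran out of input (hitEnd).
            if (!m.matches() && !m.hitEnd()) {
                return end - 1; // the last action of this prefix broke the match
            }
        }
        return sequence.length(); // every prefix was viable: something is missing at the end
    }

    public static void main(String[] args) {
        System.out.println(firstDeviation("ABCD", "ABD"));
        System.out.println(firstDeviation("A.*BC", "AGEDC"));
    }
}
```

For patterns containing `.*`, every prefix stays viable because the wildcard can always absorb more input, so the method can only report that the sequence is incomplete, not where it went wrong.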
