
Using pyparsing to parse a list of regexes (literally)

开发者 https://www.devze.com 2023-01-26 08:07 Source: web

I'd like to parse a list of regular expressions and calculate, for each one, the likelihood of finding a match in a given text/string...

E.g., finding '[AB]' in a string of length 1 should be somewhere around 1/13 (considering only capital letters).

Is there a generic regex parser that returns the individual positions/alternatives? I'm thinking of getting a list of positions back: '[AB].A{2}' would yield [['A','B'],'.',['AA']].

The problem is parsing regular expressions with pyparsing. Simple regexes are no problem, but when it comes to alternatives and repetitions, I'm lost: I find it hard to parse nested expressions like '((A[AB])|(AB))'.

Any thoughts?


Simulation rather than calculation may be the way to go.

Set up a population of representative text strings. (Linguists would call such a set a corpus.) For any given regex, find the number of strings it matches, and divide by the total number of strings in your corpus.

Your own example giving the likelihood of '[AB]' as 1/13 is based on this way of thinking, using the corpus of single-capital-letter strings. You got 1/13 by seeing that there are two matches out of the 26 strings in the corpus.
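To make the counting concrete, here is a minimal sketch of that calculation in Python. The helper name `match_fraction` is made up for illustration, and it assumes "finding a match" means `re.search` succeeds somewhere in the string; swap in `re.fullmatch` if the whole string has to match.

```python
import re
import string
from fractions import Fraction

def match_fraction(pattern, corpus):
    """Return the fraction of corpus strings in which the pattern is found."""
    regex = re.compile(pattern)
    corpus = list(corpus)
    # Assumption: a "match" means the pattern occurs anywhere in the string.
    hits = sum(1 for s in corpus if regex.search(s))
    return Fraction(hits, len(corpus))

# Corpus: the 26 single-capital-letter strings "A" .. "Z".
single_capitals = list(string.ascii_uppercase)

print(match_fraction("[AB]", single_capitals))  # 2 of 26 strings match
```

Using `Fraction` keeps the result exact, so the two-out-of-26 count reduces to 1/13 rather than a rounded float.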

Create a larger corpus: maybe the set of all alphanumeric strings up to a certain length, or all ASCII strings up to a certain length, or the dictionary of your choice. Thinking about what corpus best suits your purpose is a good way to clarify what you mean by "likelihood".
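As a rough sketch of the larger-corpus idea, the code below enumerates every string over an alphabet up to a maximum length and counts how many contain a match. The function names, the choice to start at length 1 (no empty string), and the use of `re.search` are all assumptions for illustration. The corpus grows exponentially with the length, so this is only practical for short strings; for longer ones, random sampling would be the obvious substitute.

```python
import itertools
import re
import string
from fractions import Fraction

def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` with length 1..max_len."""
    for n in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            yield "".join(chars)

def likelihood(pattern, alphabet, max_len):
    """Fraction of the enumerated corpus in which `pattern` is found."""
    regex = re.compile(pattern)
    total = hits = 0
    for s in strings_up_to(alphabet, max_len):
        total += 1
        if regex.search(s):  # assumption: a match anywhere in the string counts
            hits += 1
    return Fraction(hits, total)

# Corpus: all capital-letter strings of length 1 or 2 (26 + 676 = 702 strings).
print(likelihood("[AB]", string.ascii_uppercase, 2))
print(likelihood("((A[AB])|(AB))", string.ascii_uppercase, 2))
```

Changing the alphabet or `max_len` changes the answer, which is exactly the point about the corpus defining what "likelihood" means.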


You use ['A', 'B'] to say "either A or B". Then you can write something like this:

[{'A', ['A', 'B']}, {'A', 'B'}]

Here [] means "one of these" and {} means "all of these".

1/2 for {'A', ['A', 'B']}
   'A' => 1/1
   ['A', 'B'] => 1/2
   (1/1) * (1/2) = 1/2
   this (1/2) times the outer (1/2) = (1/4)
1/2 for {'A', 'B'} -> (1/26) for each.
Multiply the two: 1/(26^2), then multiply by the 1/2 = (1/(26^2))/2.

Now multiply both:  (1/4) * ((1/(26^2))/2)

That was a really bad explanation... Let me try again:

[] => probability: {probability of each term} / {number of terms}
{} => probability: compute the probability of each term and multiply them all together

Does that make sense?
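One literal reading of those two rules can be sketched recursively in Python. Everything here is an assumption for illustration: the function name `rule_probability`, the use of a Python list for [] and a tuple for {}, and the default probability of 1/26 for a single literal. It follows the two-line rule as restated, not the numbers in the first attempt, and averaging the alternatives is this answer's heuristic rather than an exact match probability.

```python
from fractions import Fraction

def rule_probability(node, literal=Fraction(1, 26)):
    """Apply the two rules to a nested structure.

    list  -> "one of these": average of the term probabilities
    tuple -> "all of these": product of the term probabilities
    str   -> a single literal, assumed to have probability `literal`
    """
    if isinstance(node, str):
        return literal
    probs = [rule_probability(child, literal) for child in node]
    if isinstance(node, list):
        return sum(probs, Fraction(0)) / len(probs)
    if isinstance(node, tuple):
        result = Fraction(1)
        for p in probs:
            result *= p
        return result
    raise TypeError(f"unsupported node type: {type(node).__name__}")

# '((A[AB])|(AB))' written as: one of (A then one of A/B), (A then B).
expr = [("A", ["A", "B"]), ("A", "B")]
print(rule_probability(expr))
```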
