I'd like to parse a list of regular expressions and calculate, for each one, the likelihood that it finds a match in a given text/string.
For example, the likelihood of '[AB]' matching a string of length 1 should be around 1/13 (considering only capital letters).
Is there a generic regex parser that returns the individual positions/alternatives?
I'm thinking of getting a list of positions as the return value ('[AB].A{2}' would yield [['A','B'],'.',['AA']]).
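For the flat case (no groups or alternation), the position list described above can be produced without a full parser. A minimal sketch, assuming the pattern uses only capital letters, '.', character classes without ranges or negation such as [AB], and an optional {n} count; `positions` is a hypothetical helper, and expanding [AB]{2} into every combination is an assumed extension of the ['AA'] convention:

```python
import itertools
import re

# Assumed subset: a literal capital, ".", or a class like [AB], each optionally followed by {n}.
TOKEN = re.compile(r"(\[[A-Z]+\]|\.|[A-Z])(?:\{(\d+)\})?")

def positions(pattern):
    """Hypothetical helper: turn a flat pattern into a list of positions."""
    result = []
    pos = 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at index {pos}: {pattern[pos:]!r}")
        atom, count = m.group(1), m.group(2)
        pos = m.end()
        if atom == ".":
            # Wildcard: one "." entry per repeated position.
            result.extend(["."] * int(count or 1))
        elif atom.startswith("["):
            choices = list(atom[1:-1])
            if count is None:
                result.append(choices)
            else:
                # Every way of filling `count` positions from the class.
                combos = itertools.product(choices, repeat=int(count))
                result.append(["".join(c) for c in combos])
        else:
            # Plain literal, or a one-element list for a repetition such as A{2}.
            result.append(atom if count is None else [atom * int(count)])
    return result

print(positions("[AB].A{2}"))  # [['A', 'B'], '.', ['AA']]
```

Because it only loops over tokens, this approach stops working as soon as parentheses or | appear; those need a recursive grammar.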
The problem is parsing regular expressions with pyparsing.
Simple regexes are no problem, but when it comes to "alternatives" and repetitions I'm lost: I find it hard to parse nested expressions like '((A[AB])|(AB))'.
Any thoughts?
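One way around the nesting problem is to skip pyparsing and write a small recursive-descent parser, with one method per grammar rule, so that a nested group simply calls back into the alternation rule. A minimal sketch; the `Parser` class, the tuple node format and the supported subset (literal letters, '.', [..] classes without ranges, {n} counts, groups and |) are all assumptions, not an existing library API:

```python
class Parser:
    """Hypothetical recursive-descent parser for a small regex subset.

    alternation := sequence ("|" sequence)*
    sequence    := repeat*
    repeat      := atom ("{" digits "}")?
    atom        := "(" alternation ")" | "[" letters "]" | "." | letter
    """

    def __init__(self, pattern):
        self.pattern = pattern
        self.pos = 0

    def peek(self):
        return self.pattern[self.pos] if self.pos < len(self.pattern) else None

    def expect(self, char):
        if self.peek() != char:
            raise ValueError(f"expected {char!r} at index {self.pos}")
        self.pos += 1

    def parse(self):
        tree = self.alternation()
        if self.peek() is not None:
            raise ValueError(f"unexpected {self.peek()!r} at index {self.pos}")
        return tree

    def alternation(self):
        branches = [self.sequence()]
        while self.peek() == "|":
            self.pos += 1
            branches.append(self.sequence())
        return branches[0] if len(branches) == 1 else ("alt", branches)

    def sequence(self):
        items = []
        # A sequence ends at the end of input, at "|", or at a group's ")".
        while self.peek() not in (None, "|", ")"):
            items.append(self.repeat())
        return items[0] if len(items) == 1 else ("seq", items)

    def repeat(self):
        node = self.atom()
        if self.peek() == "{":
            self.pos += 1
            start = self.pos
            while self.peek() is not None and self.peek() in "0123456789":
                self.pos += 1
            if start == self.pos:
                raise ValueError(f"expected a count at index {start}")
            count = int(self.pattern[start:self.pos])
            self.expect("}")
            node = ("rep", node, count)
        return node

    def atom(self):
        char = self.peek()
        if char == "(":
            self.pos += 1
            node = self.alternation()  # recursion handles arbitrary nesting
            self.expect(")")
            return node
        if char == "[":
            end = self.pattern.find("]", self.pos)
            if end == -1:
                raise ValueError(f"unclosed '[' at index {self.pos}")
            letters = self.pattern[self.pos + 1:end]
            self.pos = end + 1
            return ("class", list(letters))
        if char == ".":
            self.pos += 1
            return ("any",)
        if char is not None and char.isalpha():
            self.pos += 1
            return ("lit", char)
        raise ValueError(f"unsupported {char!r} at index {self.pos}")
```

Parsing '((A[AB])|(AB))' with `Parser("((A[AB])|(AB))").parse()` gives ('alt', [('seq', [('lit', 'A'), ('class', ['A', 'B'])]), ('seq', [('lit', 'A'), ('lit', 'B')])]), a tree that a probability calculation can walk recursively.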
Simulation rather than calculation may be the way to go.
Set up a population of representative text strings. (Linguists would call such a set a corpus.) For any given regex, find the number of strings it matches, and divide by the total number of strings in your corpus.
Your own example giving the likelihood of '[AB]' as 1/13 is based on this way of thinking, using the corpus of single-capital-letter strings. You got 1/13 by seeing that there are two matches out of the 26 strings in the corpus.
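The counting described above can be checked directly with Python's re module. A minimal sketch; `match_fraction` is a hypothetical helper, the corpus can be any iterable of strings, and it uses re.search, so a pattern counts as matching if it is found anywhere in a string:

```python
import re
import string
from fractions import Fraction

def match_fraction(pattern, corpus):
    """Fraction of the strings in `corpus` in which `pattern` finds a match."""
    regex = re.compile(pattern)
    texts = list(corpus)
    hits = sum(1 for text in texts if regex.search(text))
    return Fraction(hits, len(texts))

# The corpus from the question: every single capital letter.
print(match_fraction("[AB]", string.ascii_uppercase))  # 1/13
```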
Create a larger corpus: maybe the set of all alphanumeric strings up to a certain length, or all ASCII strings up to a certain length, or the dictionary of your choice. Thinking about what corpus best suits your purpose is a good way to clarify what you mean by "likelihood".
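To make the larger-corpus idea concrete, here is a sketch (the helper names are hypothetical) that enumerates every string over an alphabet up to a given length and, for lengths where that set is far too large, estimates the fraction by sampling random strings instead, which is the simulation suggested at the start. Like re.search generally, it counts a match anywhere in the string; swap in re.fullmatch if the whole string must match:

```python
import itertools
import random
import re
import string

def all_strings(alphabet, max_length):
    """Yield every string over `alphabet` whose length is 1..max_length."""
    for length in range(1, max_length + 1):
        for chars in itertools.product(alphabet, repeat=length):
            yield "".join(chars)

def corpus_fraction(pattern, alphabet, max_length):
    """Exact fraction of the enumerated corpus in which `pattern` finds a match."""
    regex = re.compile(pattern)
    total = hits = 0
    for text in all_strings(alphabet, max_length):
        total += 1
        if regex.search(text):
            hits += 1
    return hits / total

def sampled_fraction(pattern, alphabet, length, samples=100_000, seed=0):
    """Monte Carlo estimate over uniformly random strings of one fixed length."""
    regex = re.compile(pattern)
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    hits = 0
    for _ in range(samples):
        text = "".join(rng.choices(alphabet, k=length))
        if regex.search(text):
            hits += 1
    return hits / samples

# Exhaustive: 26 + 26**2 + 26**3 = 18,278 strings.
print(corpus_fraction("[AB]", string.ascii_uppercase, 3))
# Sampling: enumerating all length-20 strings is out of the question.
print(sampled_fraction("((A[AB])|(AB))", string.ascii_uppercase, 20))
```

Note that enumerating "all strings up to length n" weights each length by how many strings it has, so the longest strings dominate the result; sampling at one fixed length is a different corpus, and which one fits depends on what "likelihood" should mean.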
You use ['A', 'B'] to mean "either A or B". Then the question's '((A[AB])|(AB))' can be written like this:
'[{'A', ['A', 'B']}, {'A', 'B'}]'
Here [] means "one of these" and {} means "all of these".
Each of the two alternatives in the outer [] gets 1/2. For '{'A', ['A', 'B']}':
'A' => 1/1
['A', 'B'] => 1/2
(1/1) * (1/2) = 1/2
This (1/2) times the outer (1/2) = (1/4).
For '{'A', 'B'}', each letter gets (1/26).
Multiplying the two gives 1/(26^2), and multiplying by the outer 1/2 gives (1/(26^2))/2.
Now add both: (1/4) + ((1/(26^2))/2)
That was a bad explanation... let me try again:
[] => compute the probability of each term and divide it by the number of terms
{} => compute the probability of each term and multiply them all together
Does that make sense?
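A minimal sketch of the nested-notation idea, with assumptions not in the answer: Python lists stand for [] ("one of") and tuples stand for {} ("all of", in order), since Python sets are unordered and cannot contain lists; every literal letter has probability 1/26 and '.' matches any letter. It also deliberately departs from the [] rule above: it adds the terms' probabilities instead of dividing by the number of terms, which gives the exact chance that a uniformly random string of the right length matches, but only when the alternatives have equal length and can never match the same string:

```python
from fractions import Fraction

ALPHABET_SIZE = 26  # assumption: strings over capital letters A-Z only

def probability(node):
    """Chance that a uniformly random string of matching length fits `node`.

    Hypothetical notation: list = "one of these", tuple = "all of these, in order",
    a one-letter str = literal, "." = any letter.
    """
    if isinstance(node, str):
        if node == ".":
            return Fraction(1)  # the wildcard matches every letter
        return Fraction(1, ALPHABET_SIZE)
    if isinstance(node, tuple):
        result = Fraction(1)
        for term in node:
            result *= probability(term)  # independent positions: multiply
        return result
    if isinstance(node, list):
        # Adding is only exact if no string matches two alternatives.
        return sum((probability(term) for term in node), Fraction(0))
    raise TypeError(f"unsupported node: {node!r}")

print(probability(["A", "B"]))  # 1/13
print(probability([("A", ["A", "B"]), ("A", "B")]))  # 3/676
```

With this rule ['A', 'B'] gives 1/13, matching the question. For '((A[AB])|(AB))' it gives 3/676, although only AA and AB actually match (2/676): AB satisfies both branches and is counted twice. Overlapping alternatives like this are one reason the corpus approach in the other answer is more robust.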