What are appropriate values for minimum confidence and minimum support for the Apriori algorithm? How could you tweak them? Are they fixed values, or do they change during the running of the algorithm? If you have used this algorithm before, what values did you use?
I would suggest starting with 0.05 for support and 0.80 for confidence. But I agree that you should understand exactly what they represent in order to define them appropriately. For a rule A ⇒ B (where A and B are non-empty sets):
Support (A ⇒ B): s = P(A, B)
Confidence (A ⇒ B): c = P(B | A)
Lift (A ⇒ B): L = c/P(B)
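To make the three definitions concrete, here is a minimal sketch that counts them directly on a list of transactions. The function name `rule_metrics` and the toy basket data are made up for illustration; they are not part of any library.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    a, b = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if a <= t)         # transactions containing A
    count_b = sum(1 for t in transactions if b <= t)         # transactions containing B
    count_ab = sum(1 for t in transactions if (a | b) <= t)  # transactions containing A and B
    if count_a == 0 or count_b == 0:
        raise ValueError("A and B must each occur in at least one transaction")
    support = count_ab / n              # s = P(A, B)
    confidence = count_ab / count_a     # c = P(B | A)
    lift = confidence / (count_b / n)   # L = c / P(B)
    return support, confidence, lift


# Hypothetical toy data: five market baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
s, c, lift = rule_metrics(transactions, {"bread"}, {"milk"})
print(s, c, lift)
```

On this toy data the rule bread ⇒ milk has support 0.6, confidence 0.75 and a lift of about 0.94. A lift below 1 means that bread buyers are slightly less likely than the average customer to buy milk, even though the confidence looks decent.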
Lift is important for assessing the interestingness of a rule (because you usually end up with hundreds of them). More than twenty measures of interestingness have been proposed. These include the φ-coefficient, kappa, mutual information, the J-measure and the Gini index. I personally order my rules according to the J-measure.
J-measure (A ⇒ B): J = s/c * (c*log(L) + (1-c)*log((1-c)*L/(L-c)))
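If you want to rank rules by J-measure yourself, a small helper is enough. This sketch assumes the Smyth and Goodman definition, J = P(A) * [c*log(c/P(B)) + (1-c)*log((1-c)/(1-P(B)))], where P(A) = s/c, and it uses the natural logarithm (another base only rescales the values). The name `j_measure` is hypothetical.

```python
import math


def j_measure(p_a, p_b, confidence):
    """J-measure of A => B from P(A), P(B) and c = P(B | A).

    Assumes the Smyth and Goodman form with natural logarithms.
    """
    def term(p, q):
        # p * log(p / q), using the usual convention 0 * log(0) = 0.
        return 0.0 if p == 0 else p * math.log(p / q)

    return p_a * (term(confidence, p_b) + term(1 - confidence, 1 - p_b))


# Hypothetical numbers: P(A) = 0.8, P(B) = 0.8, c = 0.75.
j = j_measure(0.8, 0.8, 0.75)
print(j)
```

The value is never negative, and it is zero when A and B are independent (c = P(B)), so larger values point to rules that tell you more about B than its base rate does.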
You have to set the minsup and minconf values before running the algorithm and they do not change during the mining process.
The right minsup depends on your data.
For some data, I use 80%. For other data, I use 0.05%. It all depends on the dataset. Usually, I start with a high value and then decrease it until I find a value that generates enough patterns.
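The "start high, then decrease" routine can be automated. Below is a rough sketch, not a tuned implementation: `frequent_itemsets` is a plain level-wise (Apriori-style) miner written for this example, and `tune_minsup` with its `target`, `start`, `factor` and `floor` parameters is a hypothetical helper. Note that minsup stays fixed within each mining run; only the outer loop changes it between runs.

```python
from itertools import combinations


def frequent_itemsets(transactions, minsup):
    """Level-wise (Apriori-style) search. Returns {itemset: support}."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    level = [frozenset([i]) for i in items]
    result = {}
    while level:
        # Count every candidate of the current size, keep the frequent ones.
        frequent = {}
        for cand in level:
            sup = sum(1 for t in transactions if cand <= t) / n
            if sup >= minsup:
                frequent[cand] = sup
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate that has an infrequent k-subset.
        next_level = set()
        for x, y in combinations(list(frequent), 2):
            union = x | y
            if len(union) == len(x) + 1 and all(
                frozenset(sub) in frequent
                for sub in combinations(union, len(x))
            ):
                next_level.add(union)
        level = list(next_level)
    return result


def tune_minsup(transactions, target, start=0.8, factor=0.5, floor=0.01):
    """Lower minsup step by step until at least `target` itemsets are found."""
    minsup = start
    while True:
        found = frequent_itemsets(transactions, minsup)
        if len(found) >= target or minsup <= floor:
            return minsup, found
        minsup = max(minsup * factor, floor)


# Hypothetical toy data: five market baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
minsup, found = tune_minsup(transactions, target=4)
print(minsup, len(found))
```

On real data each run at a lower threshold gets more expensive, so a gentler `factor` and a sensible `floor` keep the search from blowing up.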
For the confidence, it is a little bit easier because it represents the confidence that you want in the rules. So usually, I use something like 60 %. But it also depends on the data.
Besides, if you don't want to set minsup, you can use a top-k mining algorithm. In this case, you specify, for instance, k=1000, and the algorithm discovers 1000 rules instead of relying on minsup. I have designed one such algorithm for association rule mining. It is called TopKRules, and you can download the source code. The paper describing it will be published soon. It uses just two parameters: k and minconf.
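To show what the "k and minconf" interface looks like in practice, here is a brute-force sketch. It is not the TopKRules algorithm: it simply enumerates every rule over a tiny item universe (exponential in the number of items), keeps those meeting minconf, and returns the k best. Ranking by support is an assumption made for this example, and `top_k_rules_naive` is a hypothetical name.

```python
import heapq
from itertools import combinations


def top_k_rules_naive(transactions, k, minconf):
    """Return up to k rules with confidence >= minconf, highest support first.

    Each rule is a tuple (support, confidence, antecedent, consequent).
    Brute force, for illustration on small data only.
    """
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            cnt = count(itemset)
            if cnt == 0:
                continue  # itemset never occurs, so no rule can be built from it
            # Split the itemset into every antecedent / consequent pair.
            for r in range(1, size):
                for ante in combinations(combo, r):
                    a = frozenset(ante)
                    conf = cnt / count(a)
                    if conf >= minconf:
                        rules.append((
                            cnt / n,
                            conf,
                            tuple(sorted(a)),
                            tuple(sorted(itemset - a)),
                        ))
    # Largest support first; ties fall back to confidence, then item names.
    return heapq.nlargest(k, rules)


# Hypothetical toy data: five market baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
for sup, conf, ante, cons in top_k_rules_naive(transactions, k=2, minconf=0.7):
    print(ante, "=>", cons, sup, conf)
```

The point of the design is that you ask for a number of rules rather than guessing a support threshold; a real top-k miner reaches the same result without enumerating everything.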