IOSR Journal of Computer Engineering (IOSRJCE), ISSN: 2278-0661, Volume 3, Issue 6 (Sep-Oct. 2012), PP 43-47, www.iosrjournals.org

Mining Negative Association Rules

Tushar Mani, Ph.D. (CSE), NIMS University, Jaipur (Rajasthan), India

Abstract: Association rule mining is a data mining task that discovers associations among items in a transactional database. Typical association rules consider only items enumerated in transactions; such rules are referred to as positive association rules. Negative association rules consider the same items but, in addition, consider negated items (i.e. items absent from transactions). Negative association rules are useful in market-basket analysis to identify products that conflict with or complement each other. They are also very convenient for associative classifiers, i.e. classifiers that build their classification model from association rules. Many other applications would benefit from negative association rules were it not for the expensive process of discovering them: mining such rules requires the examination of an exponentially large search space. In this paper, we propose an algorithm that mines negative association rules using the conviction measure and does not require extra database scans.

I. Introduction

Association rule mining is a data mining task that discovers relationships among items in a transactional database. Association rules have been extensively studied in the literature for their usefulness in many application domains such as recommender systems, diagnosis decision support, telecommunication, intrusion detection, etc. The efficient discovery of such rules has been a major focus in the data mining research community. Since the original Apriori algorithm [1], a remarkable number of variants and improvements of association rule mining algorithms have been proposed [2].

Association rule analysis is the task of discovering association rules that occur frequently in a given data set. A typical application of association rule mining is market-basket analysis, in which the behaviour of customers buying different products in a shopping store is studied. The discovery of interesting patterns in this collection of data can lead to important marketing and management strategic decisions. For instance, if a customer buys bread, what is the probability that he/she buys milk as well? Depending on the probability of such an association, marketing personnel can better plan the shelf space in the store or base their discount strategies on the associations/correlations found in the data.

All the traditional association rule mining algorithms were developed to find positive associations between items. By positive associations we refer to associations between items existing in transactions (i.e. items bought). What about associations of the type "customers that buy Coke do not buy Pepsi" or "customers that buy juice do not buy bottled water"? In addition to positive associations, negative associations can provide valuable information for devising marketing strategies. Interestingly, few researchers have focused on negative association rules, because of the difficulty of discovering them. Although some researchers pointed out the importance of negative associations [3], only a few groups [4], [5], [6] have proposed algorithms to mine these types of associations.
This not only illustrates the novelty of negative association rules, but also the challenge of discovering them.

II. Basic Concepts and Terminology

This section introduces the basic concepts of association rules and some related work on negative association rules.

2.1 Association Rules

Formally, association rules are defined as follows. Let I = {i1, i2, ..., in} be a set of items. Let D be a set of transactions, where each transaction T is a set of items such that T ⊆ I. Each transaction is associated with a unique identifier TID. A transaction T is said to contain X, a set of items in I, if X ⊆ T. An association rule is an implication of the form X → Y, where X ⊆ I, Y ⊆ I, and X ∩ Y = Φ. The rule X → Y has support s in the transaction set D if s% of the transactions in D contain X ∪ Y. In other words, the support of the rule is the probability that X and Y hold together among all the presented cases. The rule X → Y holds in the transaction set D with confidence c if c% of the transactions in D that contain X also contain Y. In other words, the confidence of the rule is the conditional probability that the consequent Y is true under the condition of the antecedent X. The problem of discovering all association rules from a set of transactions D consists of generating the rules whose support and confidence are greater than given thresholds. These rules are called strong rules, and the framework is known as the support-confidence framework for association rule mining.

A negative association rule is an implication of the form X → ¬Y (or ¬X → Y, or ¬X → ¬Y), where X ⊆ I, Y ⊆ I and X ∩ Y = Φ. (Note that although a rule of the form ¬X → ¬Y contains negated elements, it is equivalent to a positive association rule of the form Y → X; it is therefore not considered a negative association rule.) In contrast to a positive rule, a negative rule captures the relationship between the occurrence of one set of items and the absence of another set of items. The rule X → ¬Y has support s% in the data set if s% of the transactions contain itemset X while not containing itemset Y; that is, supp(X → ¬Y) is the frequency of occurrence of transactions with itemset X in the absence of itemset Y. Let U be the set of transactions that contain all items in X. The rule X → ¬Y holds in the given database with confidence c% if c% of the transactions in U do not contain itemset Y. The confidence of a negative association rule, conf(X → ¬Y), can be calculated as P(X ∪ ¬Y) / P(X), where P(.) is the probability function. The support and confidence of itemsets are calculated during the iterations. However, it is difficult to count the support and confidence of non-existing items in transactions; to avoid counting them directly, we can compute these measures from those of the positive rules.
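To make the definitions of Section 2.1 concrete, the following minimal sketch (our own illustration, not part of the original paper) computes the support and confidence of a positive rule X → Y and of a negative rule X → ¬Y directly from a list of transactions. The toy transaction data and the function names are illustrative assumptions.

# Toy transactional database (illustrative assumption, not from the paper).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "juice"},
    {"bread", "milk", "coke"},
    {"juice", "water"},
    {"bread", "coke"},
]

def support(itemset, transactions):
    # supp(X): fraction of transactions that contain every item of X
    x = set(itemset)
    return sum(1 for t in transactions if x <= t) / len(transactions)

def support_pos_neg(x, y, transactions):
    # supp(X -> not Y): fraction of transactions containing X but not all of Y
    x, y = set(x), set(y)
    hits = sum(1 for t in transactions if x <= t and not (y <= t))
    return hits / len(transactions)

def confidence_pos(x, y, transactions):
    # conf(X -> Y) = supp(X U Y) / supp(X)
    return support(set(x) | set(y), transactions) / support(x, transactions)

def confidence_neg(x, y, transactions):
    # conf(X -> not Y) = supp(X U not Y) / supp(X)
    return support_pos_neg(x, y, transactions) / support(x, transactions)

print(support({"bread"}, TRANSACTIONS))                   # supp(bread) = 0.8
print(confidence_pos({"bread"}, {"milk"}, TRANSACTIONS))  # conf(bread -> milk) = 0.5
print(confidence_neg({"coke"}, {"juice"}, TRANSACTIONS))  # conf(coke -> not juice) = 1.0

Note that supp(X → ¬Y) and conf(X → ¬Y) can also be obtained from positive supports alone, as discussed in Section IV; the direct counting here is only for illustrating the definitions.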
III. Related Work in Negative Association Rule Mining

A new idea for mining strong negative rules was presented in [15]. The authors combine positive frequent itemsets [8] with domain knowledge, in the form of a taxonomy, to mine negative associations. However, their algorithm is hard to generalize, since it is domain dependent and requires a predefined taxonomy. Finding negative itemsets involves the following steps: (1) first, find all the generalized large itemsets in the data (i.e., itemsets at all levels of the taxonomy whose support is greater than the user-specified minimum support); (2) next, identify the candidate negative itemsets based on the large itemsets and the taxonomy, and assign them an expected support; (3) in the last step, count the actual support of the candidate itemsets and retain only the negative itemsets. The interest measure RI of a negative association rule X → ¬Y is defined as

RI = (E[support(X ∪ Y)] − support(X ∪ Y)) / support(X)

where E[support(X)] is the expected support of an itemset X.

A new measure called mininterest (the argument being that a rule A → B is of interest only if supp(A ∪ B) − supp(A)supp(B) ≥ mininterest) was added on top of the support-confidence framework in [17]. The authors consider the itemsets (positive or negative) that exceed the minimum support and minimum interest thresholds as itemsets of interest. Although [17] introduces the mininterest parameter, the authors do not discuss how to set it or what the impact on the results would be when changing it.

A novel approach was proposed in [16]. There, mining both positive and negative association rules of interest is decomposed into two sub-problems: (1) generate the set of frequent itemsets of interest (PL) [8] and the set of infrequent itemsets of interest (NL); (2) extract positive rules of the form A → B from PL, and negative rules of the forms A → ¬B, ¬A → B and ¬A → ¬B from NL. To generate PL, NL and the negative association rules, the authors developed three functions, namely fipi(), iipis() and CPIR().

The most common framework for association rule generation is the support-confidence one. In [14], the authors considered another framework, correlation analysis, that adds to support-confidence. They combined the two phases (mining frequent itemsets [8] and generating strong association rules) and generated the relevant rules while analyzing the correlations within each candidate itemset. This avoids evaluating item combinations redundantly: for each generated candidate itemset, all possible combinations of items are examined to analyze their correlations, and at the end only those rules generated from item combinations with strong correlation are kept. If the correlation is positive, a positive rule is discovered; if the correlation is negative, two negative rules are discovered. The negative rules produced are of the form X → ¬Y or ¬X → Y, which the authors term "confined negative association rules": the entire antecedent or consequent is either a conjunction of negated attributes or a conjunction of non-negated attributes.

An innovative approach was proposed in [13]. There, generating positive and negative association rules consists of the following steps: (1) generate all positive frequent itemsets L(P1); (2) for all itemsets I in L(P1), generate negative frequent itemsets of the form ¬(I1 I2); (3) generate all negative frequent itemsets ¬I1 ¬I2; (4) generate all negative frequent itemsets I1 ¬I2; and (5) generate all valid positive and negative association rules. The authors generated negative rules without adding additional interestingness measures to the support-confidence framework.

A new and different approach was proposed in [7]. It is simple but effective, and it uses neither additional interestingness measures nor additional database scans. In this approach, negative itemsets are found by replacing a literal in a candidate itemset with its corresponding negated item. If a candidate itemset contains three items, it produces three corresponding negative itemsets, one for each literal, as sketched below.
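As an illustration of the literal-negation idea described for [7], the following minimal sketch (our own assumption, not code from [7]) generates, for a candidate itemset, the candidate negative itemsets obtained by negating one literal at a time. Representing a negated item as a ("neg", item) tuple is purely our convention.

def candidate_negative_itemsets(itemset):
    # For a candidate itemset, return the candidate negative itemsets obtained
    # by replacing one literal at a time with its negated item.
    # A negated item is represented as a ("neg", item) tuple (our convention).
    items = list(itemset)
    candidates = []
    for i, item in enumerate(items):
        negated = items[:i] + [("neg", item)] + items[i + 1:]
        candidates.append(frozenset(negated))
    return candidates

# A 3-item candidate yields 3 candidate negative itemsets, one per literal.
print(candidate_negative_itemsets({"bread", "milk", "coke"}))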
IV. Discovering Negative Association Rules

The most common framework for association rule generation is the support-confidence one. Although these two parameters allow the pruning of many associations discovered in the data, there are cases in which many uninteresting rules may still be produced. In this paper we consider another interestingness measure, called conviction, that adds to the support-confidence framework. The conviction of a rule is defined as

conv(X → Y) = (1 − supp(Y)) / (1 − conf(X → Y))

conv(X → Y) can be interpreted as the expected frequency that X occurs without Y (that is, X → ¬Y) if X and Y were independent, divided by the observed frequency of incorrect predictions. The range of conviction is 0 to ∞.

A. Algorithm MPNAR

In this section we propose and explain our algorithm.

Algorithm: Mining Negative Association Rules
Input: TDB - transactional database; mins - minimum support; minc - minimum confidence
Output: negative association rules
Method:
1. NAR ← Φ /* initially NAR is empty */
2. Scan the database and find the set of frequent 1-itemsets (F1)
3. for (k = 2; Fk-1 != Φ; k++)
4. {
5.   Ck = Fk-1 ⋈ Fk-1 /* generate candidate itemsets */
6.   // prune using the Apriori property
7.   for each i ∈ Ck, if any subset of i is not in Fk-1 then Ck = Ck − {i}
8.   for each i ∈ Ck /* scan the database to find support */
9.   {
10.    s = support(i);
11.    for each X, Y (X ∪ Y = i)
12.    {
13.      if (supp(X → ¬Y) ≥ mins && conviction(X → ¬Y) ≤ 2.0) /* produces NAR of the form X → ¬Y */
14.        NAR ← NAR ∪ { X → ¬Y }
15.      if (supp(¬X → Y) ≥ mins && conviction(¬X → Y) ≤ 2.0) then /* produces NAR of the form ¬X → Y */
16.        NAR ← NAR ∪ { ¬X → Y }
17.    }
18.  }
19. }

The supports of the negated itemsets are derived from the supports of the positive itemsets:

support(¬X) = 1 − support(X)
support(¬X ∪ Y) = support(Y) − support(X ∪ Y)
support(X ∪ ¬Y) = support(X) − support(X ∪ Y)
support(¬X ∪ ¬Y) = 1 − support(X) − support(Y) + support(X ∪ Y)

The generation of positive rules continues without change.
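The following is a minimal sketch (our own illustration, not the paper's implementation) of the rule-evaluation step of MPNAR: given precomputed supports of positive itemsets, it derives the supports and confidences of X → ¬Y and ¬X → Y from the formulas above, computes conviction with respect to the rule's consequent, and keeps the rules that pass the thresholds. The dictionary pos_supp, its values, the thresholds and the helper names are all assumptions.

# Precomputed supports of positive itemsets (illustrative values).
pos_supp = {
    frozenset({"coke"}): 0.40,
    frozenset({"juice"}): 0.35,
    frozenset({"coke", "juice"}): 0.05,
}

MIN_SUPP = 0.20       # mins
MAX_CONVICTION = 2.0  # conviction threshold used in the algorithm above

def conviction(consequent_supp, conf):
    # conv = (1 - supp(consequent)) / (1 - conf); infinite when conf = 1
    return float("inf") if conf == 1.0 else (1.0 - consequent_supp) / (1.0 - conf)

def negative_rules(x, y):
    # Evaluate X -> not Y and not X -> Y from positive supports only (no extra scan).
    sx, sy, sxy = pos_supp[x], pos_supp[y], pos_supp[x | y]
    rules = []

    # X -> not Y: supp = supp(X) - supp(X U Y); conf = supp / supp(X); consequent supp = 1 - supp(Y)
    supp_x_noty = sx - sxy
    conf_x_noty = supp_x_noty / sx
    if supp_x_noty >= MIN_SUPP and conviction(1.0 - sy, conf_x_noty) <= MAX_CONVICTION:
        rules.append(("X -> not Y", x, y))

    # not X -> Y: supp = supp(Y) - supp(X U Y); conf = supp / (1 - supp(X)); consequent supp = supp(Y)
    supp_notx_y = sy - sxy
    conf_notx_y = supp_notx_y / (1.0 - sx)
    if supp_notx_y >= MIN_SUPP and conviction(sy, conf_notx_y) <= MAX_CONVICTION:
        rules.append(("not X -> Y", x, y))

    return rules

print(negative_rules(frozenset({"coke"}), frozenset({"juice"})))

With the illustrative supports above, only ¬X → Y survives both thresholds; the point of the sketch is simply that every quantity needed by the algorithm can be obtained from positive supports already counted during the Apriori-style scans.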
V. Experimental Results

We compared our algorithm with the one in [14] on a transactional database containing many transactions, using different minimum supports and minimum confidences. Our algorithm performs better than the one in [14].

Fig. 1: minimum support = 30% and different minimum confidences
Fig. 2: minimum support = 40% and different minimum confidences
Fig. 3: minimum support = 50% and different minimum confidences
Fig. 4: minimum confidence = 60% and different minimum supports
Fig. 5: minimum confidence = 70% and different minimum supports
Fig. 6: minimum confidence = 80% and different minimum supports

VI. Conclusion and Future Work

In this paper we introduced a new algorithm to generate both positive and negative association rules. Our method adds conviction to the support-confidence framework to generate stronger positive and negative rules. We compared our algorithm with the one in [14] on a real dataset, discussed their performance on a transactional database and analyzed the experimental results. The results show that our algorithm can perform better than the one in [14]. In the future we plan to conduct experiments on other real datasets and compare the performance of our algorithm with that of other related algorithms, such as those in [7] and [16].

References

[1] Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. In: Proc. of SIGMOD (1993) 207-216.
[2] Goethals, B., Zaki, M. (eds.): FIMI'03: Workshop on Frequent Itemset Mining Implementations. Volume 90 of CEUR Workshop Proceedings (2003). http://CEUR-WS.org/Vol-90/
[3] Brin, S., Motwani, R., Silverstein, C.: Beyond market baskets: Generalizing association rules to correlations. In: Proc. of ACM SIGMOD, Tucson, Arizona (1997).
[4] Wu, X., Zhang, C., Zhang, S.: Mining both positive and negative association rules. In: Proc. of ICML (2002) 658-665.
[5] Teng, W., Hsieh, M., Chen, M.: On the mining of substitution rules for statistically dependent items. In: Proc. of ICDM (2002) 442-449.
[6] Savasere, A., Omiecinski, E., Navathe, S.: Mining for strong negative associations in a large database of customer transactions. In: Proc. of ICDE (1998) 494-502.
[7] Ramasubbareddy, B., Govardhan, A., Ramamohanreddy, A.: Mining positive and negative association rules. In: IEEE ICSE 2010, Hefei, China (August 2010).
[8] Goethals, B., Zaki, M. (eds.): FIMI'03: Workshop on Frequent Itemset Mining Implementations. Volume 90 of CEUR Workshop Proceedings (2003). http://CEUR-WS.org/Vol-90/
[9] Teng, W., Hsieh, M., Chen, M.: On the mining of substitution rules for statistically dependent items. In: Proc. of ICDM (2002) 442-449.
[10] Tan, P., Kumar, V.: Interestingness measures for association patterns: A perspective. In: Proc. of Workshop on Postprocessing in Machine Learning and Data Mining (2000).
[11] Kundu, G., Islam, Md. M., Munir, S., Bari, Md. F.: ACN: An associative classifier with negative rules. In: 11th IEEE International Conference on Computational Science and Engineering (2008).
[12] Brin, S., Motwani, R., Silverstein, C.: Beyond market baskets: Generalizing association rules to correlations. In: Proc. ACM SIGMOD Conf. (May 1997) 265-276.
[13] Cornelis, C., Yan, P., Zhang, X., Chen, G.: Mining positive and negative association rules from large databases. IEEE conference (2006).
[14] Antonie, M.L., Zaïane, O.R.: Mining positive and negative association rules: An approach for confined rules. In: Proc. Intl. Conf. on Principles and Practice of Knowledge Discovery in Databases (2004) 27-38.
[15] Savasere, A., Omiecinski, E., Navathe, S.: Mining for strong negative associations in a large database of customer transactions. In: Proc. of ICDE (1998) 494-502.
[16] Wu, X., Zhang, C., Zhang, S.: Efficient mining of both positive and negative association rules. ACM Transactions on Information Systems, Vol. 22, No. 3 (July 2004) 381-405.
[17] Wu, X., Zhang, C., Zhang, S.: Mining both positive and negative association rules. In: Proc. of ICML (2002) 658-665.
[18] Yuan, X., Buckles, B., Yuan, Z., Zhang, J.: Mining negative association rules. In: Proc. of ISCC (2002) 623-629.
[19] Zhu, H., Xu, Z.: An effective algorithm for mining positive and negative association rules. In: International Conference on Computer Science and Software Engineering (2008).