Why promotion strategies based on market basket analysis may not work

Source: Vindevogel, B, D Van den Poel, and G Wets – Why promotion strategies based on market basket analysis do not work – 2004

The title of Vindevogel et al’s paper is thought-provoking, but also somewhat misleading – their position is not that promotion strategies based on MBA do not (ie never) work, but that the dynamics of sales response to price change are complex and there are identifiable reasons why a particular promotion strategy based on a particular MBA will not/did not work.

Time series analysis may be adapted to MBA to help the market analyst understand these underlying dynamics – and thereby to adopt better promotion strategies.

This paper is a great example of the caution I’d raise about a lot of MBA – too much of the time the use of algorithms is uninformed by real-world knowledge (expressed as hypotheses or at least some a priori notion of how the data may behave). This lack of informed engagement on the part of the subject-matter expert in the analysis leads to various shortcomings with a lot of applied MBA:

  1. data collection that’s inefficient
  2. data analysis that amounts to what I call “magical thinking” – others have called it “data dredging” or “conducting a fishing expedition”
  3. use of biased estimates of association effects
  4. a flood of “spurious” associations between/among products that amount to Type I errors (ie “false positives”) in the standard theory of statistical inference (that are due to/to be expected from the vast number of effective tests of hypothetical associations)
  5. most “legitimate” (ie statistically significant, when proper confidence values are used to assess multiple hypotheses) associations  are likely to be of no substantive or practical significance – and have little prospect of yielding a marketing strategy that will have any positive impact on sales

Some promising news: this paper suggests some ways of moving forward with MBA – other papers carry these suggestions further, particularly the necessity and possible means of evaluating the impact of your marketing strategies!

FYI – I’ve pushed some of the more technical details of the time-series analysis to the end of the post – the details are not needed to understand the thrust of the argument.

A near-to-next step is to identify a suite of R packages to carry out this analysis.

Key notions:

  • positive vs. negative cross-product price elasticities (aka cross-price elasticities)
  • short-run vs. long-run (aka persistent) impacts of price promotion on sales
  • nature of association between products
    • complements
    • substitutes
    • [in another paper we’ll see distinction between complements-in-purchase vs. complements-in-use and ditto substitutes
  • consumer-demand effect vs. competitive effect in response to price change
  • using a well-defined, customized time-series analysis to determine dynamic (short-run vs. long-run impact) of sales response (none vs. positive vs. negative) to price change as a function of the nature of the association between products + consumer-demand effect + competitive effect


In a recent Journal of Marketing article, Shocker, Bayus, and Kim (2004) call for a better understanding of the connectedness among products. Indeed, it is clear and well known that the purchase of one product can influence purchases of other products. The underlying dynamics of these processes, however, remain less clear.

In the data-mining community, association-rule discovery is a popular technique to analyze the connectedness between sets of products. The framework of association rules was introduced by Agrawal, Imielinski, and Swami (1993) to efficiently mine a large collection of transactions for patterns of consumer purchase behavior. Since the first publications concerning association rules discussed the mining of retail databases, this technique is also referred to as market basket analysis.

The result of a market basket analysis is a set of combinations of products that are purchased together. Since the publication of the paper by Agrawal et al. (1993), literally hundreds of publications followed based on the proposed framework. However, as Hand, Mannila, and Smyth (2001) state: “It is fair to say that there are far more papers published on algorithms to discover association rules than there are papers published on applications of it”.

In scientific publications, there are only a few applications of market basket analysis in a retailing context. Examples of these rare applications include:

In the popular business press or text books, however, other, intuitively appealing applications are mentioned when discussing market basket analysis, including the use of market basket analysis to implement more effective price-promotion strategies. The underlying assumption is that associated products exhibit positive cross-price elasticities or, otherwise stated, that a price promotion has a positive impact on the sales of associated products. Market basket analysis is then used to select price promotions that will have a beneficial impact on the sales of full-margin associated products.

A reflection of this belief can be illustrated by the following citation of an article by two data-mining consultants in the popular business press: “Business managers or analysts can use a market basket analysis to plan couponing and discounting. It is probably not a good idea to offer simultaneous discounts on [two products] if they tend to be bought together. Instead, discount one to pull in sales of the other.” (Brand & Gerritsen, 1998).

This belief was also found in text books on data mining. Giudici (2003, p. 209) for example writes: “by promoting just one of the associated products, it should be possible to increase the sales of that product and get accompanying sales increases for the associated products.”

In this paper, we investigate whether the assumption of positive cross-price elasticities between associated products stands firm. We have reasons to believe that associated products do not necessarily show positive cross-price elasticities. Indeed, literature suggests that consumers tend to buy several products from the same category during a single shopping trip. This behavior is known as “horizontal variety seeking“, and can result in association rules between substitute products, which are expected to show negative cross-price elasticities.

Dubé (2004), for example, shows that 31% of shopping trips involving carbonated soft drinks result in the purchase of two or more different products. This behavior can result in association rules between sets of these products. However, as Dubé (2004) shows, these products can be classified as being substitutes, since price changes of a product result in switching behavior. Hence, although these products are associated, a price promotion will result in a negative impact on the sales of the associated product.

In this research, we use the transactional database of a retailer to mine for association rules. In a second phase, we derive  the effect of price promotions on the associated products analytically. This enables us to generalize empirically whether market basket analysis can be used effectively to implement a price-promotion strategy.


Figure 1 illustrates the methodological framework of this study. Our analysis proceeds in two phases:

  1. mining a transactional database of a retailer for association rules
  2. applying multivariate time-series techniques to measure the dynamic impact of a price promotion
    1. estimating a vector autoregressive (VAR) model
    2. deriving the impulse response function, which measures the dynamic impact of a sales promotion on the sales of the associated product
    3. computing the cross-price elasticity from these responses
    4. classifying the relationships as being substitutes, complements or independent, depending on the sign of these elasticities

Methodological framework

Figure 1. Methodological framework.

Market basket analysis

Market basket analysis is a generic term for methodologies that study the composition of a basket of products purchased by a household during a single shopping trip. Agrawal et al. (1993) first introduced the association-rule framework to study market baskets. Originally, this framework consisted of two parameters: support and confidence. Silverstein, Brin and Motwani (1998) extended this framework by a third parameter: interest. More specifically, the three parameters are defined as follows:

Consider the association rule Y→Z, where Y and Z are two products. 1 Y represents the antecedent and Z is called the consequent.

Support of the rule: the percentage of all baskets that contain both products Y and Z

support = P(Y∧Z)

Confidence of the rule: the percentage of all the baskets containing Y that also contain Z. Hence, confidence is a conditional probability, i.e. P(Z|Y)

confidence = P(Y∧Z)/P(Y)

Interest of the rule: measures the statistical dependence of the rule, by relating the observed frequency of co-occurrence – P(Y∧Z) – to the expected frequency of co-occurrence under the assumption of conditional independence of Y and Z – P(Y)*P(Z)

interest = P(Y∧Z)/(P(Y) * P(Z))

Association-rule discovery is the process of finding strong product associations with a minimum support and/or confidence and an interest of at least one.

Measuring the dynamic effect of a price promotion using multivariate time-series techniques

A rationale for using multivariate time-series techniques

Recent research concerning the effects of price promotions is characterized by an increasing use of multivariate time-series techniques.

(Price promotion) ?⇒? (Short- and Long-run Impact on Sales) – Dekimpe, Hanssens, and Silva-Risso (1999)

  • short-run effects of price promotion were found for most cases
  • long-run effects were the exception

(Price Promotion) ?⇒? (Category Demand) – Nijs, Dekimpe, Steenkamp, and Hanssens (2001a)

  • short-run impact in 58% of the cases, with a duration of 10 weeks on average
  • long-run effects are exceptional – observed in 2% of the cases

(Price Promotion) ?⇒? (Category Incidence | Brand Choice | Purchase Quantity) – Pauwels, Hanssens, and Siddarth (2002)

  • significant short-term effects for each of the sales components, with durations of up to 8 weeks
  • in the long run, however, each sales component lacks a persistent promotion effect

(Temporary | Evolving | Structural Price Change) ?⇒? (Market Share) -Srinivasan, Leszczyc, and Bass (2000)

  • temporary price changes and price promotions have only a short-run effect on market share
  • structural changes or evolving prices have long-run effects on market share

Although long-term effects of price promotions/changes on sales/sales components are rare, there is still a rationale for the use of multivariate time-series techniques to study these sorts of effects:

  1. the rare case of a persistent effect of a price promotion on sales may be of high strategic relevance – being able to measure and explain these rare occurrences should be of interest to the marketing community
  2. time-series techniques are more flexible in measuring the short-run dynamics, which are observed in all studies. Indeed, time-series analysis is able to detect the most irregular fluctuations in the short-run promotional effects, whereas other techniques, like the Koyck model, necessitate an a priori specification of the response, which is usually a gradually decaying response
  3. in measuring the cross-price elasticity, the use of multivariate time-series techniques shows another benefit – as Nijs, Dekimpe, Steenkamp, and Hanssens (2001b) argue, a cross-sales effect can have two sources:
    1. a price promotion results in an increase in demand of the promoted product – causing changes in the demand of complement and substitute products = consumer-demand effect
    2. a price promotion can cause marketing reactions of the associated product, which obviously also results in a change in demand of the associated product = competitive effect
  4. an advantage of time-series analysis is that it simultaneously accounts for the two underlying sources of a cross-sales effect through the derivation of impulse-response functions

[See below for technical details of study design and analysis]

Figure 2 illustrates an impulse-response function with a short-run effect. It concerns the response of sales of toasted bread to a price promotion of instant soup. The significant responses are labeled with a dot.

Response of the sales of toasted bread to a price promotion of instant soup

Figure 2. Response of the sales of toasted bread to a price promotion of instant soup.

We observe a positive and significant immediate response of 1.31 (period 0). Weeks 1–3 are characterized by a small, be it insignificant, post-promotion dip followed by a period of purchase reinforcement in weeks 4–7. Week 7 is the first week that is followed by four non-significant responses, hence week 7 is the end of the dust-settling period. The short-run cross-price elasticity is derived by the summation of the responses during the dust-settling period, which yields a positive cross-price elasticity of 3.06. Since the responses converge to zero, there is no long-term cross-price effect.

Figure 3 illustrates an impulse-response function with a persistent, long-run effect. It concerns the response of vanilla-flavored ice-cream to a price promotion of chocolate flavored ice-cream. The graphical representation of the impulse-response function clearly shows that the responses converge to a persistent level of 0.97, which is the negative long-run cross-price elasticity. Here, the responses labeled with a dot indicate a response significantly different from the long-run effect. Week 10 is the first week that is followed by four non-significant responses, hence week 10 marks the end of the dust-settling period. The short-run cross-price elasticity is the sum of all the responses in the dust-settling period, which results in this case in a negative cross-price elasticity of 4.85.

Response of the sales of vanilla-flavored ice-cream to a price promotion of a chocolate-flavored ice-cream

Figure 3. Response of the sales of vanilla-flavored ice-cream to a price promotion of a chocolate-flavored ice-cream.

Empirical results

The relationship between the promoted product and the associated product is classified as being independent, substitute or complementary depending on the direction of the cross-price elasticity.

First, we investigate the long-run cross-price elasticity. If this measure appears to be positive, we label the relationship as being complementary, whereas a negative persistent effect indicates a substitution relationship. In the absence of a long-run effect, we use the short-run estimates to classify the relationship. Again, a positive cross-sales effect is classified as being complementary, while a negative effect is classified as substitute. In the absence of a short-run effect, we classify the relationship as being independent.

Applying the aforementioned classification scheme, the 2700 relationships are classified in the following way:

  • 1112 relationships are classified as being complements
  • 1212 relationships are classified as being substitutes, and
  • 376 relationships are classified as being independent.

In Table 1, we give examples of relationships that are classified as complements and substitutes. For both complements and substitutes, we list five relationships that were classified based on their long-run elasticity and five based on their short-run elasticity, resulting in 20 examples.


In 60 instances, a price promotion has a long-run/ persistent positive effect on the sales level of the associated product.

Persistent effects – complementary products
Promoted product Reacting product Persistent response
Fries Mayonnaise 0.73709
Low-fat yoghurt Kiwi 0.28199
Pork-cutlet Carrots 0.25441
Paprika Onion 0.20304
Bread Butter 0.01591
Short-run effects – complementary products
Promoted product Reacting product Short-run response Duration
Coffee Cream 5.58170 4
Shampoo Conditioner 3.56291 9
Instant soup Toasted bread 3.06367 7
Plastic plate Plastic cup 2.64980 15
Bread Smoked hams 0.99901ts 5
Persistent effects – substitution products
Promoted product Reacting product Persistent response
Tomato ketchup Curry ketchup -1.09531
Vanilla ice-cream Chocolate ice-cream -0.97388
Chicken soup Tomato soup -0.87933
Bottled water Coca-Cola -0.46227
Chocolate biscuit Spiced biscuit -0.37886
Short-run effects – substitution products
Promoted product Reacting product Short-run response Duration
Tzatziki Feta -1.38876 6
Tuna salad Salmon salad -1.06881 15
Cheese pie Cherry pie -0.69528 14
Orange juice Apple juice -0.61342 16
Rose-hip tea Yellow tea -0.22104 4

Table 1. Examples of cross-price elasticities between associated products.
Following our classification scheme, these instances are classified as complements. The mean value of this persistent cross-price elasticity is 0.89. The other 1052 complementary relationships are classified as being complements based on a positive short-run cross-price elasticity. The mean short-run elasticity is 4.56. These short-run dynamics take on average 13 weeks to stabilize.


Forty-two cross-price elasticities exhibit a persistent negative sign, meaning that the price promotion has a persistent negative effect on the sales of the associated product. The mean value of the 42 elasticities is 0.62. The short-run dynamics have a mean value of 4.59. It takes on average 16 weeks for the short-run dynamics to stabilize.

Although we only considered product couples that can be labeled as product associations, it is remarkable that we can classify even more relationships as substitutes (1212) than as complements (1112). This finding denies the intuitively appealing business idea that association rules can be used by retailers to implement more effective promotion strategies. Indeed, the underlying hypotheses that product associations necessarily show positive cross-price elasticities do not seem to hold.

As we have shown, however, there is a bigger probability that the sales of associated products will drop. Indeed, observing that customers tend to buy two products on the same shopping occasion does not imply a complementary relationship between these two products.

Consumers can buy products together for a variety of reasons (see Manchanda, Ansari, & Gupta, 1999, or Böcker, 1978).

Especially, horizontal variety seeking behavior can result in association rules between two products which are actually substitutes. Horizontal variety seeking involves the simultaneous purchase for multiple varieties (Kim, Allenby, & Rossi, 2002). The adjective horizontal is added to make a clear distinction between variety seeking behavior, which describes the process of temporal changes in tastes from purchase occasion to purchase occasion (see for example McAlister & Pessemier, 1982).

Dubé (2004) lists three reasons for the occurrence of horizontal variety seeking

  1. on a given shopping trip, consumers typically make purchases involving several consumption occasions. If preferences differ across these consumption occasions, this results in the simultaneous purchase of substitutes. A consumer can buy, for example, two flavors of tea, when he prefers flavor A in the morning, whereas he prefers flavor B in the evening.
  2. the purchase of a variety of products from the same assortment may be a response to the consumer’s uncertainty about his future preferences
  3. a consumer can make purchases for a complete household. This results in horizontal variety seeking, if preferences differ among the household members


In this research, we have shown empirically that a market basket analysis is not a good technique to implement more efficient promotion strategies. The underlying assumption that associated products are by consequence complements with positive cross-price elasticities cannot be validated. This assumption ignores the occurrence of horizontal variety seeking, which results in the simultaneous purchase of substitutes. We measured the cross-price elasticities of 1350 associated products and conclude that a price promotion has a higher likelihood to result in a decrease of the associated product. For this reason, we advise retailers not to build a promotion expert system based on a market basket analysis. This system should be built on the derivation of cross-price elasticities. Especially, the use of multivariate time series techniques, which account for dynamic effects, can be used to implement more efficient promotion strategies that benefit from positive cross-sales effects.

Technical details of the design and analysis

Unit root tests

The first step of our analysis involves the testing for unit roots. Those tests are necessary, since variables that appear to be non-stationary have to be differenced before entering the model, whereas stationary variables enter the model in levels. Moreover, the presence of a unit root is a necessary condition for the existence of long-run effects (Dekimpe & Hanssens, 1995b).

Augmented Dickey-Fuller tests were used to test for the presence of a unit root. We used the testing scheme proposed by Enders (1995) (see Appendix A). The optimal lag length for the autoregressive part of the test was chosen using the Schwarz Bayesian Criterium (SBC).

This testing procedure classifies each series as a unit-root process, a stationary process or a trend-stationary process.

Vector autoregressive (VAR) models

For each selected product couple, we estimate a four equation VAR model, with the prices and sales of both products as endogenous variables. We thereby control for factors that may influence sales, which are estimated as exogenous variables. More specifically, we estimate the effect of the featuring of the two products in the weekly folder of the retailer and the effect of the total sales per week of the retailer, which controls for external factors that may influence the sales of the two products. When one of the endogenous variables appears to be trend-stationary, a trend variable is included in all equations. 2 Hence, for every product association, the following system is estimated:

Vindevogel, B et al - Graphic 1 System estimation matrix

As mentioned earlier, endogenous variables that have a unit root enter the system in differences.

Impulse response functions

The effects of a price promotion on the sales of the associated product are estimated by deriving impulse response functions from the estimated VAR systems.

Formally, price promotions are operationalized as onetime unit shocks of the price variable in the VAR model in levels. The impact of a price promotion of product A on the sales of product B, for example, is operationalized by setting the value εPa,t to -1 and measuring the over-time impact on ln(Sb) of this one-time unit shock. From each VAR model we derive two impulse-response functions, measuring cross-price elasticities. Firstly, we estimate the response of the sales of product B to a price promotion of product A, and, secondly, we estimate the response of the sales of product A to a price promotion of product B.

In a VAR system, the instantaneous effects cannot be estimated directly, but are reflected in the variance–covariance matrix of the residuals. We use the method proposed by Evans and Wells (1983) to derive the instantaneous effects (for an explanation of this method, and an argumentation to use this method in this research, we refer to Appendix B).

In order to derive confidence intervals for the estimated responses, we used a bootstrap method. Therefore, elements from the residuals are randomly drawn with replacement. Based on these residuals and the parameters estimated for the VAR model, we create new values for the four endogenous time series. We then re-estimate the parameters of the VAR model using these new time series, and impulse response functions are derived based on this model. This procedure is repeated 500 times. Finally, the sample standard error is computed for these 500 response values. Using this standard error, we compute the t-values of each response. Responses with an absolute t-value higher than 1 are labeled as significant.

We follow Nijs et al. (2001a) in deriving a short-term and a long-term effect from the estimated impulse-response function. A long-term or persistent effect occurs when the asymptotic value of the response (t/N) is significantly different from zero. Short-run effects are the summation of all the impulses over the dust-settling period. The dust-settling period ends at the first period which is followed by four non-significant responses. 3

Description of the data

For our analysis, we used the transactional database of a large European retailer, which contains the sales transactions of six outlets between July 7th 1999 and March 26th 2003 of 15,017 different SKU’s.

First, we took a sample of all the transactions of 2002, and computed the support and interest measure for all possible combinations of two SKU’s. Since the database covers the sales of more than 15,000 different SKU’s, this results in the estimation of support and interest for more than 112,492,500 SKU pairs. We labeled a combination as a product association if it has an interest larger than two and a support exceeding 0.0157. 4 For the selected product associations we computed six variables on a weekly basis, which results in 194 weekly observations of the price of the two products (Pa and Pb), the sales in units of the two products (Sa and Sb), and the dummy variable that indicates whether the product featured in the folder for both products (Fa and Fb).

Since we are interested in the impact of price promotions on the sales of the associated product, we required that the price series of both products contain at least one price promotion in the 194 weeks. A price promotion was defined following the heuristic procedure in Abraham and Lodish (1993). They define a price promotion if the price is reduced by at least 5%, and then is raised again by at least 3% within the following 8 weeks. If there were weeks where the product was featured in the folder, these weeks do not count in the calculation of the 8 weeks period.

These restrictions result in 1350 selected product associations. As mentioned, we successively simulate a price promotion in both products and measure the impact of the promotion on the sales of the associated product, which results in the estimation of 2700 cross-price elasticities, both for the short and the long run.

Appendix A

Testing for unit roots using ADF-tests – Dolado, Jenkinson, and Sosvilla-Rivero (1990) and Enders (1995).

Testing for unit roots using ADF-tests

Testing for unit roots using ADF-tests.

Appendix B- Deriving instantaneous effects – Evans and Wells (1983)

In a VAR system, the instantaneous effects cannot be estimated directly, but are reflected in the variance–covariance matrix of the residuals. For example, if we observe a high covariance between the errors of the sales of products A and B, we can infer that there is a high instantaneous effect between sales of products A and B. A problem, however, remains that we cannot directly observe the direction of these instantaneous effects. In our example, we do not directly know whether it is the sales of product A that have an immediate effect on the sales of product B, or whether it is the other way around, i.e. sales of product B that influence the sales of product A. This problem is traditionally solved by imposing restrictions on the instantaneous effects. These restrictions impose a priori a causal ordering of the instantaneous effects. In a system with n endogenous variables, we need (n2-n)/2 restrictions for identification, which resolves to six restrictions in our four equation model. Imposing these six restrictions in our particular setting seems to be problematic however. While we could reasonably assume that feedback effects from sales to price take some time to materialize, 5 and hence restrict the instantaneous effects from price to sales to zero, this only yields four restrictions (sales A does not have an instantaneous effect on price A and price B, and sales B does not have an instantaneous effect on price A and price B). Imposing further restrictions cannot be done on a theoretical basis. If we observe an instantaneous effect between the price of A and B, for example, there are no theoretical grounds to impose that price A has an instantaneous effect on price B, while price B can only affect price A after one week. To circumvent this problem, we use the method proposed by Evans and Wells (1983) (see Dekimpe & Hanssens, 1999 for an application in marketing) to estimate the instantaneous effects, since this method does not imply to impose restrictions.

This method models instantaneous effects as the expected value of the error term given a particular shock and by assuming a multivariate normal distribution of the error terms. Formally, the expected instantaneous effect of variable j as a result of a shock k of variable i is computed as

E(εjj = k) σijii

where σij is the corresponding element in the variance–covariance matrix.

Applying this method to our setting, a price promotion of product A is operationalized as a shock in the residual vector of

[-σPa,SaPa,Pa,-σPa,SbPa,Pa, -1, -σPa,PbPa,Pa]


Abraham, M. M., Lodish, L. M. (1993). An implemented system for improving promotion productivity using store scanner data. Marketing Science, 12(3), 248–269.

Agrawal, R., Imielinski, T., Swami, A. (1993). Mining association rules between sets of items in large databases. Proceedings of the 1993 AC (pp. 207–216).

Anand, S. S., Patrick, A. R., Hughes, J. G., Bell, D. A. (1998). A data mining methodology for cross-sales. Knowledge-Based Systems, 10(7), 449–461.

Böcker, F. (1978). Die Bestimmung der Kaufverbundenheit von Produkten. Schriften zum Marketing , 7.

Brand, E., & Gerritsen, R. (1998). Associations and Sequencing. http:// www.dbmsmag.com/9807m03.html

Brijs, T., Swinnen, G., Vanhoof, K., Wets, G. (2004). Building an association rules framework to improve product assortment decisions. Data Mining and Knowledge Discovery, 8(1), 7–23.

Brin, S., Motwani, R., Silverstein, C. (1998). Beyond market baskets: Generalizing association rules to dependence rules. Data Mining and Knowledge Discovery, 2(1), 39–68.

Dekimpe, M., Hanssens, D. (1995a). The persistence of marketing effects on sales. Marketing Science, 14(1), 1–21.

Dekimpe, M., Hanssens, D. (1995b). Empirical generalizations about market evolutions and stationarity. Marketing Science, 14(3), G106– G121.

Dekimpe, M., Hanssens, D. (1999). Sustained spending and persistent response: A new look at long-term marketing profitability. Journal of Marketing Research, 36(4), 397–412.

Dekimpe, M., Hanssens, D., Silva-Risso, J. (1999). Long-run effects of price promotions in scanner markets. Journal of Econometrics, 89, 269– 291.

Dolado, J., Jenkinson, T., Sosvilla-Rivero, S. (1990). Cointegration and unit roots. Journal of Economic Surveys, 4, 249–273.

Dubé, J.-P. (2004). Multiple discreteness and product differentiation: Demand for carbonated soft drinks. Marketing Science, 23(1), 66–81.

Enders, W. (1995). Applied econometric time series. New York: Wiley.

Evans, L., Wells, G. (1983). An alternative approach to simulating VAR models. Economic Letters, 12(1), 23–29.

Giudici, P. (2003). Applied data mining. West Sussex, England: Wiley.

Hand, D., Mannila, H., Smyth, P. (2001). Principles of data mining. Massachusetts Institute of Technology.

Kim, J., Allenby, G. M., Rossi, P. E. (2002). Modeling consumer demand for variety. Marketing Science, 21(3), 229–250.

Manchanda, P., Ansari, A., Gupta, S. (1999). The “shopping basket”: A model for multicategory purchase incidence decisions. Marketing Science, 18(2), 95–114.

McAlister, L., Pessemier, E. (1982). Variety seeking behaviour: An interdisciplinary review. Journal of Consumer Research, 9(3), 311–322. Nijs, V. R., Dekimpe, M. G., Steenkamp, J.-B.E.M., Hanssens, D. M. (2001a). The category-demand effects of price promotions. Marketing Science, 20(1), 1–22.

Nijs, V. R., Dekimpe, M. G., Steenkamp, J.-B.E.M., & Hanssens, D. M. (2001b). Tracing the impact of price promotions across categories, Working Paper, Kellogg School of Management.

Pauwels, K., Hanssens, D., Siddarth, S. (2002). The long-term effects of price promotions on category incidence, brand choice, and purchase quantity. Journal of Marketing Research, 39(4), 421–439.

Shocker, A., Bayus, B., Kim, N. (2004). Product complements and substitutes in the real world: The relevance of “other products”. Journal of Marketing, 68(1), 28–40.

Srinivasan, S., Leszczyc, P., Bass, F. (2000). Market share response and competitive interaction: The impact of temporary, evolving and structural changes in prices. International Journal of Research in Marketing, 17(4), 281–305.

Van den Poel, D., De Schamphelaere, J., Wets, G. (2004). Direct and indirect effects of retail promotions. Expert Systems with Applications, 27(1), 53–62.

Wang, F.-S., Shao, H.-M. (2004). Effective personalized recommendation based on time-framed navigation clustering and association mining. Expert Systems with Applications, 27(3), 365–377.

  1. More general, Y and Z can be sets of products instead of single products.
  2.  Including a trend variable in all equations allows us to estimate the system using OLS. Only including a trend variable in the equations which are trend-stationary would oblige us to estimate the system with SUR, which results in a heavy computational load given the number of systems to be estimated. See Nijs et al., 2001a for a similar approach.
  3.  When there is no persistent effect, significant means significantly different from zero. When there is a persistent effect, significant means significantly different from the persistent effect.
  4.  The threshold value of 0.0157 results from the fact that we imposed that the two products should have been sold at least 1000 times together in the observation period. Since there were 6,368,614 baskets in total, this is the same as demanding a minimum support of 0.0157.
  5.  This assumption gives marketing mix variables causal priority over sales variables. For an application of this assumption see Dekimpe and Hanssens (1995a).

In general – Generating Association Rules (ARs)

Let’s consider how to extract Association /rules (ARs) efficiently from a given Frequent Itemset (FI).

Each frequent k-itemset, Y, can produce up to 2k-2 ARs, ignoring ARs that have empty antecedents or consequents (Φ → Y or Y → Φ). An AR can be extracted by partitioning the FI Y into two non-empty subsets, X and Y – X, such that X → Y – X satisfies the confidence threshold. Note that all such rules must have already met the support threshold because they are generated from an FI.


Let X = {1, 2, 3} be an FI. There are six candidate ARs that can be generated from X: {1, 2} → {3}, {1. 3} → {2}, {2, 3} → {1}, {1} → {2, 3}, {2} → {1, 3}, {3} → {1, 2}. As each of their support is identical to the support for X, the rules must satisfy the support threshold.

Computing the confidence of an AR does not require additional scans of the transaction dataset. Consider the rule {1, 2} → {3}, which is generated from the FI X = {1, 2, 3}. The confidence for this AR is σ({1, 2, 3}/σ({1, 2}). Because {1, 2, 3} is frequent, the anti-monotone property of support ensures that {1, 2} must be frequent, too. Since the support counts for both FIs were already found during the FI generation, there is no need to read the entire dataset again.

Confidence-based pruning

Unlike the support measure, confidence does not have a monotone property. For example,the confidence for X → Y can be larger, smaller, or equal to the confidence for another rule X’ → Y’, where X’ ⊆ X and Y’ ⊆ Y. Nevertheless, if we compare rules generated from the same frequent itemset Y, the following theorem holds for the confidence measure:

Theorem 1. If a rule X → Y – X does not satisfy the confidence threshold, then any rule X’ → Y – X’, where X’ is a subset of X, must not satisfy the confidence threshold as well.

To prove this theorem, consider the following two rules: X’ → Y – X’ and X → Y – X, where X’ ⊂ X. The confidence of the rules are σ(Y)/σ(X’) and σ(Y)/σ(X), respectively. Since X’ is a subset of X, σ(X’)/σ(X). Therefore, the former rule cannot have a higher confidence than the latter rule.

Previous: In general – Generating Association Rules (ARs)
Next: Apriori algorithm – Generating and pruning candidate FIs or Apriori algorithm – Generating Association Rules (ARs)

Apriori Algorithm – Generating Association Rules (ARs)

The Apriori algorithm uses a level-wise approach for generating association rules (ARs), where each level corresponds to the number of items that belong to the rule consequent. Initially, all the high-confidence rules that have only one item in the rule consequent are extracted. These rules are then used to generate new candidate rules.

For example, if {acd} → {b} and {abd} → {c} are high-confidence rules, then the candidate rule {ad} → {bc} is generated by merging the consequents of both rules.

Figure 1 shows a lattice structure for the ARs generated from the FI {a, b, c, d}. If any node in the lattice has low confidence, then according to Theorem 1 the entire subgraph spanned by the node can be pruned immediately. Suppose the confidence for {bcd} → {a} is low. All the rules containing item a in its consequent, including {cd} → {ab}, {bd} →  {ac}, {bc} →  {ad}, and {d} →  {abc} can be discarded.

Pruning of association rules using the confidence measure

Figure 1. Pruning of association rules using the confidence measure.

A pseudocode for the rule generation step is shown in Algorithms 1 and 2. Note the similarity between the ap-genrules procedure in Algorithm 2 and the FI generation procedure above. The only difference is that, in rule generation, we do not have to make additional passes over the data set to compute the confidence of the candidate rules. Instead, we determine the confidence of each rule by using the support counts computed during FI generation.

Rule generation of the Apriori algorithm

Algorithm 1. Rule generation of the Apriori algorithm.

Procedure ap-genrules

Algorithm 2. Procedure ap-genrules (fk, Hm).
Previous: Apriori algorithm – Generating Frequent Itemsets (FIs) or In general – Generating Association Rules (ARs)
Next: xx

Apriori algorithm – Generating candidate FIs

The apriori-gen function shown in Step 5 of Algorithm 1 generates candidate frequent itemsets by performing two operations:

  1. Candidate generation – This operation generates new candidate k-itemsets based on the frequent (k-1)-itemsets found in the previous iteration.
  2. Candidate pruning – This operation eliminates some of the candidate k-itemsets using the support-based pruning strategy.

To illustrate the candidate pruning operation, consider a candidate k-itemset, X = {i1, i2, i3, …, ik}.

The algorithm must determine whether all of its proper subsets, X – {ij} (j = 1, 2, 3, …, k), are frequent. If one of them is infrequent, then X is immediately pruned.

This approach can effectively reduce the number of candidate itemsets considered during support counting. The complexity of this operation is O(k) for each candidate k-itemset.

However, we do not have to examine all k subsets of a given candidate itemset. If m of the k subsets were used to generate a candidate, we only need to check the remaining k-m subsets during candidate pruning.

Requirements for a procedure to generate candidate itemsets

In principle, there are many ways to generate candidate itemsets. The following is a list of requirements for an effective candidate generate procedure:

  1. It should avoid generating too many unnecessary candidates – i.e. a candidate itemset is unnecessary if at least one of its subsets is infrequent. Such a candidate is guaranteed to be infrequent according to the anti-monotone property of support.
  2. It must ensure that the candidate set is complete – i.e. no frequent itemsets are left out by the candidate generation procedure. To ensure completeness, the set of candidate itemsets must subsume the set of all frequent itemsets, i.e. ∀k: FkCk.
  3. It should not generate the same candidate itemset more than once. E.g. the candidate itemset {a,b,c,d} can be generated many ways – by merging {a,b,c} with {d}; {b,d} with {a,c} etc. Generation of duplicate candidate leads to wasted computations and thus should be avoided for the sake of efficiency.

There are several candidate generation procedures – including the one used by the apriori-gen function.

Various procedures to generate candidate itemsets

Brute-force method

The brute-force method considers every k-itemset as a potential candidate and then applies the candidate pruning step to remove any unnecessary candidates (see Figure 1).

A brute-force method for generating candidate 3-itemsets

Figure 1. A brute-force method for generating candidate 3-itemsets.

The number of candidate itemsets generated at level k is


where d is the total number of items. Although candidate generation is rather trivial, candidate pruning becomes extremely expensive because a large number of itemsets must be examined. Given that the amount of computations needed for each candidate is O(k).

Fk-1 x Fk-1 method (used by apriori-gen)

The candidate generation procedure in the apriori-gen function merges a pair of frequent (k-1)-itemsets only if their first k-2 items are identical.

Let A – {a1, a2, a3, …, ak-1} and B = {b1, b2, b3, …, bk-1} be a pair of frequent (k-1)-itemsets. A and B are merged if they satisfy the following conditions:

ai = bi (for i = 1,2,3, …, k-2) and ak-1 ≠ bk-1

In Figure 2, the frequent itemsets {Bread, Diapers} and {Bread, Milk} are merged to form a candidate 3-itemset {Bread, Diapers, Milk}. The algorithm does not have to merge {Beer, Diapers} with {Diapers, Milk} because the first item in both itemsets is different. Indeed, if {Beer, Diapers, Milk} is a viable candidate, it would have been obtained by merging {Beer, Diapers} with {Beer, Milk} instead. This example illustrate both the completeness of the candidate generation procedure and the advantages of using lexicographic ordering to prevent duplicate candidates. However, because each candidate is obtained by merging a pair of frequent (k-1)-itemsets, an additional candidate pruning step is needed to ensure that the remaining k-2 subsets of the candidate are frequent.

Generating and pruning candidate k-itemsets by merging frequent pairs of frequent k-1-itemsets

Figure 2. Generating and pruning candidate k-itemsets by merging frequent pairs of frequent (k-1)-itemsets.

Fk-1 x F1 method – not elaborated here

Support counting

Support counting is the process of determining the frequency of occurrence of every candidate itemset that survives the candidate pruning step of the apriori-gen function. Support counting is implemented in Steps 6 through 11 of Algorithm 1.

One approach is to compare each transaction against every candidate itemset and to update the support counts of candidates contained in the transaction (Figure 3). This approach is computationally expense, especially when the numbers of transactions and candidate itemsets are large.

Counting the support of candidate itemsets

Figure 4. Counting the support of candidate itemsets.

An alternative is to enumerate the itemsets contained in each transaction and use them to update the support counts of their respective candidate itemsets.

To illustrate, consider a transaction t that contains five items, {1,2,3,4,5}. There are

[latex]\binom{5}{3} = \frac{5!}{3!(5-3)!} = 10[/latex]

itemsets of size 3 contained in this transaction. Some of the itemsets may correspond to the candidate 3-itemsets under investigation, in which case their support counts are incremented. Other subsets of t that do not correspond to any candidates can be ignored.

Figure 4 shows a systematic way for enumerating the 3-itemsets contained in t. Assuming that each itemset keeps its items in increasing lexicographic order, an itemset can be enumerated by specifying the smallest item first, followed by the larger items. For instance, given t = {1,2,3,4,5,6}, all the 3-itemsets contained in t must begin with item 1, 2, or 3.

Enumerating subsets of three items from a transaction t

Figure 4. Enumerating subsets of three items from a transaction t.

The number of ways to specify the first item of a 3-itemset contained in t is illustrated by the Level 1 prefix structures depicted in Figure 4. After fixing the first item, the prefix structure at Level 2 represent the number of ways to select the second item. Finally, the prefix structures at Level 3 represent the complete set of 3-itemsets contained in t.

The prefix structures shown in Figure 4 demonstrate how itemsets contained in a transaction can be systematically enumerated – i.e. by specifying their items one by one, from the leftmost item to the rightmost item.

We still have to determine whether each enumerated 3-itemset corresponds to an existing candidate itemset.

Support counting using a Hash Tree

In the Apriori algorithm, candidate itemsets are partitioned into different buckets and stored in a hash tree. During support counting, itemsets contained in each transaction are also hashed into their appropriate buckets. That way, instead of comparing each itemset in the transaction with every candidate itemset, it is matched only against candidate itemsets that belong to the same bucket (Figure 5).

Counting the support of itemsets using hash structure

Figure 5. Counting the support of itemsets using hash structure.

We don’t elaborate on procedure

Also, we don’t elaborate on computational complexity of Apriori algorithm

Previous: Apriori algorithm – Overview
Next: Apriori Algorithm – Generating Association Rules (ARs)

Apriori algorithm – Overview

Let’s consider how we may use the support measure to reduce the number of candidate itemsets that must be explored to generate frequent itemsets (i.e. where support ≥ minsup).

Theorem 1 (Apriori principle)If an itemset is frequent, then all of its subsets must also be frequent.

Consider the itemset lattice shown in Figure 1. Suppose {c,d,e} is a frequent itemset. Clearly, any transaction that contains {c,d,e} must also contain its subsets, {c,d}, {c,e}, {d,e}, {c}, {d}, and {e}. As a result, if {c,d,e} is frequent, then all subsets of {c,d,e} – the shaded itemsets in Figure 1 – must also be frequent.

An illustration of the apriori principle

Figure 1. An illustration of the Apriori principle.

Conversely, if an itemset {a,b} is infrequent, then all of its supersets must be infrequent too. In Figure 2, the entire subgraph conctaining the supersets of {a,b} can be pruned immediately once {a,b} is found to be infrequent.

This strategy of trimming the exponential search space based on the support measure is known as support-based pruning. Such a pruning strategy is made possible by a key property of the support measure, namely, that the support for an itemset never exceeds the support for its subsets. This property is also known as the anti-monotone property of the support measure.

Definition 1. (Monotonicity property) – Let I be the set of items, and J = 2I be the power set of I. A measure f is monotone (or upward closed) if

X,Y ∈ J : (X ⊆ Y) → f(X) ≤ f(Y),

which means that if X is a subset of Y, then f(X) must not exceed f(Y).

On the other hand, f is anti-monotone (or downward closed) if

X,Y ∈ J : (X ⊆ Y) → f(Y) ≤ f(X),

which means that if X is a subset of Y, then f(Y) must not exceed f(X).

Any measure that possesses an anti-monotone property can be incorporated directly into the mining algorithm to prune the search space of candidate itemsets.

Generating FIs using the Apriori algorithm

Apriori is the first Association Rule (AR) mining algorithm to use support-based pruning to address the exponential growth of candidate itemsets.

TID Items
1 {Bread, Milk}
2 {Bread, Diapers, Beer, Eggs}
3 {Bread, Diapers, Beer, Cola}
4 {Bread, Milk, Diapers, Beer}
5 {Bread, Milk, Diapers, Cola}

Table 1. An example of market basket transactions.

Figure 2 provides a high-level illustration of Apriori’s generation of FIs for the transactions shown in Table 1.

We assume that the support threshold is 60%, which is equivalent to a minimum support count equal to 3.

Illustration of frequent itemset generation using the Apriori algorithmFigure 2. Illustration of frequent itemset generation using the Apriori algorithm

In iteration 1, every item is considered as a candidate 1-itemset. After counting their supports, the candidate itemsets {Cola} and {Eggs} are discarded because they appear in fewer than three transactions.

In iteration 2, candidate 2-itemsets are generated using only the frequent 1-itemsets, because the Apriori principle ensures that all supersets of the infrequent 1-itemsets must be infrequent. Because there are only four frequent 1-itemsets, the number of candidate 2-itemsets generated by the algorith is

[latex]\binom{4}{2} = \frac{4!}{2!(4-2)!} = 6[/latex]

After counting their support values,two of these six candidate itemsets – {Beer, Bread} and {Beer, Milk} – turn out to be infrequent. The remaining four candidates are frequent, and are used to generate 3-itemsets in the next iteration.

In iteration 3, candidate 3-itemsets are generated using only the four frequent 2-itemsets, because the Apriori principle ensures that all supersets of the infrequent 2-itemsets must be infrequent.

Without support-based pruning, there are

[latex]\binom{6}{3} = \frac{6!}{3!(6-3)!} = 20[/latex]

candidate 3-itemsets that can be formed using the six items. With the Apriori principle, we only need to keep candidate 3-itemsets whose subsets are frequent. The only candidate that has this property is {Bread, Diapers, Milk}.

The efficiency of the Apriori pruning strategy can be shown by counting the number of candidate itemsets generated.

A brute-force strategy of enumerating all itemsets (up to size 3) as candidates will always generate

[latex]\binom{6}{1} +\binom{6}{2} +\binom{6}{3} = \frac{6!}{1!(6-1)!} +\frac{6!}{2!(6-2)!}+\frac{6!}{3!(6-3)!} = 6+15+20 = 41[/latex]

candidate itemsets.

The Apriori pruning strategy (applied to our data) produced

[latex]\binom{6}{1} +\binom{4}{2} + 1 = \frac{6!}{1!(6-1)!} +\frac{4!}{2!(4-2)!} = 6+6+1 = 13[/latex]

candidate itemsets – a 68% reduction in our case.

Algorithm 1 presents the psuedocode for the FI generation part of the Apriori algorithm. Let Ck denote the set of candidate k-itemsets and Fk denote the set of frequent k-itemsets:

  • The algorithm initially makes a single pass over the data set to determine the support of each item. Upon completion of this step, the set of all frequent 1-itemsets – F1 – will be known (steps 1 and 2).
  • The algorithm iteratively generates new candidate k-itemsets using the frequent (k-1)-itemsets found in the previous iteration (step 5). Candidate generation is implemented using a function called apriori-gen.
  • To count the support of the candidate itemsets, the algorithm needs to make an additional pass over the data set (steps 6 – 10). The subset function is used to determine all the candidate itemsets in Ck that are contained in each transaction t. The implementation of this function is described in YYY.
  • After counting their supports, the algorithm eliminates all candidate itemsets whose support counts are less than minsup (step 12).
  • The algorithm terminates when there are no new frequent itemsets generated, i.e. Fk = Φ (step 13).

Frequent itemset generation of the Apriori algorithm

Algorithm 1. FI generation of the Apriori algorithm.

Characteristics of Apriori‘s generation of FIs

The FI generation part of the Apriori algorithm has two important characteristics:

  1. It is a level-wise algorithm – it traverses the itemset lattice one level at a time, from frequent 1-itemsets to the maximum size of FIs.
  2. It employs a generate-and-test strategy for finding FIs – at each iteration, new candidate itemsets are generated from the FIs found in the previous iteration. The support for each candidate itemset is then counted and tested against minsup.

The total number of iterations needed by the Apriori algorithm is kmax+1, where kmax is the maximum size of the FIs.

Previous: In general – Generating Frequent Itemsets (FIs) – Overview
Next: Apriori algorithm – Generating and pruning candidate FIs

In general – Generating Frequent Itemsets (FIs)

A lattice structure can be used to enumerate the list of all possible itemsets. Figure 1 shows an itemset lattice for I = {a,b,c,d}. In general, a data set that contains k items can generate up to 2k-1 frequent itemsets, excluding the null set.

An itemset lattice

Figure 1. An itemset lattice.

A brute-force approach for finding frequent itemsets is to determine the support count for every candidate itemset in the lattice structure. To do this, we need to compare each candidate itemset against every transaction (see Figure 2). If the candidate is contained in a transaction, its support count will be incremented. E.g. the support for {Bread, Milk} is incremented three times, because the itemset is contained in transactions 1, 4, and 5. Such an approach can be very expensive, because it requires O(NMw) comparisons, where N is the number of transactions, M – 2k-1 is the number of candidate itemsets, and w is the maximum transaction width.

Counting the support of candidate itemsets

Figure 2. Counting the support of candidate itemsets.

There are several ways to reduce the computational complexity of frequent itemset generation:

  1. Reduce the number of candidate itemsets (M) – the Apriori principle (see below) is an effective way to eliminate some of the candidate itemsets without counting their support values.
  2. Reduce the number of comparisons – instead of matching every candidate itemset against every transaction, we can reduce the number of comparisons by using more advanced data structures, either to store the candidate itemsets or to compress the data sets (see below).

Previous: The problem of mining Association Rules – ARs
Next: In general – Generating Association Rules (ARs) or Apriori algorithm – Overview

Basket Analysis – Rattle

Basket Analysis

The simplest association analysis is often referred to as market basket analysis. Within Rattle this is enabled when the Baskets button is checked. In this case, the data is thought of as representing shopping baskets (or any other type of collection of items, such as a basket of medical tests, a basket of medicines prescribed to a patient, a basket of stocks held by an investor, and so on). Each basket has a unique identifier, and the variable specified as an Ident variable in the Data tab is taken as the identifier of a shopping basket. The contents of the basket are then the items contained in the column of data identified as the target variable. For market basket analysis, these are the only two variables used.

To illustrate market basket analysis with Rattle, we will use a very simple dataset consisting of the DVD movies purchased by customers. Suppose the data is stored in the filedvdtrans.csv and consists of the following:

1,Sixth Sense
1,Harry Potter1
1,Green Mile
4,Sixth Sense
5,Sixth Sense
6,Sixth Sense
7,Harry Potter1
7,Harry Potter2
9,Sixth Sense
10,Sixth Sense
10,Green Mile

We load this data into Rattle and choose the appropriate variable roles. In this case it is quite simple:

Togaware rattle-dvd-variables

On the Associate tab (of the Unsupervised paradigm) ensure the Baskets check box is checked. Click the Execute button to identify the associations:

Togaware rattle-dvd-associate-top

Here we see a summary of the associations found. There were 38 association rules that met the criteria of having a minimum support of 0.1 and a minimum confidence of 0.1. Of these, 9 were of length 1 (i.e., a single item that has occurred frequently enough in the data), 20 were of length 2 and another 9 of length 3. Across the rules the support ranges from 0.11 up to 0.56. Confidence ranges from 0.11 up to 1.0, and lift from 0.9 up to 9.0.

The lower part of the same textview contains information about the running of the algorithm:

Togaware rattle-dvd-associate-bot

We can see the variable settings used, noting that Rattle only provides access to a smaller set of settings (support and confidence). The output includes timing information fore the various phases of the algorithm. For such a small dataset, the times are of course essentially 0!

The Problem of Mining Association Rules

The problem of mining association rules may be stated formally as follows:

Discovery of Association Rule(s) Given a set of transactions T, find all the association rules having support ≥ minsup and confidence ≥ minconf, where minsup and minconf are the corresponding support and confidence thresholds. Definition 1

A brute-force approach for mining association rules is to compute the support and confidence for every possible rule. This approach is prohibitively expensive because there are exponentially many rules that can be extracted from a data set. Indeed, in most datasets, the vast majority of association rules are discarded after we apply reasonable thresholds for support (minsup) and confidence (minconf) – making the brute-force approach very inefficient in terms of the number of association rules yielded per computation.

To avoid performing needless computations, it would be useful to prune the rules early without having to compute their support and confidence values.

An initial step toward improving the performance of association rule mining algorithms is to decouple the support and confidence requirements. From Equation 2,

Support: Formula for Support - Ch 6 Equation 2

notice that the support of a rule X Y depends only on the support of its corresponding itemset, X Y .

For example, the following association rules have identical support because they involve items from the same itemset, {Visit to Site of Designer3, Visit to Site of Designer4, Visit to Site of Designer1}:

{Visit to Site of Designer3, Visit to Site of Designer4} → {Visit to Site of Designer1}

{Visit to Site of Designer1, Visit to Site of Designer3} → {Visit to Site of Designer4}

{Visit to Site of Designer1, Visit to Site of Designer4} → {Visit to Site of Designer3}


If the itemset is infrequent, then all candidate association rules that are derived from the itemset can be pruned immediately without our having to compute the confidence in the association rules.

Therefore, a common strategy adopted by many algorithms is to decompose the problem of mining association rule(s) into two major sub-tasks:

  1. Generating Frequent Itemset(s), whose objective is to find all the itemsets that satisfy the minsup.  These itemsets are called frequent itemsets.
  2. Generating Strong Association Rule(s), whose objective is to extract all the high-confidence rules from the frequent itemsets found in the previous step. These rules are called strong association rules.

The computational requirements for generating frequent itemset(s) are generally more expensive than those for generating strong association rule(s). See techniques for generating frequent itemsets and generating strong association rules in xxx.

Previous: Association Analysis – Basic Concepts and Algorithms

Next: Generation of Frequent Itemsets (FIs)  – Overview


Association Analysis – Basic Concepts and Algorithms

Problem Definition

Market basket data can be represented in a binary format as shown in Table 1, where each row corresponds to a transaction and each column corresponds to an item. An item can be treated as a binary variable whose value is one if the item is present in a transaction and zero otherwise. Because the presence of an item in a transaction is often considered more important than its absence, an item is an asymmetric binary variable.

UserID D1 D2 D3 D4
U1 1 0 0 0
U2 0 0 1 1
U3 0 0 0 0
U4 0 0 1 0
U5 0 0 0 0
U6 0 0 0 0
U7 0 0 0 0
U8 1 0 1 1

Table 1. A binary 0/1 representation of market basket data.

This representation is perhaps a very simplistic view of real market basket data because it ignores certain important aspects of the data such as the quantity of items sold or the price paid to purchase them.

[In your case, you’ll be concerned eventually with whether the visit to website resulted in a conversion, value of purchase, etc].

In General: Itemset, Transaction, and Transaction Width

Let I = {i1,i2,,id} be the set of all items in a market basket data and T = {t1,t2,…,tN} be the set of all transactions. Each transaction ti contains a subset of items chosen from I. In association analysis, a collection of zero or more items is termed an itemset. If an itemset contains k items, it is called a k-itemset. For instance, {Beer, Diapers, Milk} is an example of a 3-itemset. The null (or empty) set {∅} is an itemset that does not contain any items.

The transaction width is defined as the number of items present in a transaction. A transaction tj is said to contain an itemset X if X is a subset of tj. For example, the second transaction shown in Table 6.2 contains the itemset {Bread, Diapers} but not {Bread, Milk}.

In Your Case: Itemset, Transaction, and Transaction Width

In your case, items correspond to “Visit to Site of Designer1” – “Visit to Site of Designer2” – “Visit to Site of Designer3” – etc.

So the set of all items is I={Visit to Site of Designer1,  Visit to Site of Designer2, Visit to Site of Designer3, … etc}.

2-itemset corresponds to {Visit to Site of Designer1, Visit to a Site of Designer2} – a 3-itemset corresponds to {Visit to Site of Designer1, Visit to a Site of Designer2, Visit to Site of Designer5} etc. The null (or empty) itemset, {∅}, does not contain any Visit to the Site of any Designer.

In your case, a transaction corresponds to User1, User2, etc.

This is a potential source of confusion for you - "transaction" suggests an action (haha) or some kind of event. Eventually you may have a sufficiently large sample of data to allow you to move in this direction - then you'll be able to redefine "transaction" to mean something like "A User's Visiting the Company's website == Transaction/Session". An itemset then will then become the Sites of Designer that the User visits in the course of that Transaction/Session. A User may then contribute multiple transactions (multiple rows in the spreadsheet) within the period of observation. For now, you don't have this kind of 'transaction" - you only have one entry (one row) per User. We could maybe choose a better word than "transaction" in your case - but for the sake of aligning with the language in Chapter 6, we've kept with "transaction".

In your case, the transaction width corresponds to the Number of Sites of different Designers that a given User has visited.

Again, watch the potential confusion - a more usual understanding of transaction width would correspond to the Number of Sites of different Designers that a given User has visited within a given Session.

A transaction tj is said to contain an itemset X, if X is a subset of tj. For example, the second transaction (corresponding to User2) shown in Table 1 contains the itemset {Visit to Site of Designer3, Visit to Site of Designer4} but not {Visit to Site of Designer2, Visit to Site of Designer4}.

Support Count

An important property of an itemset is its support count, which refers to the number of transactions that contain a particular itemset.

Mathematically, the support count, σ(X), for an itemset X can be stated as follows:

Support Count Formula for Support Count - Ch 6 Equation 1

where the symbol | · | denotes the number of elements in a set.

In the data set shown in Table 1, the support count for {Visit to Site of Designer3, Visit to Site of Designer4} is equal to two because there are only two transactions that contain all two items, i.e. only two Users who had visited the sites of both Designers.

Association Rule, Support, and Confidence

An association rule is an implication expression of the form X Y , where X and Y are disjoint itemsets, i.e., X Y = ∅.

The strength of an association rule can be measured in terms of its support and confidence.

Support determines how often a rule is applicable to a given data set.

Confidence determines how frequently items in Y appear in transactions that contain X. The formal definitions of these metrics are:

Support Formula for Support - Ch 6 Equation 2
Confidence Formula for Confidence - Ch 6 Equation 3


Consider the association rule {Visit to Site of Designer3, Visit to Site of Designer4} → {Visit to Site of Designer1}.

Since the support count for {Visit to Site of Designer3, Visit to Site of Designer4, Visit to Site of Designer1} is 1 and the total number of transactions (i.e. Users) is 8, support for the association rule is 1/8 = 0.125.

Confidence in the association rule is obtained by dividing the support count for {Visit to Site of Designer3, Visit to Site of Designer4, Visit to Site of Designer1} by the support count for {Visit to Site of Designer3, Visit to Site of Designer4}. There are 2 transactions that contain Visit to Site of Designer3 and Visit to Site of Designer4 – so the confidence in the association rule {Visit to Site of Designer3, Visit to Site of Designer4} → {Visit to Site of Designer1} is 1/8 divided by 2/8, or 1/2 = 0.5.

Why Use Support and Confidence?

Support is an important measure because an association rule that has very low support may occur simply by chance, i.e. randomly. A low support rule is also likely to be uninteresting from a business perspective because it may not be profitable to promote items that customers seldom buy together (with the exception discussed later). For these reasons, support is often used to eliminate uninteresting rules. As will be shown later, support also has a desirable property that can be exploited for the efficient discovery of association rules.

Confidence, on the other hand, measures the reliability of the inference made by an association rule. For a given rule X Y , the higher the confidence, the more likely it is for Y to be present in transactions that contain X. Confidence also provides an estimate of the conditional probability of Y given X.

Association analysis results should be interpreted with caution. The inference made by an association rule does not necessarily imply causality. Instead, it suggests a strong co-occurrence relationship i.e. correlation between items in the antecedent and consequent of the rule.

Next: The Problem of Mining Association Rules

Market Basket Analysis (MBA) and High Utility Rare Itemsets (HURI)

Market Basket Analysis

High Utility Rare Itemsets

Fuzzy Association Rules

Weighted Rules

Temporal Rules

Composite Items

General Reference