Recommender systems are among the most widely used algorithms in online retail. The basic promise of a recommender system is that a recommendation increases the probability that a customer will buy a product (above a certain threshold). However, whether the recommender system *caused* the customer to buy the product, or whether the customer would have bought it anyway, is not an easy distinction to make. Consider the example from Sharma et al. (2015):[^1]

> To see why, consider users who visit Amazon.com in search of a pair of winter gloves. Upon viewing the product page for the gloves, some users might notice a winter hat listed as a recommendation and click on it to continue browsing. According to the naive approach that simply counts clicks, this view would be attributed to the recommender system. But the question we focus on here is whether the recommender caused these users to view another product—in this case a winter hat—or if they would have done so anyway in a counterfactual world in which the recommender did not exist (Rubin, 2005).

So, does the recommendation cause customers to buy the winter hat, or would they have bought it anyway after buying a pair of winter gloves?

The authors solve the problem by using an instrumental variable: one product (Product A) experiences an instantaneous shock in direct traffic, but the product recommended next to it (Product B) does not.

> When the demand for the recommended product is known to be constant, any increase in click-throughs from the focal product can be attributed to the recommender, and hence we can estimate its causal effect simply by dividing the observed change in recommendation click-throughs during the shock by the exogenous change in traffic over the same period.

Let's say a focal product (Product A) is featured on a popular show (TV, YouTube) and traffic to its web page spikes. The spike increases the number of times a recommended product (Product B) is displayed on the focal product's page. Thus, the instrument meets the relevance criterion.

For the shock to serve as a proper instrumental variable, it must also satisfy the exclusion restriction: the same shock should not increase the direct traffic to the recommended product. The authors apply a filter to find such shocks and products, where they relax the expectation that there won't be _any_ increases in direct traffic (for practical reasons, to keep a reasonable sample size).[^2]

Using DAGs, the authors visualize the use of the instrumental variable for causal identification for this problem as follows:

![[Pasted image 20240215150707.png]]

where $z_{it}$ is the shock (instrumental variable), $v_{it}$ is the total traffic to the focal product (treatment), $r_{ijt}$ is the number of referral clicks (outcome), and $d_{jt}$ is the direct traffic to the recommended product.
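As a toy illustration of the filtering step described above (not the authors' actual procedure), the sketch below flags shock days for a focal product and checks, approximately, that the shock does not leak into the recommended product's direct traffic. The threshold rule and the `k` and `tolerance` parameters are our own simplifications; the paper's operationalization in Section 4.2 differs.

```python
import pandas as pd

def flag_shock_days(focal_direct_traffic: pd.Series, k: float = 3.0) -> pd.Series:
    """Flag days where direct traffic to the focal product spikes.

    A day counts as shocked when traffic exceeds the series mean by more than
    k standard deviations. This is a crude stand-in for the paper's
    shock-detection procedure (Section 4.2).
    """
    threshold = focal_direct_traffic.mean() + k * focal_direct_traffic.std()
    return focal_direct_traffic > threshold

def passes_relaxed_exclusion(
    recommended_direct_traffic: pd.Series,
    shock_days: pd.Series,
    tolerance: float = 0.05,
) -> bool:
    """Approximate exclusion check: mean direct traffic to the recommended
    product on shock days should not exceed its baseline mean by more than
    `tolerance` (the relaxation mentioned above, which keeps the sample
    size reasonable)."""
    baseline = recommended_direct_traffic[~shock_days].mean()
    shocked = recommended_direct_traffic[shock_days].mean()
    return shocked <= baseline * (1 + tolerance)
```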
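Given a valid shock, the quoted ratio (the change in recommendation click-throughs divided by the exogenous change in traffic) is a Wald estimator. Below is a minimal sketch in the DAG's notation, with synthetic data invented purely for illustration; the function and variable names are our own, not the paper's.

```python
import numpy as np
import pandas as pd

def wald_late_estimate(
    focal_traffic: pd.Series,    # v_it: total traffic to the focal product
    referral_clicks: pd.Series,  # r_ijt: clicks from focal to recommended product
    shock_days: pd.Series,       # z_it: the instrument (shock indicator)
) -> float:
    """Causal click-through rate as a Wald ratio: the change in referral
    clicks during the shock over the exogenous change in focal traffic."""
    delta_r = referral_clicks[shock_days].mean() - referral_clicks[~shock_days].mean()
    delta_v = focal_traffic[shock_days].mean() - focal_traffic[~shock_days].mean()
    return delta_r / delta_v

# Synthetic example: a 10-day shock multiplies focal traffic roughly 6x,
# and the true causal click-through rate on the recommendation is 3%.
rng = np.random.default_rng(0)
days = pd.RangeIndex(100)
shock = pd.Series(days >= 90, index=days)
v = pd.Series(rng.poisson(1000, size=100), index=days) + shock * 5000
r = pd.Series(rng.binomial(v, 0.03), index=days)
print(wald_late_estimate(v, r, shock))  # close to 0.03
```

The sketch handles a single product pair; the figure discussed below reports category-level averages of such per-product estimates.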
The authors explain their identification strategy as follows, acknowledging that the measured effect is the [[Local Average Treatment Effect]] (LATE), not the average treatment effect:

> As is typical for instrumental variable approaches, moreover, our causal estimate is not the average treatment effect (ATE) that one would obtain from an ideal randomized experiment, but rather a local average treatment effect (LATE) that, strictly speaking, estimates the effect only for users who respond to shocks, which in turn is unlikely to be a random sample of the overall population. As [Imbens 2009] has argued, however, the “LATE vs. ATE” issue is unavoidable for instrumental variable approaches; thus, in the absence of a randomized experiment on the Amazon website a local, shock-based strategy such as ours is still useful for identifying causal effects provided that the associated concerns regarding generalizability are adequately addressed.

This means that the measured effect is the causal click-through rate on recommendations for users who participate in shocks. The definition of shocks is therefore critical to assessing the validity of the results; Section 4.2 of the original paper discusses in detail how the shocks are operationalized.

![[amazon-case-results.png]]

The authors summarize and compare the results in the figure above, which contrasts the naïve estimates of recommendation traffic by product group with the corresponding causal estimates from the instrumental variable approach described above. The dashed red line shows the naïve estimate, the average conversion rate on recommendations for shocked products in each category, counting only outbound clicks from "People who bought this also bought" recommendations; it implies a conversion rate of more than 15% on recommendations for e-books and toys, for example. This effect size appears to be overblown. The solid blue line shows the average estimated causal effect for shocked products in each category (LATE), indicating that the majority of observed clicks are due to convenience alone (customers would have bought the recommended product anyway, as discussed in [[Confounding by intention]]), and that a more accurate estimate of the causal effect of the recommender in these and other categories is 5% or less.

[^1]: Sharma, A., Hofman, J. M., & Watts, D. J. (2015, June). Estimating the causal impact of recommendation systems from observational data. In *Proceedings of the Sixteenth ACM Conference on Economics and Computation* (pp. 453–470).

[^2]: The actual filtering condition looks somewhat arbitrary to us (i.e., choosing a parameter to keep 90% of all shocks), but that is beyond our scope. See the original paper for details.

> [!info]- Last updated: September 3, 2024