By the late 2000s, online consumer reviews had become a dominant signal of quality in restaurant markets. By 2009, 70% of Seattle restaurants had a presence on Yelp.com. The natural question is whether reviews actually move money: does a higher displayed rating cause customers to spend more, or are well-rated restaurants simply better in ways that would have driven revenue regardless of Yelp?
The challenge is that ratings and revenue are both shaped by underlying restaurant quality (a hard-to-measure confounder). A restaurant with great food earns higher ratings *and* higher revenue, so a naïve regression of revenue on rating would conflate Yelp's effect with quality and with other factors such as location, management, and reputation. Luca (2016) addresses this with an institutional feature of Yelp: the rounding rule.[^1]
Yelp displays each restaurant's average rating rounded to the nearest half-star. A restaurant with an underlying average of 3.24 stars displays as 3.0 stars; one with 3.25 stars displays as 3.5. Two restaurants on either side of a rounding boundary are essentially identical in underlying quality, but differ by half a star in what consumers see. Luca (2016) reduces the question to: does that half-star bump produce a measurable revenue jump?
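A minimal sketch of the rounding mechanics (not from the paper; `displayed_rating` and `distance_to_boundary` are my own illustrative names): the first function rounds an underlying average to the nearest half-star, and the second computes the signed distance to the nearest x.25 / x.75 boundary, which is the running variable used in the design below.

```python
import numpy as np

def displayed_rating(avg: np.ndarray) -> np.ndarray:
    """Round the underlying average to the nearest half-star, rounding
    ties up, so 3.24 -> 3.0 and 3.25 -> 3.5 (as in the text above)."""
    return np.floor(avg * 2 + 0.5) / 2

def distance_to_boundary(avg: np.ndarray) -> np.ndarray:
    """Signed distance to the nearest rounding boundary (x.25 or x.75).
    Non-negative distances are rounded up; negative distances are rounded down."""
    nearest = np.floor((avg - 0.25) / 0.5 + 0.5) * 0.5 + 0.25
    return avg - nearest

ratings = np.array([3.24, 3.25, 3.30, 3.74, 3.76])
print(displayed_rating(ratings))      # approx. [3.0, 3.5, 3.5, 3.5, 4.0]
print(distance_to_boundary(ratings))  # approx. [-0.01, 0.00, 0.05, -0.01, 0.01]
```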
> Yelp prominently displays a restaurant's rounded average rating. I can identify the *causal* impact of Yelp ratings on demand with a regression discontinuity framework that exploits Yelp's rounding thresholds. I present three findings about the impact of consumer reviews on the restaurant industry: (1) a one-star increase in Yelp rating leads to a 5-9 percent increase in revenue, (2) this effect is driven by independent restaurants; ratings do not affect restaurants with chain affiliation, and (3) chain restaurants have declined in market share as Yelp penetration has increased.
The data combine Yelp reviews with quarterly revenue records (from the Washington State Department of Revenue) for Seattle restaurants from January 2003 to October 2009. The final panel contains 3,582 restaurants over 27 quarters, with about 1,587 restaurants open in any given quarter. Yelp launched in Seattle in August 2005, and by the end of the sample window covered 70% of operational restaurants in the city.
The identification strategy reduces to a sharp [[Design Pattern II - Regression Discontinuity (RD)|RD]] at each rounding boundary:
![[yelp-luca-dag.png]]
where $X$ is the underlying continuous average rating, $D = \mathbb{1}[X \text{ rounds up}]$ is the binary treatment indicator (a half-star bump), $Y$ is log revenue, and $U$ collects unobserved restaurant-level confounders such as food quality, location, and management. The continuous rating $X$ is correlated with $U$ and with revenue, but the displayed rating $D$ is a deterministic step function of $X$ at each rounding boundary. For restaurants very close to the threshold, being rounded up rather than down is locally as-good-as-random.
The estimating equation, restricted to restaurants whose underlying rating is within 0.1 stars of a rounding threshold, is:
$\ln(\text{Revenue}_{jt}) = \beta D_{jt} + \gamma X_{jt} + \alpha_j + \alpha_t + \epsilon_{jt}$
where $D_{jt}$ is the rounded-up indicator from the DAG, $X_{jt}$ is the continuous unrounded average rating, and $\alpha_j$, $\alpha_t$ are restaurant and time fixed effects. The coefficient $\beta$ is the [[Local Average Treatment Effect|treatment effect at the rounding boundary]]: the effect of an exogenous half-star bump in the displayed rating.
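As a rough sketch of how this specification could be estimated, here is a minimal fixed-effects regression in the 0.1-star window using `statsmodels`. The file name and column names (`log_revenue`, `avg_rating`, `restaurant_id`, `quarter`) are my own assumptions, not the paper's variable names, and dummy-variable fixed effects are a simple stand-in for whatever panel estimator the paper actually uses.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# One row per restaurant-quarter; file and column names are hypothetical.
df = pd.read_csv("seattle_restaurants.csv")

# Running variable: signed distance to the nearest x.25 / x.75 boundary.
df["dist"] = df["avg_rating"] - (
    np.floor((df["avg_rating"] - 0.25) / 0.5 + 0.5) * 0.5 + 0.25
)
df["rounded_up"] = (df["dist"] >= 0).astype(int)  # D_jt in the notation above

# Restrict to restaurant-quarters within 0.1 stars of a boundary.
window = df[df["dist"].abs() <= 0.1].copy()

# ln(Revenue_jt) = beta * D_jt + gamma * X_jt + alpha_j + alpha_t + eps_jt
fit = smf.ols(
    "log_revenue ~ rounded_up + avg_rating + C(restaurant_id) + C(quarter)",
    data=window,
).fit(cov_type="cluster", cov_kwds={"groups": window["restaurant_id"]})

beta = fit.params["rounded_up"]
print(f"half-star bump: {beta:.3f} log points; one-star renormalization: {2 * beta:.3f}")
```

Doubling the half-star coefficient mirrors the renormalization to a one-star change that the paper uses when reporting the headline estimate.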
> I find that an exogenous one-star improvement leads to a roughly 9% increase in revenue. (Note that the shock is one-half star, but I renormalize for ease of interpretation). The result provides support to the claim that Yelp has a causal effect on demand. In particular, whether a particular restaurant is rounded up or rounded down should be uncorrelated with other changes in reputation outside of Yelp.
The most natural threat to identification is *manipulation*: restaurants writing fake reviews to push themselves over a rounding boundary. Luca (2016) uses the [[McCrary density test]] to check for a discontinuity in the density of underlying ratings at each rounding threshold:
> If gaming were driving the result, then one would expect ratings to be clustered just above the discontinuities. However, this is not the case. More generally, the results are robust to many types of firm manipulation.
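The McCrary test proper fits local linear density estimates on each side of the cutoff; as a rough finger exercise (not the paper's implementation), one can bin the running variable, fit a line to the bin counts on each side, and compare the fitted counts at the threshold, reusing the hypothetical `df` and its `dist` column from the regression sketch above.

```python
import numpy as np

def density_jump(dist: np.ndarray, bin_width: float = 0.01, bandwidth: float = 0.1):
    """Crude stand-in for a McCrary density test: bin the running variable,
    fit a separate line to the bin counts on each side of the boundary, and
    compare the fitted counts at zero. Many more observations just above the
    threshold than just below would suggest rating manipulation."""
    edges = np.arange(-bandwidth, bandwidth + bin_width, bin_width)
    counts, _ = np.histogram(dist, bins=edges)
    centers = edges[:-1] + bin_width / 2

    left, right = centers < 0, centers >= 0
    left_fit = np.polyfit(centers[left], counts[left], deg=1)
    right_fit = np.polyfit(centers[right], counts[right], deg=1)
    return np.polyval(left_fit, 0.0), np.polyval(right_fit, 0.0)

below, above = density_jump(df["dist"].to_numpy())
print(f"fitted count just below: {below:.1f}, just above: {above:.1f}")
```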
![[yelp-luca-illustration-of-rd.png]]
The figure matches the graphical setup in Figure 4 of the original paper: averages of restaurant-demeaned log revenue plotted against distance from a half-star rounding threshold, in **0.01-star** bins over a **±0.1**-star window. The plotted binned averages are illustrative; the substantive revenue comparisons come from the paper itself, which renormalizes the half-star contrast to one star and reports a 5–9% revenue effect.
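A binned scatter of this kind can be reproduced along the following lines, again reusing the hypothetical `df` from the regression sketch; this recreates the plotting convention, not the paper's data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Demean log revenue within restaurant, then average within 0.01-star bins
# of the distance to the rounding boundary, over a ±0.1-star window.
window = df[df["dist"].abs() <= 0.1].copy()
window["demeaned"] = window["log_revenue"] - window.groupby("restaurant_id")[
    "log_revenue"
].transform("mean")
window["bin_center"] = np.floor(window["dist"] / 0.01) * 0.01 + 0.005

binned = window.groupby("bin_center")["demeaned"].mean()

plt.scatter(binned.index, binned.values)
plt.axvline(0.0, linestyle="--", color="grey")
plt.xlabel("Distance from rounding threshold (stars)")
plt.ylabel("Restaurant-demeaned log revenue")
plt.show()
```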
The main effect masks meaningful heterogeneity. Luca (2016) finds that the revenue effect is driven by *independent* restaurants; chain restaurants show no statistically detectable response to a Yelp rating bump. The intuition is informational: chain affiliation already conveys quality through the brand, so a marginal Yelp signal carries less new information. Independent restaurants have no comparable brand anchor, so Yelp ratings substitute for missing information. Luca (2016) also documents that chain restaurants lost revenue share to independents as Yelp penetration increased.
A few [[Design Pattern II - Regression Discontinuity (RD)|design-pattern-level]] caveats follow directly from the RD design. The estimated effect is a [[Local Average Treatment Effect|treatment effect at the rounding boundary]]; it does not generalize to arbitrary rating changes far from a boundary. The paper reports sensitivity analysis around its baseline 0.1-star bandwidth, but as discussed in [[Oh my! Bandwidth sensitivity in the RD model]], any RD estimate remains contingent on that choice.
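A minimal way to probe that sensitivity (assuming the `df`, `dist`, and `rounded_up` columns built in the regression sketch above) is to re-estimate the same specification over several windows and see how the coefficient moves:

```python
import statsmodels.formula.api as smf

# Re-estimate beta over a range of bandwidths around the paper's baseline of 0.1 stars.
for bw in (0.05, 0.075, 0.10, 0.15, 0.20):
    sub = df[df["dist"].abs() <= bw]
    fit = smf.ols(
        "log_revenue ~ rounded_up + avg_rating + C(restaurant_id) + C(quarter)",
        data=sub,
    ).fit(cov_type="cluster", cov_kwds={"groups": sub["restaurant_id"]})
    print(f"bandwidth {bw:.3f}: beta = {fit.params['rounded_up']:.3f}")
```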
Once again, external validity requires caution. The sample is restaurants in Seattle in the late 2000s, when Yelp was still expanding. Whether the same elasticity holds in mature platform markets, in non-restaurant categories, or in cities where Yelp competes with comparable platforms is an empirical question this design cannot answer on its own.[^2] None of this undermines the study; its value lies precisely in identifying one clean local effect, not in generalizing everywhere.
[^1]: Luca, M. (2016). *Reviews, reputation, and revenue: The case of Yelp.com.* Harvard Business School Working Paper No. 12-016.
[^2]: The methodology Luca (2016) proposes (using a platform's rounding rule as the running-variable cutoff) generalizes wherever a continuous quality measure is reduced to a coarsened display. The paper notes that the same identification can be applied to Amazon product-rating rounding, RottenTomatoes "rotten/fresh" labels, or Gap.com clothing reviews, all of which round underlying continuous scores.
> [!info]- Last updated: May 15, 2026