When Should Teams Run or Pass?
A contextualized analysis of run/pass effectiveness using Conditional Average Treatment Effect (CATE).
Passing is more efficient than running. This has been hammered home consistently in prior research. Honestly, it’s beating a dead horse at this point, but I’ll reiterate for effect. From 2018 through 2024, NFL offenses averaged +0.017 EPA/play on dropbacks compared to -0.059 EPA/play on designed runs, a gap of roughly 0.076 EPA per play.
Despite that, the average designed run rate has hovered around 34-36% in each season since 2018. That’s because running the ball is still an efficient play call compared to passing within certain contexts. Handing the ball off in short yardage and in condensed areas, like near the goal line, is still efficient. I became curious as to how much more effective running or passing was compared to the other, given the situation.
In this study, I set out to evaluate the counterfactual effectiveness of passing versus running in different game situations. While raw averages of Expected Points Added (EPA) per play suggest that passing is generally more valuable than running, those averages are purely descriptive: they reflect what actually happened, not what might have happened if the offense had chosen differently. To answer that counterfactual question — how much better or worse a team would have done by passing instead of running in the same situation — I used a causal inference approach.
The analysis draws on play-by-play data from the 2018 through 2024 NFL seasons. Each play was coded as a designed run or pass, with EPA as the outcome of interest. To capture the situational context that coaches face, I included covariates for down, distance, field position, score differential, time remaining, quarter, and available timeouts. These variables both reflect decision-making constraints and help control for confounders that influence both play selection and play outcomes.
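As a rough sketch of the setup described above (not the code used for this study), each play can be reduced to a treatment indicator, an outcome, and a covariate vector. Every field name here is hypothetical, since the article does not name its columns, and the three plays are made up purely to show the shape of the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Play:
    # All field names are hypothetical; the article does not name its columns.
    play_type: str           # "pass" (dropback) or "run" (designed run)
    epa: float               # outcome: Expected Points Added
    down: int
    yards_to_go: int
    yardline_100: int        # yards from the opponent's end zone
    score_diff: int
    seconds_remaining: int
    quarter: int
    timeouts_remaining: int


COVARIATES = (
    "down", "yards_to_go", "yardline_100", "score_diff",
    "seconds_remaining", "quarter", "timeouts_remaining",
)


def to_model_inputs(plays):
    """Return (T, Y, X): T is 1 for a pass and 0 for a run, Y is EPA,
    and X is one covariate row per play. Other play types are dropped."""
    kept = [p for p in plays if p.play_type in ("pass", "run")]
    T = [1 if p.play_type == "pass" else 0 for p in kept]
    Y = [p.epa for p in kept]
    X = [[getattr(p, name) for name in COVARIATES] for p in kept]
    return T, Y, X


# Made-up plays, for illustration only.
plays = [
    Play("pass", 0.42, 1, 10, 75, 0, 3500, 1, 3),
    Play("run", -0.15, 2, 3, 40, -7, 1200, 3, 2),
    Play("punt", -0.30, 4, 12, 80, 3, 900, 4, 1),  # dropped: not a run or pass
]
T, Y, X = to_model_inputs(plays)
```

Coding the pass as the treatment (T = 1) matches the framing of the effect as the change in EPA from passing instead of running.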
To estimate treatment effects, I trained a Causal Forest model. This method adapts random forests to the causal setting, producing estimates of the Conditional Average Treatment Effect (CATE): the expected change in EPA if the offense had passed instead of run (or run instead of passed), holding the situation fixed. Aggregating across all plays, the model also provides an Average Treatment Effect (ATE). In this dataset, the ATE was approximately +0.13 EPA per play in favor of passing, consistent with broader league-wide efficiency trends.
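In standard potential-outcomes notation, the two quantities can be written as follows. The symbols are assumptions introduced here, not notation from the study: Y(1) is the EPA a play would produce as a pass, Y(0) is the EPA it would produce as a run, and X is the vector of situational covariates.

```latex
\tau(x) = \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right],
\qquad
\mathrm{ATE} = \mathbb{E}\left[\, \tau(X) \,\right]
```

Under this convention a positive value favors passing, which is how the +0.13 ATE is reported.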
The greater value, however, lies in the CATE estimates. These allow the analysis to move beyond averages and identify how the pass–run tradeoff shifts across situations. By grouping plays into down, yards to go, and field position buckets, I could evaluate whether the counterfactual advantage tilted toward passing or running. While individual estimates have wide confidence intervals, directional patterns emerge—for example, even in many traditional run-heavy spots like early downs, the model still leans toward passing, albeit with smaller margins than in neutral field situations.
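The grouping step can be sketched roughly as below, assuming each play already has a per-play CATE estimate attached. The distance cut points are assumptions: only "Short (2-3 YTG)" and "long (7+ YTG)" are named in this article, and the sample values are made up.

```python
from collections import defaultdict
from statistics import mean


def distance_bucket(yards_to_go):
    # Cut points are assumptions; only "2-3" and "7+" appear in the article.
    if yards_to_go <= 1:
        return "1"
    if yards_to_go <= 3:
        return "2-3"
    if yards_to_go <= 6:
        return "4-6"
    return "7+"


def mean_cate_by_bucket(rows):
    """rows: iterable of (down, yards_to_go, cate) tuples.
    Returns {(down, distance_bucket): (mean_cate, n_plays)}."""
    groups = defaultdict(list)
    for down, yards_to_go, cate in rows:
        groups[(down, distance_bucket(yards_to_go))].append(cate)
    return {key: (mean(vals), len(vals)) for key, vals in groups.items()}


# Made-up per-play estimates, for illustration only.
rows = [(1, 10, -0.12), (1, 10, -0.08), (4, 2, 0.15), (4, 3, 0.05)]
summary = mean_cate_by_bucket(rows)
```

Keeping the play count next to each mean makes it easier to spot buckets where a small sample is driving the estimate.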
Finally, I assessed covariate balance and calibration to ensure the model’s estimates were well-behaved and interpretable. While no model can prescribe the “correct” call on a single play, this framework provides a structured, counterfactual lens for comparing run and pass decisions. The results highlight situations where passing meaningfully improves expected outcomes, as well as contexts where running closes the gap, offering coaches and analysts evidence-based guidance that acknowledges both situational nuance and uncertainty.
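The article does not say which balance diagnostic was used. One common choice is the standardized mean difference (SMD) of each covariate between passes and runs, sketched here under that assumption with made-up values.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """(mean_t - mean_c) / sqrt((var_t + var_c) / 2), using sample variances.
    Each group needs at least two values."""
    diff = mean(treated) - mean(control)
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        # No spread in either group: balanced only if the means coincide.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / pooled_sd


# Made-up yards-to-go values for passes and runs, for illustration only.
ytg_pass = [10, 8, 7, 10]
ytg_run = [1, 2, 10, 3]
smd = standardized_mean_difference(ytg_pass, ytg_run)
```

A frequently cited rule of thumb treats an absolute SMD below 0.1 as well balanced. In practice the comparison would usually be made after any weighting or matching the model applies, which this sketch leaves out.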
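Here are the CATE EPA results binned by Down and Yards to Go. In these results, a negative CATE value indicates that passing was the better decision, while a positive CATE value indicates that running was the better decision. Note that this sign convention is the reverse of the ATE reported earlier, where a positive value favored passing. I classified a CATE of +0.10 or higher as Run Favored and a CATE of -0.10 or lower as Pass Favored.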
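The classification rule amounts to a few lines of code. Treating the cutoffs as inclusive, and calling the middle band "Neutral", are assumptions made for this sketch.

```python
def classify(cate, threshold=0.10):
    # Sign convention follows the binned results: positive favors running,
    # negative favors passing. Inclusive cutoffs are an assumption.
    if cate >= threshold:
        return "Run Favored"
    if cate <= -threshold:
        return "Pass Favored"
    return "Neutral"  # hypothetical label for everything in between


labels = [classify(value) for value in (0.12, -0.25, 0.03)]
```

The single `threshold` parameter makes it easy to check how sensitive the labels are to the choice of 0.10.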
Offenses are still running too often on 1st & 10, with a designed run rate near 50%, but they are correctly passing more often on 2nd & long (7+ YTG). I’ll admit the Run Favored classification on 4th & Short (2-3 YTG) feels wrong, especially given that teams run in that situation just 5% of the time.
This is a good time to remind everyone that the confidence intervals for the CATE estimates are wide and largely overlapping, indicating that the differences are not statistically distinguishable from one another. The estimates should be viewed as directional tendencies rather than precise thresholds to follow blindly.
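A quick way to apply that caution to any pair of buckets is to check whether their intervals overlap and whether either interval excludes zero. This is a rough screen rather than a formal test of the difference, and the bounds below are made up for illustration.

```python
def intervals_overlap(a, b):
    """a and b are (lower, upper) confidence bounds."""
    return a[0] <= b[1] and b[0] <= a[1]


def excludes_zero(interval):
    """True when the whole interval sits on one side of zero."""
    lower, upper = interval
    return lower > 0 or upper < 0


# Made-up bounds for two hypothetical buckets, for illustration only.
bucket_a = (-0.30, 0.10)
bucket_b = (-0.05, 0.35)
overlap = intervals_overlap(bucket_a, bucket_b)
either_clear = excludes_zero(bucket_a) or excludes_zero(bucket_b)
```

When the intervals overlap and neither excludes zero, as in this made-up pair, the sign of the point estimate is best read as a lean rather than a rule.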
You could imagine handing something like this to an offensive coach and saying, “We should mix in more passes on 1st & 10, what are your favorite pass concepts in those situations?”
Here is a more granular table grouped by Down, Yards to Go, and Field Position.
If you have any questions, feel free to reach out.


