add Validation docs #975

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open: wants to merge 3 commits into main
6 changes: 6 additions & 0 deletions doc/spec/references.rst
@@ -17,6 +17,12 @@ References
Two-Stage Estimation with a High-Dimensional Second Stage.
2018.

.. [Chernozhukov2022]
V. Chernozhukov, C. Cinelli, N. Kallus, W. Newey, A. Sharma, and V. Syrgkanis.
Long Story Short: Omitted Variable Bias in Causal Machine Learning.
*NBER Working Paper No. 30302*, 2022.
URL https://www.nber.org/papers/w30302.

.. [Hartford2017]
Jason Hartford, Greg Lewis, Kevin Leyton-Brown, and Matt Taddy.
Deep IV: A flexible approach for counterfactual prediction.
1 change: 1 addition & 0 deletions doc/spec/spec.rst
@@ -13,6 +13,7 @@ EconML User Guide
estimation_dynamic
inference
model_selection
validation
interpretability
federated_learning
references
68 changes: 68 additions & 0 deletions doc/spec/validation.rst
@@ -0,0 +1,68 @@
Validation
======================

Validating causal estimates is inherently challenging, as the true counterfactual outcome for a given treatment is
unobservable. However, there are several checks and tools available in EconML to help assess the credibility of causal
estimates.


Sensitivity Analysis
---------------------

For many EconML estimators, unobserved confounding can lead to biased causal estimates.
Moreover, it is impossible to prove the absence of unobserved confounders.
This is a fundamental problem for observational causal inference.

To mitigate this problem, EconML provides a suite of sensitivity analysis tools,
based on [Chernozhukov2022]_,
to assess the robustness of causal estimates to unobserved confounding.

Specifically, select estimators (subclasses of :class:`.DML` and :class:`.DRLearner`)
expose the ``sensitivity_interval``, ``robustness_value``, and ``sensitivity_summary`` methods.

``sensitivity_interval`` returns an adjusted confidence interval for the ATE under a specified
level of unobserved confounding.

``robustness_value`` computes the minimum level of unobserved confounding at which the
confidence interval around the ATE would begin to include a given point (0 by default).

``sensitivity_summary`` provides a summary of the two methods above.

DRTester
----------------

EconML provides the :class:`.DRTester` class, which implements Best Linear Predictor (BLP), calibration R-squared,
and uplift modeling methods for validating CATE estimates.

See an example notebook `here <https://github.com/py-why/EconML/blob/main/notebooks/CATE%20validation.ipynb>`__.

Scoring
-------

Many EconML estimators implement a ``.score`` method to evaluate the goodness of fit of the final model. Because raw
``.score`` values can be hard to interpret in isolation, EconML offers the :class:`.RScorer` class to facilitate model
selection based on scoring.

:class:`.RScorer` enables comparison and selection among different causal models.

See an example notebook `here
<https://github.com/py-why/EconML/blob/main/notebooks/Causal%20Model%20Selection%20with%20the%20RScorer.ipynb>`__.

Confidence Intervals and Inference
----------------------------------

Most EconML estimators allow for inference, including standard errors, confidence intervals, and p-values for
estimated effects. A common validation approach is to check whether the p-values are below a chosen significance level
(e.g., 0.05). If not, the null hypothesis that the causal effect is zero cannot be rejected.

**Note:** Inference results are only valid if the model specification is correct. For example, if a linear model is used
but the true data-generating process is nonlinear, the inference may not be reliable. It is generally not possible to
guarantee correct specification, so p-value inspection should be considered a surface-level check.

DoWhy Refutation Tests
----------------------

The DoWhy library, which complements EconML, includes several refutation tests for validating causal estimates. These
tests work by comparing the original causal estimate to estimates obtained from perturbed versions of the data, helping
to assess the robustness of causal conclusions.
4 changes: 2 additions & 2 deletions econml/dml/causal_forest.py
@@ -857,7 +857,7 @@ def sensitivity_interval(self, alpha=0.05, c_y=0.05, c_t=0.05, rho=1., interval_

Can only be calculated when Y and T are single arrays, and T is binary or continuous.

Based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_
Based on [Chernozhukov2022]_

Parameters
----------
@@ -901,7 +901,7 @@ def robustness_value(self, null_hypothesis=0, alpha=0.05, interval_type='ci'):

Can only be calculated when Y and T are single arrays, and T is binary or continuous.

Based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_
Based on [Chernozhukov2022]_

Parameters
----------
4 changes: 2 additions & 2 deletions econml/dml/dml.py
@@ -646,7 +646,7 @@ def sensitivity_interval(self, alpha=0.05, c_y=0.05, c_t=0.05, rho=1., interval_

Can only be calculated when Y and T are single arrays, and T is binary or continuous.

Based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_
Based on [Chernozhukov2022]_

Parameters
----------
@@ -690,7 +690,7 @@ def robustness_value(self, null_hypothesis=0, alpha=0.05, interval_type='ci'):

Can only be calculated when Y and T are single arrays, and T is binary or continuous.

Based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_
Based on [Chernozhukov2022]_

Parameters
----------
4 changes: 2 additions & 2 deletions econml/dr/_drlearner.py
@@ -798,7 +798,7 @@ def sensitivity_interval(self, T, alpha=0.05, c_y=0.05, c_t=0.05, rho=1., interv
The sensitivity interval is the range of values for the ATE that are
consistent with the observed data, given a specified level of confounding.

Based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_
Based on [Chernozhukov2022]_

Parameters
----------
@@ -848,7 +848,7 @@ def robustness_value(self, T, null_hypothesis=0, alpha=0.05, interval_type='ci')

Returns 0 if the original interval already includes the null_hypothesis.

Based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_
Based on [Chernozhukov2022]_

Parameters
----------