add Validation docs #975

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open: wants to merge 3 commits into main
6 changes: 6 additions & 0 deletions doc/spec/references.rst
@@ -17,6 +17,12 @@ References
Two-Stage Estimation with a High-Dimensional Second Stage.
2018.

.. [Chernozhukov2022]
V. Chernozhukov, C. Cinelli, N. Kallus, W. Newey, A. Sharma, and V. Syrgkanis.
Long Story Short: Omitted Variable Bias in Causal Machine Learning.
*NBER Working Paper No. 30302*, 2022.
URL https://www.nber.org/papers/w30302.

.. [Hartford2017]
Jason Hartford, Greg Lewis, Kevin Leyton-Brown, and Matt Taddy.
Deep IV: A flexible approach for counterfactual prediction.
1 change: 1 addition & 0 deletions doc/spec/spec.rst
@@ -13,6 +13,7 @@ EconML User Guide
estimation_dynamic
inference
model_selection
validation
interpretability
federated_learning
references
68 changes: 68 additions & 0 deletions doc/spec/validation.rst
@@ -0,0 +1,68 @@
Validation
======================

Validating causal estimates is inherently challenging, as the true counterfactual outcome for a given treatment is
unobservable. However, there are several checks and tools available in EconML to help assess the credibility of causal
estimates.


Sensitivity Analysis
---------------------

For many EconML estimators, unobserved confounding can lead to biased causal estimates.
Moreover, it is impossible to prove the absence of unobserved confounders.
This is a fundamental problem for observational causal inference.

To mitigate this problem, EconML provides a suite of sensitivity analysis tools,
based on [Chernozhukov2022]_,
to assess the robustness of causal estimates to unobserved confounding.

Specifically, select estimators (subclasses of :class:`.DML` and :class:`.DRLearner`)
expose the ``sensitivity_interval``, ``robustness_value``, and ``sensitivity_summary`` methods.

``sensitivity_interval`` returns an adjusted confidence interval for the ATE under a specified
level of unobserved confounding.

``robustness_value`` computes the minimum level of unobserved confounding at which the
confidence interval around the ATE would begin to include a given point (0 by default).

``sensitivity_summary`` provides a summary of the two methods above.

DRTester
----------------

EconML provides the :class:`.DRTester` class, which implements Best Linear Predictor (BLP), calibration R-squared,
and uplift modeling methods for validating CATE estimates.

See an example notebook `here <https://github.com/py-why/EconML/blob/main/notebooks/CATE%20validation.ipynb>`__.

Scoring
-------

Many EconML estimators implement a ``.score`` method to evaluate the goodness of fit of the final model. Because raw
``.score`` values can be hard to interpret in isolation, EconML offers the :class:`.RScorer` class to facilitate model
selection based on scoring.

:class:`.RScorer` enables comparison and selection among different causal models.

See an example notebook `here
<https://github.com/py-why/EconML/blob/main/notebooks/Causal%20Model%20Selection%20with%20the%20RScorer.ipynb>`__.

Confidence Intervals and Inference
----------------------------------

Most EconML estimators allow for inference, including standard errors, confidence intervals, and p-values for
estimated effects. A common validation approach is to check whether the p-values are below a chosen significance level
(e.g., 0.05). If not, the null hypothesis that the causal effect is zero cannot be rejected.

**Note:** Inference results are only valid if the model specification is correct. For example, if a linear model is used
but the true data-generating process is nonlinear, the inference may not be reliable. It is generally not possible to
guarantee correct specification, so p-value inspection should be considered a surface-level check.

DoWhy Refutation Tests
----------------------

The DoWhy library, which complements EconML, includes several refutation tests for validating causal estimates. These
tests work by comparing the original causal estimate to estimates obtained from perturbed versions of the data, helping
to assess the robustness of causal conclusions.
4 changes: 2 additions & 2 deletions econml/dml/causal_forest.py
@@ -857,7 +857,7 @@ def sensitivity_interval(self, alpha=0.05, c_y=0.05, c_t=0.05, rho=1., interval_

Can only be calculated when Y and T are single arrays, and T is binary or continuous.

Based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_
Based on [Chernozhukov2022]_

Parameters
----------
@@ -901,7 +901,7 @@ def robustness_value(self, null_hypothesis=0, alpha=0.05, interval_type='ci'):

Can only be calculated when Y and T are single arrays, and T is binary or continuous.

Based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_
Based on [Chernozhukov2022]_

Parameters
----------
4 changes: 2 additions & 2 deletions econml/dml/dml.py
@@ -646,7 +646,7 @@ def sensitivity_interval(self, alpha=0.05, c_y=0.05, c_t=0.05, rho=1., interval_

Can only be calculated when Y and T are single arrays, and T is binary or continuous.

Based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_
Based on [Chernozhukov2022]_

Parameters
----------
@@ -690,7 +690,7 @@ def robustness_value(self, null_hypothesis=0, alpha=0.05, interval_type='ci'):

Can only be calculated when Y and T are single arrays, and T is binary or continuous.

Based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_
Based on [Chernozhukov2022]_

Parameters
----------
4 changes: 2 additions & 2 deletions econml/dr/_drlearner.py
@@ -798,7 +798,7 @@ def sensitivity_interval(self, T, alpha=0.05, c_y=0.05, c_t=0.05, rho=1., interv
The sensitivity interval is the range of values for the ATE that are
consistent with the observed data, given a specified level of confounding.

Based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_
Based on [Chernozhukov2022]_

Parameters
----------
@@ -848,7 +848,7 @@ def robustness_value(self, T, null_hypothesis=0, alpha=0.05, interval_type='ci')

Returns 0 if the original interval already includes the null_hypothesis.

Based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_
Based on [Chernozhukov2022]_

Parameters
----------