Orthogonal/Double ML: Bayesian regression to estimate the treatment effect from residuals? #282

@ghost

Description

Hello,

As noted in the EconML documentation of Orthogonal/Double ML, this method performs the following two steps and finally regresses #1's residuals on #2's residuals (a minimal sketch follows the list):

#1. predicting the outcome from the controls;
#2. predicting the treatment from the controls.
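
For concreteness, here is a minimal sketch of that residual-on-residual recipe using plain scikit-learn on made-up data (not the EconML estimator itself, and deliberately without the cross-fitting discussed below):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Made-up data: outcome Y, treatment T, controls W (true effect = 2.0).
rng = np.random.default_rng(0)
n = 1000
W = rng.normal(size=(n, 5))
T = W[:, 0] + rng.normal(size=n)              # treatment depends on controls
Y = 2.0 * T + W[:, 1] + rng.normal(size=n)    # outcome depends on T and controls

# Step #1: predict the outcome from the controls; keep the residuals.
Y_res = Y - RandomForestRegressor().fit(W, Y).predict(W)
# Step #2: predict the treatment from the controls; keep the residuals.
T_res = T - RandomForestRegressor().fit(W, T).predict(W)

# Final stage: regress the outcome residuals on the treatment residuals.
final = LinearRegression(fit_intercept=False).fit(T_res.reshape(-1, 1), Y_res)
print(final.coef_)  # estimate of the treatment effect
```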

As the same documentation says, "The approach allows for arbitrary Machine Learning algorithms to be used for the two predictive tasks, while maintaining many favorable statistical properties related to the final model (e.g. small mean squared error, asymptotic normality, construction of confidence intervals)."

"The main advantage of DML is that if one makes parametric assumptions on 𝜃(𝑋), then one achieves fast estimation rates and, for many cases of final stage estimators, also asymptotic normality on the second stage estimate 𝜃̂ , even if the first stage estimates on 𝑞(𝑋,𝑊) and 𝑓(𝑋,𝑊) are only 𝑛1/4 consistent, in terms of RMSE. For this theorem to hold, the nuisance estimates need to be fitted in a cross-fitting manner (see _OrthoLearner). The latter robustness property follows from the fact that the moment equations that correspond to the final least squares estimation (i.e. the gradient of the squared loss), satisfy a Neyman orthogonality condition with respect to the nuisance parameters 𝑞,𝑓. For a more detailed exposition of how Neyman orthogonality leads to robustness we refer the reader to [Chernozhukov2016], [Mackey2017], [Nie2017], [Chernozhukov2017], [Chernozhukov2018], [Foster2019]."

In the "Class Hierarchy Structure" section, the documentation very nicely introduced "DMLCateEstimator". "DMLCateEstimator assumes that the effect model for each outcome 𝑖 and treatment 𝑗 is linear, i.e. takes the form 𝜃𝑖𝑗(𝑋)=⟨𝜃𝑖𝑗,𝜙(𝑋)⟩, and allows for any arbitrary scikit-learn linear estimator to be defined as the final stage (e.g. ElasticNet, Lasso, LinearRegression and their multi-task variations in the case where we have mulitple outcomes, i.e. 𝑌 is a vector)." I noticed that sklearn.linear_model contains Bayesian regression, such as the Bayesian Ridge regression.

Before using sklearn.linear_model.BayesianRidge() as our last-stage residual regressor, I just want to make sure that doing so is statistically sound and does not violate any causal estimation assumption of the Double ML framework.
Say our goal is to estimate the treatment effect. We have a few historical randomized test results that measured that treatment effect. We would like to use those randomized test results as informative priors in sklearn.linear_model.BayesianRidge() for the last-stage residual regression, so that we can interpret the posterior from this regression as the treatment effect derived from our prior knowledge, updated by the observed data. Is that interpretation valid?
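
One practical wrinkle worth flagging (my reading of the scikit-learn docs, so please correct me if wrong): BayesianRidge places a zero-mean Gaussian prior on the coefficients and only exposes Gamma hyperpriors on the precisions, so a historical effect estimate cannot be supplied as a prior mean; at best, its certainty could be encoded as a regularization strength:

```python
from sklearn.linear_model import BayesianRidge

# BayesianRidge's prior on the coefficients is N(0, lambda^{-1} * I), with a
# Gamma(lambda_1, lambda_2) hyperprior on the precision lambda; alpha_1 and
# alpha_2 play the same role for the noise precision. There is no parameter
# for a nonzero prior mean, so a historical effect estimate cannot be plugged
# in directly as the prior's center.
final = BayesianRidge(
    fit_intercept=False,
    lambda_1=1e-6, lambda_2=1e-6,  # near-flat hyperprior on coefficient precision
    alpha_1=1e-6, alpha_2=1e-6,    # near-flat hyperprior on noise precision
)
```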

I have had trouble locating theoretical or academic references on estimating treatment effects within the Double ML framework using a Bayesian approach, so it would be really nice if you could point me to any such reference.

Thank you!
