feat: rewrite subquery into dependent join logical plan #16016

Open: wants to merge 82 commits into base: main

@duongcongtoai duongcongtoai commented May 10, 2025

Which issue does this PR close?

This PR is part of a longer effort toward general-purpose subquery decorrelation. Following the discussion in #5492, it proposes adding the following:

  • an optimizer that converts all subqueries into a dependent join logical plan (this is only a temporary plan) <---- This is what this PR is trying to achieve
  • an optimizer that decorrelates the dependent join logical plan; a POC is in this working branch
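
As a rough illustration of the first item, here is a toy sketch in plain Rust of pulling a subquery out of a filter predicate into a dependent join node. `Plan`, `Pred`, `rewrite`, and the `__subquery_N` alias scheme are hypothetical stand-ins, not the types or naming used in this PR, and the placement of the join directly under the filter is an assumption made for the sketch.

```rust
/// Toy logical plan. These are NOT DataFusion's types.
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    Scan(String),
    Filter { predicate: Pred, input: Box<Plan> },
    /// Temporary node: the right side may still reference columns of the left side.
    DependentJoin { left: Box<Plan>, right: Box<Plan>, alias: String },
}

/// Toy predicate: either `column = (subquery)` or `column = other_column`.
#[derive(Debug, Clone, PartialEq)]
pub enum Pred {
    EqSubquery { column: String, subquery: Box<Plan> },
    EqColumn { column: String, other: String },
}

/// Move every subquery found in a filter predicate into a DependentJoin
/// placed under that filter, and refer to it by a generated alias.
pub fn rewrite(plan: Plan, next_id: &mut usize) -> Plan {
    match plan {
        Plan::Filter { predicate: Pred::EqSubquery { column, subquery }, input } => {
            // Hypothetical alias scheme, chosen only for this sketch.
            let alias = format!("__subquery_{}", *next_id);
            *next_id += 1;
            let left = rewrite(*input, next_id);
            let right = rewrite(*subquery, next_id);
            Plan::Filter {
                predicate: Pred::EqColumn { column, other: alias.clone() },
                input: Box::new(Plan::DependentJoin {
                    left: Box::new(left),
                    right: Box::new(right),
                    alias,
                }),
            }
        }
        Plan::Filter { predicate, input } => Plan::Filter {
            predicate,
            input: Box::new(rewrite(*input, next_id)),
        },
        Plan::DependentJoin { left, right, alias } => {
            let left = rewrite(*left, next_id);
            let right = rewrite(*right, next_id);
            Plan::DependentJoin { left: Box::new(left), right: Box::new(right), alias }
        }
        scan @ Plan::Scan(_) => scan,
    }
}
```

The sketch does no decorrelation: the right side of the dependent join is left as-is, which is the part the second optimizer would handle.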

To avoid breaking existing tests and to make collaboration smoother, the changes should happen in the following sequence:

1. Merge item 1 without integrating the new optimizer into the main flow (behavior is tested in code instead of in sqllogictests)
2. Start implementing more rewriting rules for different query plans (aggregate, projection, filter ...) using the new optimizer
We keep the work on this new optimizer in a working branch, and if the implementation can fully support existing subqueries,
we make the following change to the main branch:

From

            Arc::new(DecorrelatePredicateSubquery::new()),
            Arc::new(ScalarSubqueryToJoin::new()),
            Arc::new(DecorrelateLateralJoin::new()),

Into

            Arc::new(DependentJoinRewriter::new()),
            Arc::new(DependentJoinDecorrelator::new()),

Or we can even combine the two optimizers into one:

            Arc::new(DependentJoinDecorrelator::new()),

The following work is needed after merging this PR:

  • Implement the DelimGet logical plan and physical plan
  • Implement a DelimGetRemove optimizer similar to DuckDB's
  • Implement JoinType::Single, similar to DuckDB. This operator is needed to support this type of query:
select * from outer where outer.a = (select inner.b from inner where inner.c=outer.c)
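
For illustration only, the behavior such a single join needs for the query above can be sketched in plain Rust without any DataFusion types. The row structs, `single_join_lookup`, and `run_query` are hypothetical names, and the sketch assumes the standard SQL rule that a scalar subquery returning more than one row is an error.

```rust
/// Stand-in rows for the `outer` and `inner` tables of the example query.
#[derive(Debug, Clone, PartialEq)]
pub struct OuterRow {
    pub a: i64,
    pub c: i64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct InnerRow {
    pub b: i64,
    pub c: i64,
}

/// Evaluate `(select inner.b from inner where inner.c = outer.c)` for one outer row.
/// No match yields NULL (`None`), one match yields its value,
/// and more than one match is reported as an error.
pub fn single_join_lookup(outer: &OuterRow, inner: &[InnerRow]) -> Result<Option<i64>, String> {
    let mut matches = inner.iter().filter(|row| row.c == outer.c);
    match (matches.next(), matches.next()) {
        (None, _) => Ok(None),
        (Some(row), None) => Ok(Some(row.b)),
        (Some(_), Some(_)) => Err(format!(
            "scalar subquery returned more than one row for outer.c = {}",
            outer.c
        )),
    }
}

/// Evaluate `select * from outer where outer.a = (<scalar subquery>)`.
pub fn run_query(outer: &[OuterRow], inner: &[InnerRow]) -> Result<Vec<OuterRow>, String> {
    let mut result = Vec::new();
    for row in outer {
        // A NULL subquery result never compares equal, so unmatched rows are dropped.
        if single_join_lookup(row, inner)? == Some(row.a) {
            result.push(row.clone());
        }
    }
    Ok(result)
}
```

An inner join cannot express the multi-row error case and a left join cannot either, which is why a dedicated join type is proposed here.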

Rationale for this change

Are these changes tested?

Are there any user-facing changes?

@github-actions bot added the logical-expr, optimizer, and sqllogictest labels May 10, 2025
@duongcongtoai duongcongtoai changed the title refactor: framework for subquery unnesting [WIP] refactor: framework for subquery decorrelation May 10, 2025
alamb commented May 12, 2025

FYI @irenjj

@xudong963 xudong963 self-requested a review May 14, 2025 14:49
@github-actions bot added the sqllogictest label Jun 7, 2025
// column
let alias = match e {
    Expr::InSubquery(_) | Expr::Exists(_) | Expr::ScalarSubquery(_) => {
        subquery_alias_by_offset.get(offset_ref).unwrap()

nit: unwrap() is not allowed

Contributor Author

Resolved: the lookup now returns an error instead of panicking when it fails.
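
A minimal sketch of that kind of change, using only the standard library. `RewriteError` and `alias_for_offset` are hypothetical stand-ins for illustration; the PR itself would use DataFusion's own error type rather than this enum.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
pub enum RewriteError {
    Internal(String),
}

/// Look up the alias recorded for the subquery at `offset`.
/// A missing entry becomes an error value instead of a panic.
pub fn alias_for_offset(
    subquery_alias_by_offset: &HashMap<usize, String>,
    offset: usize,
) -> Result<&str, RewriteError> {
    subquery_alias_by_offset
        .get(&offset)
        .map(String::as_str)
        .ok_or_else(|| {
            RewriteError::Internal(format!(
                "no subquery alias registered at offset {offset}"
            ))
        })
}
```

Callers can then propagate the failure with `?` instead of aborting the whole optimizer pass.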

Comment on lines 151 to 162
fn rewrite_projection(
    &mut self,
    original_proj: &Projection,
    dependent_join_node: &Node,
    current_subquery_depth: usize,
    mut current_plan: LogicalPlanBuilder,
    subquery_alias_by_offset: HashMap<usize, String>,
) -> Result<LogicalPlanBuilder> {
    // every time we meet a subquery during traversal, we increment this by 1
    // we can use this offset to look up the original subquery info
    // in subquery_alias_by_offset
    // the reason why we cannot create a hashmap keyed by Subquery object HashMap<Subquery,String>

Maybe the shared parts among rewrite_* functions can be extracted

Contributor Author

Refactored. So far I can see that filter, projection, and aggregate can be rewritten using one shared function.
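
One way such a shared helper could look, sketched with a toy expression type rather than DataFusion's `Expr`. The `Expr` enum, `replace_subqueries`, and `rewrite_exprs` are hypothetical names for illustration. The sketch follows the offset idea from the code comment above: a counter is incremented at each subquery met during traversal and used as the key into the alias map.

```rust
use std::collections::HashMap;

/// Toy expression type with only the shapes the sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    /// Stand-in for InSubquery / Exists / ScalarSubquery; the payload is only a label.
    Subquery(String),
}

/// Replace each subquery with a column reference to its alias.
/// `offset` counts the subqueries seen so far in traversal order.
pub fn replace_subqueries(
    expr: &Expr,
    offset: &mut usize,
    alias_by_offset: &HashMap<usize, String>,
) -> Result<Expr, String> {
    Ok(match expr {
        Expr::Subquery(_) => {
            let current = *offset;
            let alias = alias_by_offset
                .get(&current)
                .ok_or_else(|| format!("no alias for subquery at offset {current}"))?;
            *offset += 1;
            Expr::Column(alias.clone())
        }
        Expr::Eq(l, r) => Expr::Eq(
            Box::new(replace_subqueries(l, offset, alias_by_offset)?),
            Box::new(replace_subqueries(r, offset, alias_by_offset)?),
        ),
        Expr::And(l, r) => Expr::And(
            Box::new(replace_subqueries(l, offset, alias_by_offset)?),
            Box::new(replace_subqueries(r, offset, alias_by_offset)?),
        ),
        other => other.clone(),
    })
}

/// Shared entry point: a filter would pass its single predicate, a projection
/// its expression list, an aggregate its group-by and aggregate expressions.
pub fn rewrite_exprs(
    exprs: &[Expr],
    alias_by_offset: &HashMap<usize, String>,
) -> Result<Vec<Expr>, String> {
    let mut offset = 0;
    exprs
        .iter()
        .map(|e| replace_subqueries(e, &mut offset, alias_by_offset))
        .collect()
}
```

Keeping the counter in one place means each plan node only decides which expressions to hand over, not how subqueries are matched to aliases.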
