implement the data branch #22636
User description
What type of PR is this?
Which issue(s) this PR fixes:
issue #21979
What this PR does / why we need it:
Implement the data branch.
PR Type
Enhancement, Feature, Tests
Description
Implements comprehensive data branch functionality for snapshot-based table versioning and management
Adds core data branch operations: diff detection between snapshots, merge with conflict resolution (FAIL, SKIP, ACCEPT), and branch hierarchy management
Implements adaptive hashmap with disk spillover support for efficient change tracking and storage
Adds branch metadata system table (mo_branch_metadata) to track clone operations and branch relationships
Introduces DAG (Directed Acyclic Graph) structure with LCA (Lowest Common Ancestor) algorithm for branch hierarchy
Extends SQL grammar and parser to support DATA BRANCH statements (CREATE TABLE, CREATE DATABASE, DELETE TABLE, DELETE DATABASE, DIFF, MERGE)
Refactors clone operations to track branch metadata and support data branch workflows
Adds comprehensive test coverage for diff, merge, and conflict handling scenarios with various data types and dataset sizes
Updates privilege logic to support data branch operations with appropriate access controls
Standardizes keyword capitalization throughout codebase (Exists vs exists) for consistency
Adds support for commit timestamp column tracking in distributed TAE engine
Extends type system with generic value comparison function and decimal/timestamp support in result sets
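To make the three merge conflict options named above (FAIL, SKIP, ACCEPT) concrete, here is a minimal sketch in Go. It is not the code from pkg/frontend/data_branch.go: the names ConflictPolicy, mergeRows, and ErrConflict are hypothetical, rows are reduced to a primary-key-to-value map, and the assumed semantics are that FAIL aborts the merge, SKIP keeps the destination row, and ACCEPT takes the source row.

```go
package main

import (
	"errors"
	"fmt"
)

// ConflictPolicy mirrors the three options named in the PR description.
// The type and constant names are hypothetical, not taken from the PR.
type ConflictPolicy int

const (
	ConflictFail   ConflictPolicy = iota // abort on the first conflict
	ConflictSkip                         // keep the destination row
	ConflictAccept                       // overwrite with the source row
)

// ErrConflict is returned (wrapped) when ConflictFail meets a conflict.
var ErrConflict = errors.New("merge conflict")

// mergeRows applies src onto a copy of dst. Both maps are keyed by primary key.
// A conflict is assumed to be a key present in both maps with different values.
func mergeRows(dst, src map[string]string, policy ConflictPolicy) (map[string]string, error) {
	out := make(map[string]string, len(dst)+len(src))
	for k, v := range dst {
		out[k] = v
	}
	for k, sv := range src {
		dv, exists := dst[k]
		if !exists || dv == sv {
			out[k] = sv // new row or identical row: no conflict
			continue
		}
		switch policy {
		case ConflictFail:
			return nil, fmt.Errorf("%w on key %q", ErrConflict, k)
		case ConflictSkip:
			// leave the destination value in place
		case ConflictAccept:
			out[k] = sv
		}
	}
	return out, nil
}

func main() {
	dst := map[string]string{"1": "a", "2": "b"}
	src := map[string]string{"2": "B", "3": "c"}
	for _, p := range []ConflictPolicy{ConflictFail, ConflictSkip, ConflictAccept} {
		merged, err := mergeRows(dst, src, p)
		fmt.Println(merged, err)
	}
}
```

In this sketch key "2" is the only conflict, so the three policies differ only in how that row is handled; the non-conflicting key "3" is merged under SKIP and ACCEPT alike.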
Diagram Walkthrough
File Walkthrough
7 files
data_branch.go
Implement data branch diff and merge operations
pkg/frontend/data_branch.go
merge operations between table snapshots
constructing change handles, and managing branch metadata
ACCEPT)
branch DAG construction
branch_hashmap.go
Implement adaptive hashmap with disk spillover support
pkg/frontend/databranchutils/branch_hashmap.go
memory allocation fails
records with key-value semantics
management
resource cleanup
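The spill-to-disk idea behind this file can be sketched roughly as follows. This is an illustration only, not the BranchHashmap API: spillMap, span, and newSpillMap are hypothetical names, and a fixed byte budget stands in for the "memory allocation fails" trigger mentioned above.

```go
package main

import (
	"fmt"
	"os"
)

// span locates one spilled value inside the spill file.
type span struct {
	off int64
	n   int
}

// spillMap is a hypothetical key-value store that keeps values in memory
// until a byte budget is exhausted, then writes further values to a temp file.
type spillMap struct {
	budget int               // max bytes of values held in memory
	used   int               // bytes currently held in memory
	mem    map[string][]byte // in-memory entries
	index  map[string]span   // spilled entries: key -> location on disk
	file   *os.File          // created lazily on first spill
	end    int64             // next write offset in the spill file
}

func newSpillMap(budget int) *spillMap {
	return &spillMap{budget: budget, mem: map[string][]byte{}, index: map[string]span{}}
}

// Put stores val under key, spilling to disk when the budget would be exceeded.
// Overwritten spilled values leave dead space in the file; this sketch does not compact.
func (m *spillMap) Put(key string, val []byte) error {
	if old, ok := m.mem[key]; ok {
		m.used -= len(old)
		delete(m.mem, key)
	}
	delete(m.index, key)
	if m.used+len(val) <= m.budget {
		m.mem[key] = append([]byte(nil), val...) // copy so callers may reuse val
		m.used += len(val)
		return nil
	}
	if m.file == nil {
		f, err := os.CreateTemp("", "spillmap-*")
		if err != nil {
			return err
		}
		m.file = f
	}
	if _, err := m.file.WriteAt(val, m.end); err != nil {
		return err
	}
	m.index[key] = span{off: m.end, n: len(val)}
	m.end += int64(len(val))
	return nil
}

// Get returns the value for key from memory or from the spill file.
func (m *spillMap) Get(key string) ([]byte, bool, error) {
	if v, ok := m.mem[key]; ok {
		return v, true, nil
	}
	s, ok := m.index[key]
	if !ok {
		return nil, false, nil
	}
	buf := make([]byte, s.n)
	if _, err := m.file.ReadAt(buf, s.off); err != nil {
		return nil, false, err
	}
	return buf, true, nil
}

// Close releases the spill file, if one was created.
func (m *spillMap) Close() error {
	if m.file == nil {
		return nil
	}
	name := m.file.Name()
	err := m.file.Close()
	m.file = nil
	if rmErr := os.Remove(name); err == nil {
		err = rmErr
	}
	return err
}

func main() {
	m := newSpillMap(8)
	defer m.Close()
	_ = m.Put("a", []byte("12345678")) // fits in the 8-byte budget
	_ = m.Put("b", []byte("spilled"))  // exceeds it, goes to disk
	v, ok, err := m.Get("b")
	fmt.Println(string(v), ok, err)
}
```

Callers see the same Put/Get interface whether an entry lives in memory or on disk, which is the property the spillover design is meant to provide.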
data_branch.go
Add data branch statement type definitions and implementations
pkg/sql/parsers/tree/data_branch.go
DataBranchCreateTable, DataBranchDeleteTable, DataBranchCreateDatabase, DataBranchDeleteDatabase, DataBranchDiff, and DataBranchMerge statement types for branch operations
using reuse pattern
branch_dag.go
Implement data branch DAG with LCA algorithm
pkg/frontend/databranchutils/branch_dag.go
DataBranchDAG for managing branch hierarchy relationships
parent branches
branch_change_handle.go
Add branch change handle for data collection and filtering
pkg/frontend/databranchutils/branch_change_handle.go
BranchChangeHandle wrapper for engine changes handling
CollectChanges function for gathering changes between timestamps
self_handle.go
Add data branch statement execution handler
pkg/frontend/self_handle.go
CreateTable, DeleteTable, DeleteDatabase, CreateDatabase)
handleDataBranch function call with proper frontend print tracking
keywords.go
Add data branch and conflict handling keywords
pkg/sql/parsers/dialect/mysql/keywords.go
branch, diff, conflict
fail, skip, accept
15 files
pitr.go
Standardize capitalization of "Exists" in PITR operations
pkg/frontend/pitr.go
Exists instead of lowercase exists
doCreatePitr, doDropPitr, doAlterPitr, doRestorePitr
(Point-In-Time Recovery) operations
publication_subscription_test.go
Standardize capitalization in publication subscription tests
pkg/frontend/publication_subscription_test.go
Exists instead of lowercase exists
Test_doAlterPublication, Test_doAlterPublication2, and Test_doDropPublication
snapshot_restore_with_ts.go
Standardize capitalization of "Exists" in snapshot restore operations
pkg/frontend/snapshot_restore_with_ts.go
Exists instead of lowercase exists
restoreToAccountFromTS, recreateTableFromTS, and restoreViewsFromTS
snapshot.go
Standardize capitalization in snapshot rebuild logging
pkg/vm/engine/tae/logtail/snapshot.go
RebuildAObjectDel function to use lowercase exists instead of capitalized Exists
snapshot.go
Standardize keyword capitalization in snapshot operations
pkg/frontend/snapshot.go
Exists keyword
Exists instead of exists
authenticate_test.go
Update test descriptions with standardized keyword capitalization
pkg/frontend/authenticate_test.go
Exists keyword
session.go
Standardize keyword capitalization in session authentication
pkg/frontend/session.go
Exists keyword
steps
query_result.go
Standardize keyword capitalization in query result handling
pkg/frontend/query_result.go
Exists keyword for consistency
system_initialize.go
Standardize keyword capitalization in system initialization
pkg/frontend/system_initialize.go
Exists keyword for consistency
mysql_protocol_predefines.go
Standardize keyword capitalization in protocol definitions
pkg/frontend/mysql_protocol_predefines.go
Exists keyword for consistency
publication_subscription.go
Standardize keyword capitalization in publication subscription
pkg/frontend/publication_subscription.go
Exists keyword for consistency
cdc_options.go
Standardize keyword capitalization in CDC options
pkg/frontend/cdc_options.go
Exists keyword for consistency
txn.go
Fix comment capitalization in TxnHandler
pkg/frontend/txn.go
always"
util.go
Fix comment capitalization in utility functions
pkg/frontend/util.go
Exists" and "true/false - exists" to "true/false - Exists"
pitr_test.go
Fix test case name capitalization
pkg/frontend/pitr_test.go
"SubscriptionOption Exists"
14 files
authenticate.go
Add branch metadata table support and update privilege logic
pkg/frontend/authenticate.go
catalog.MO_BRANCH_METADATA to system account tables and privilege maps
Exists keyword for consistency
MoCatalogBranchMetadataDDL to catalog initialization
(CreateTable, DeleteTable, Merge, Diff)
PrivilegeTypeTableAll instead of PrivilegeTypeInsert
clone.go
Implement data branch metadata tracking for clone operations
pkg/frontend/clone.go
cloneReceipt struct
getOpAndToAccountId to use ToAccountOpt instead of string parameter
resolveSnapshot helper function for timestamp resolution
updateBranchMetaTable to record clone operations in branch metadata table
database context
encoding.go
Add generic value comparison function for all types
pkg/container/types/encoding.go
CompareValues function to compare values of different types
and special types
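The walkthrough does not show the signature of CompareValues, so the following is only a sketch of what a generic three-way comparison can look like in Go. The names compareValues, cmpOrdered, and ordered are hypothetical, and the set of supported types is chosen for illustration; the real function in pkg/container/types presumably covers the engine's own type IDs, including the "special types" mentioned above.

```go
package main

import (
	"bytes"
	"fmt"
)

// ordered lists the built-in types this sketch compares with < and >.
type ordered interface {
	~int64 | ~float64 | ~string
}

// cmpOrdered returns -1, 0, or 1. NaN is not handled specially here.
func cmpOrdered[T ordered](x, y T) int {
	switch {
	case x < y:
		return -1
	case x > y:
		return 1
	}
	return 0
}

// compareValues compares two dynamically typed values of the same type.
// It returns an error when the types differ or are unsupported.
func compareValues(a, b any) (int, error) {
	switch x := a.(type) {
	case int64:
		if y, ok := b.(int64); ok {
			return cmpOrdered(x, y), nil
		}
	case float64:
		if y, ok := b.(float64); ok {
			return cmpOrdered(x, y), nil
		}
	case string:
		if y, ok := b.(string); ok {
			return cmpOrdered(x, y), nil
		}
	case []byte:
		// Byte slices are not comparable with <, so use bytes.Compare.
		if y, ok := b.([]byte); ok {
			return bytes.Compare(x, y), nil
		}
	}
	return 0, fmt.Errorf("cannot compare %T with %T", a, b)
}

func main() {
	fmt.Println(compareValues(int64(1), int64(2)))
	fmt.Println(compareValues("b", "a"))
	fmt.Println(compareValues(int64(1), "a"))
}
```

A single entry point like this lets diff and merge code compare primary-key or column values without repeating a type switch at every call site.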
build_alter_table.go
Improve variable naming for fake primary key tracking
pkg/sql/plan/build_alter_table.go
hasFakePK to copyFakePKCol for better clarity during alter table copy operations
local_disttae_datasource.go
Add commit timestamp column support to disttae datasource
pkg/vm/engine/disttae/local_disttae_datasource.go
DefaultCommitTS_Attr column in filtered in-memory committed inserts
when present
ddl.go
Mark deleted tables in branch metadata during drop operations
pkg/sql/compile/ddl.go
table_deleted flag in mo_branch_metadata table
reader.go
Add commit timestamp attribute support to reader utility
pkg/vm/engine/readutil/reader.go
DefaultCommitTS_Attr column in column update logic
type
clone.go
Refactor clone statement to use ToAccountOpt structure
pkg/sql/parsers/tree/clone.go
ToAccountOpt struct to encapsulate account option information
CloneTable and CloneDatabase to use ToAccountOpt instead of direct identifier
operations
stmt_kind.go
Enable data branch statements in uncommitted transactions
pkg/frontend/stmt_kind.go
statementCanBeExecutedInUncommittedTransaction function
DataBranchDeleteTable, and DataBranchDeleteDatabase to execute in uncommitted transactions
types.go
Add data branch frontend print type and standardize keywords
pkg/frontend/types.go
FPDataBranch constant to frontend print types
Exists keyword
resultset.go
Add decimal and timestamp type support to result set
pkg/frontend/resultset.go
types in GetString method
type code
func_mo.go
Register MO_BRANCH_METADATA in function catalog
pkg/sql/plan/function/func_mo.go
catalog.MO_BRANCH_METADATA constant to the function registry with value 0
types.go
Define MO_BRANCH_METADATA system table constant
pkg/catalog/types.go
MO_BRANCH_METADATA = "mo_branch_metadata" to system table definitions
mysql_sql.y
Implement data branch SQL grammar and parsing
pkg/sql/parsers/dialect/mysql/mysql_sql.y
toAccountOpt, conflictOpt, and diffAsOpt for branch operations
BRANCH, LOG, REVERT, REBASE, DIFF, CONFLICT, CONFLICT_FAIL, CONFLICT_SKIP, CONFLICT_ACCEPT
branch_stmt grammar rule supporting DATA BRANCH operations (CREATE TABLE, CREATE DATABASE, DELETE TABLE, DELETE DATABASE, DIFF, MERGE)
to_account_opt, conflict_opt, diff_as_opt for optional parameters
create_table_stmt to use to_account_opt instead of hardcoded TO ACCOUNT syntax
DATA and BRANCH to algorithm_type alternatives
23 files
branch_hashmap_test.go
Add comprehensive tests for branch hashmap implementation
pkg/frontend/databranchutils/branch_hashmap_test.go
BranchHashmap functionality
disk, and partial deletion
allocator for testing spill behavior
branch_dag_test.go
Add tests for data branch DAG and LCA algorithm
pkg/frontend/databranchutils/branch_dag_test.go
supporting branch hierarchy
algorithm
diff_3.result
Add data branch diff test results for mixed types
test/distributed/cases/snapshot/branch/diff/diff_3.result
mixed data types
types, string primary keys with decimal/binary columns, and temporal
primary keys with float/text updates
structures
diff_3.sql
Add data branch diff test cases for mixed types
test/distributed/cases/snapshot/branch/diff/diff_3.sql
types
binary data, and sensor events with temporal keys
merge_1.result
Add data branch merge test results
test/distributed/cases/snapshot/branch/merge/merge_1.result
with updates, and merges with cloned tables
diff_1.result
Add data branch diff test results with snapshots
test/distributed/cases/snapshot/branch/diff/diff_1.result
same/different branch times, and LCA relationships
configurations
diff_2.result
Add data branch diff test results for large datasets
test/distributed/cases/snapshot/branch/diff/diff_2.result
and complex primary keys
key handling
diff_1.sql
Add data branch diff test cases with snapshots
test/distributed/cases/snapshot/branch/diff/diff_1.sql
Ancestor) scenarios
LCA relationships
patterns
merge_1.sql
Add data branch merge test cases
test/distributed/cases/snapshot/branch/merge/merge_1.sql
tables
diff_2.sql
Add data branch diff test cases for large datasets
test/distributed/cases/snapshot/branch/diff/diff_2.sql
scenarios
composite primary keys
merge_2.result
Add data branch merge conflict handling test results
test/distributed/cases/snapshot/branch/merge/merge_2.result
CONFLICT_ACCEPT options
merge_2.sql
Add data branch merge conflict handling test cases
test/distributed/cases/snapshot/branch/merge/merge_2.sql
(skip, accept)
conflicting changes
tenant.result
Update tenant test results for branch metadata table
test/distributed/cases/tenant/tenant.result
relname not like '__mo_index_unique__%' to relname not like '__mo_index%'
mo_branch_metadata in system table listings
starlark.result
Update starlark procedure test results with escaped keywords
test/distributed/cases/procedure/starlark.result
keywords: mo and mo.foo
sp_table.result
Update sp_table test results for branch metadata
test/distributed/cases/dml/select/sp_table.result
relname not like '__mo_index_unique__%' to relname not like '__mo_index%'
mo_branch_metadata to expected system table results
sp_table.sql
Update sp_table query filter for index matching
test/distributed/cases/dml/select/sp_table.sql
relname not like '__mo_index_unique__%' to relname not like '__mo_index%'
tenant.test
Update tenant test query filter for index matching
test/distributed/cases/tenant/tenant.test
relname not like '__mo_index_unique__%' to relname not like '__mo_index%'
starlark.sql
Update starlark procedure definitions with escaped keywords
test/distributed/cases/procedure/starlark.sql
keywords: mo and mo.foo
restore_cluster_table.result
Update cluster restore test results for branch metadata
test/distributed/cases/snapshot/cluster/restore_cluster_table.result
mo_branch_metadata to expected system table listings in two locations
table
cluster_level_snapshot_restore_cluster.result
Update cluster snapshot restore test results for branch metadata
test/distributed/cases/snapshot/cluster_level_snapshot_restore_cluster.result
mo_branch_metadata to expected system table listings in two locations
table
clone_sys_db_table_to_new_db_table.result
Update clone system table test results for branch metadata
test/distributed/cases/snapshot/clone/clone_sys_db_table_to_new_db_table.result
mo_branch_metadata to expected system table listings in two locations
show.result
Update show tables test results for branch metadata
test/distributed/cases/dml/show/show.result
mo_branch_metadata to expected system table listings
table
create_user_default_role.result
Update privilege test results for branch metadata
test/distributed/cases/tenant/privilege/create_user_default_role.result
mo_branch_metadata to expected system table listings
table
4 files
cluster_upgrade_list.go
Add branch metadata table to v4.0.0 cluster upgrade
pkg/bootstrap/versions/v4_0_0/cluster_upgrade_list.go
mo_branch_metadata table
cluster_upgrade_list.go
Add branch metadata table to v3.0.0 cluster upgrade
pkg/bootstrap/versions/v3_0_0/cluster_upgrade_list.go
mo_branch_metadata table
process
predefined.go
Add branch metadata table DDL definition
pkg/frontend/predefined.go
MoCatalogBranchMetadataDDL constant defining the branch metadata table schema
level, and table_deleted flag
lib.go
Add math library linking flag to CGO
cgo/lib.go
#cgo LDFLAGS: -lm compiler flag for linking the math library
1 files