Use new API to do Iceberg partition. #13688
base: main
Conversation
Signed-off-by: Chong Gao <[email protected]>
Pull Request Overview
This PR refactors the Iceberg partition logic to use a new API for improved performance and simplicity. The changes replace the previous sorting-based approach with a new contiguousSplitGroupsAndGenUniqKeys method that handles grouping and splitting in a single operation.
- Simplifies partitioning logic by eliminating manual sorting and upper bound operations
- Uses new CUDF API for more efficient group-by and splitting operations
- Adds proper resource management for the input columnar batch in tests
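As a rough, CPU-side illustration only (plain Scala collections, not the cuDF API): a combined group-and-split operation takes rows tagged with partition keys and returns the unique keys together with one contiguous group of rows per key, which is what lets the PR drop the separate sort and upper-bound steps.

```scala
// Illustrative sketch only: models the *shape* of the result that a combined
// "split groups and generate unique keys" call produces, using plain Scala.
def groupAndSplit[K, V](rows: Seq[(K, V)]): (Seq[K], Seq[Seq[V]]) = {
  val grouped = rows.groupBy { case (key, _) => key }.toSeq
  // unique partition keys, one entry per group
  val uniqueKeys = grouped.map { case (key, _) => key }
  // the row values of each group, kept contiguous per key
  val groups = grouped.map { case (_, keyedRows) => keyedRows.map(_._2) }
  (uniqueKeys, groups)
}

// Two distinct keys yield two groups in a single pass, with no explicit
// sort or upper-bound step.
val (keys, groups) = groupAndSplit(Seq(("a", 1), ("b", 2), ("a", 3)))
```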
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| GpuIcebergPartitioner.scala | Refactors partitioning implementation to use new contiguousSplitGroupsAndGenUniqKeys API |
| GpuIcebergPartitionerSuite.scala | Adds proper cleanup of input columnar batch after partitioning |
```scala
private def makeValueIndices(spillableInput: SpillableColumnarBatch): Array[Int] = {
  val numInputCols = spillableInput.getColumnarBatch().numCols()
```
Copilot AI (Oct 29, 2025)
This method calls spillableInput.getColumnarBatch() which may trigger spilling/unspilling. Consider caching the result or passing numInputCols as a parameter to avoid potential repeated operations.
Suggested change:
```diff
-private def makeValueIndices(spillableInput: SpillableColumnarBatch): Array[Int] = {
-  val numInputCols = spillableInput.getColumnarBatch().numCols()
+private def makeValueIndices(numInputCols: Int): Array[Int] = {
```
Greptile Overview
Greptile Summary
This PR refactors the Iceberg partitioning implementation to use a new cuDF API (contiguousSplitGroupsAndGenUniqKeys) that consolidates multi-step partitioning logic into a single native operation. Previously, the partitioner performed separate operations for computing keys, generating row indices, sorting multiple times, grouping, and splitting. The new approach combines key computation with value columns and delegates the entire partition operation to cuDF, eliminating multiple GPU round-trips. Three new helper methods (computeKeysTable, makeKeysValuesTable, makeValueIndices) prepare the combined keys+values table structure required by the new API. A corresponding test fix adds proper resource cleanup (originalCb.close()) to prevent GPU memory leaks. This change fits into the broader Iceberg integration module (iceberg/) which provides GPU-accelerated Apache Iceberg support within the RAPIDS Accelerator for Spark.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| iceberg/src/main/scala/com/nvidia/spark/rapids/iceberg/GpuIcebergPartitioner.scala | 3/5 | Replaces multi-step partitioning algorithm with single cuDF API call; introduces three helper methods but contains potential resource management issues in nested withResource blocks |
| tests/src/test/spark350/scala/com/nvidia/spark/rapids/iceberg/GpuIcebergPartitionerSuite.scala | 5/5 | Adds single line to close ColumnarBatch after partitioning to prevent GPU memory leak |
Confidence score: 3/5
- This PR requires careful review due to resource management concerns in the core partitioning logic and dependency on external cuDF changes.
- Score reflects potential resource leaks in nested `withResource` calls within helper methods (`computeKeysTable`, `makeKeysValuesTable`) where intermediate `getColumnarBatch()` calls could leak GPU memory if exceptions occur between resource acquisition and the outer `withResource` scope. The PR also depends on an external cuDF PR which may not yet be merged or tested in integration.
- Pay close attention to iceberg/src/main/scala/com/nvidia/spark/rapids/iceberg/GpuIcebergPartitioner.scala, specifically lines 62-96 where nested resource management could fail under exception conditions, and verify that the cuDF dependency (rapidsai/cudf#20391) is properly integrated before merge.
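The leak pattern flagged above is the gap between acquiring a resource and handing it to `withResource`. A minimal, self-contained sketch in plain Scala; the `withResource` shown is a simplified stand-in for the codebase's Arm utility, and all other names are illustrative:

```scala
// Simplified stand-in for the Arm-style withResource helper.
def withResource[R <: AutoCloseable, T](resource: R)(body: R => T): T =
  try body(resource) finally resource.close()

final class FakeBatch extends AutoCloseable {
  def numCols(): Int = 3
  override def close(): Unit = ()
}

// Risky: `batch` is acquired before any guard exists. If the intermediate
// step throws, withResource is never reached and the batch leaks.
def risky(acquire: () => FakeBatch, intermediate: () => Unit): Int = {
  val batch = acquire()
  intermediate() // a throw here leaks `batch`
  withResource(batch)(_.numCols())
}

// Safe: the resource is guarded from the moment it is acquired, so the
// finally-based close runs on every path.
def safe(acquire: () => FakeBatch): Int =
  withResource(acquire())(_.numCols())
```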
Sequence Diagram
```mermaid
sequenceDiagram
    participant User
    participant GpuIcebergPartitioner
    participant SpillableColumnarBatch
    participant Table as "cuDF Table"
    participant GpuExpression
    participant GpuIcebergPartitionerObj as "GpuIcebergPartitioner Object"
    User->>GpuIcebergPartitioner: "partition(input: ColumnarBatch)"
    GpuIcebergPartitioner->>GpuIcebergPartitioner: "Check if numRows == 0"
    alt numRows == 0
        GpuIcebergPartitioner-->>User: "Return empty Seq"
    end
    GpuIcebergPartitioner->>SpillableColumnarBatch: "Create SpillableColumnarBatch from input"
    SpillableColumnarBatch-->>GpuIcebergPartitioner: "spillableInput"
    GpuIcebergPartitioner->>GpuIcebergPartitioner: "makeValueIndices(spillableInput)"
    GpuIcebergPartitioner->>GpuIcebergPartitioner: "makeKeysValuesTable(spillableInput)"
    GpuIcebergPartitioner->>GpuIcebergPartitioner: "computeKeysTable(spillableInput)"
    GpuIcebergPartitioner->>SpillableColumnarBatch: "getColumnarBatch()"
    SpillableColumnarBatch-->>GpuIcebergPartitioner: "inputBatch"
    loop For each partition expression
        GpuIcebergPartitioner->>GpuExpression: "columnarEval(inputBatch)"
        GpuExpression-->>GpuIcebergPartitioner: "Column result"
    end
    GpuIcebergPartitioner->>Table: "Create keysTable from evaluated columns"
    Table-->>GpuIcebergPartitioner: "keysTable"
    GpuIcebergPartitioner->>SpillableColumnarBatch: "getColumnarBatch()"
    SpillableColumnarBatch-->>GpuIcebergPartitioner: "inputBatch"
    GpuIcebergPartitioner->>Table: "Create inputTable from inputBatch"
    Table-->>GpuIcebergPartitioner: "inputTable"
    GpuIcebergPartitioner->>Table: "Combine keysTable and inputTable"
    Table-->>GpuIcebergPartitioner: "keysValuesTable"
    GpuIcebergPartitioner->>Table: "groupBy(keyColIndices).contiguousSplitGroupsAndGenUniqKeys(valueColumnIndices)"
    Table-->>GpuIcebergPartitioner: "splitRet (unique keys + partitioned groups)"
    GpuIcebergPartitioner->>GpuIcebergPartitionerObj: "toPartitionKeys(partitionType, partitionSparkType, uniqKeyTable)"
    GpuIcebergPartitionerObj-->>GpuIcebergPartitioner: "Array[SparkStructLike] partitionKeys"
    GpuIcebergPartitioner->>GpuIcebergPartitioner: "Get groups from splitRet"
    loop For each partition key and group
        GpuIcebergPartitioner->>SpillableColumnarBatch: "Create SpillableColumnarBatch from partition"
        SpillableColumnarBatch-->>GpuIcebergPartitioner: "spillablePartition"
        GpuIcebergPartitioner->>GpuIcebergPartitioner: "Create ColumnarBatchWithPartition(spillablePartition, partitionKey)"
    end
    GpuIcebergPartitioner-->>User: "Return Seq[ColumnarBatchWithPartition]"
```
2 files reviewed, 2 comments
```scala
private def computeKeysTable(spillableInput: SpillableColumnarBatch): Table = {
  val keyCols = withResource(spillableInput.getColumnarBatch()) { inputBatch =>
    partitionExprs.safeMap(_.columnarEval(inputBatch))
  }
  withResource(keyCols) { _ =>
    val arr = new Array[CudfColumnVector](partitionExprs.size)
    for (i <- partitionExprs.indices) {
      arr(i) = keyCols(i).getBase
    }
    new Table(arr:_*)
  }
}
```
logic: calling `getColumnarBatch()` on line 63 without `withResource` wrapping it means the batch reference isn't auto-closed if an exception occurs before line 66's `withResource` scope begins
```scala
private def makeValueIndices(spillableInput: SpillableColumnarBatch): Array[Int] = {
  val numInputCols = spillableInput.getColumnarBatch().numCols()
  (keyColNum until (keyColNum + numInputCols)).toArray
}
```
style: calling `getColumnarBatch()` here is not wrapped in resource management; if `numCols()` throws or if the reference isn't properly managed elsewhere, this could cause issues
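One way to sidestep the concern entirely, echoing the earlier suggestion, is to pass the column count in rather than materializing the batch just to count columns. A small sketch, assuming `keyColNum` is the number of key columns prepended ahead of the input columns:

```scala
// Pure index arithmetic: the input columns sit immediately after the key
// columns in the combined table, so their indices form a simple range.
def makeInputIndices(keyColNum: Int, numInputCols: Int): Array[Int] =
  (keyColNum until (keyColNum + numInputCols)).toArray

// e.g. 2 key columns followed by 3 input columns -> indices 2, 3, 4
```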
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The latest changes address the resource management concern raised in previous reviews by removing the problematic standalone call to getColumnarBatch() that was used solely to determine column count. The developer has refactored the value indices computation to occur at a different point in the code flow where resource management is already properly handled by existing withResource blocks, eliminating the potential for spilling/unspilling issues that were flagged earlier. This change completes the PR's goal of replacing the old multi-step Iceberg partitioning approach (which involved separate key computation, row index generation, multiple sorts, and gather/split operations) with a single native cuDF call to contiguousSplitGroupsAndGenUniqKeys, which should significantly reduce GPU memory allocations and improve performance.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| iceberg/src/main/scala/com/nvidia/spark/rapids/iceberg/GpuIcebergPartitioner.scala | 4/5 | Refactored to compute value column indices earlier in the flow, avoiding standalone getColumnarBatch() call that could trigger repeated spilling operations |
| tests/src/test/spark350/scala/com/nvidia/spark/rapids/iceberg/GpuIcebergPartitionerSuite.scala | 4/5 | No changes in this iteration; previous removal of originalCb.close() remains unchanged |
Confidence score: 4/5
- This PR is safe to merge with minimal remaining risk now that the resource management issue has been addressed.
- Score reflects that the latest changes directly address the caching concern from the previous review, but one point is deducted because the test file still rests on the unverified assumption that `partitioner.partition()` now handles input batch cleanup internally; this should be confirmed through code inspection or testing to ensure no GPU memory leaks occur.
- Pay close attention to iceberg/src/main/scala/com/nvidia/spark/rapids/iceberg/GpuIcebergPartitioner.scala to verify that the new placement of the `valueColumnIndices` computation is logically correct and that `numCols()` will always return the same value when called on the spillable batch at this point in execution.
2 files reviewed, no comments
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The latest changes address previous feedback by moving the getColumnarBatch() call inside the withRetryNoSplit resource management block and passing numInputCols as a parameter to avoid repeated operations. The core refactoring now uses cuDF's new contiguousSplitGroupsAndGenUniqKeys API to consolidate the Iceberg partitioning workflow, which previously required separate sort, group-by, and split operations. This change eliminates intermediate GPU memory allocations for sorted tables and row indices, directly returning both the split groups and unique keys. The method makeValueIndices was renamed to makeInputIndices to better reflect that it references original input columns rather than intermediate values, and an explicit call to closeUniqKeyTable() was added after extracting partition keys to the host, reducing GPU memory pressure during subsequent processing. The refactoring fits into the broader GpuIcebergPartitioner component, which transforms Spark ColumnarBatch data into Iceberg-compatible partition splits by grouping rows according to partition key columns defined in the Iceberg table schema.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| iceberg/src/main/scala/com/nvidia/spark/rapids/iceberg/GpuIcebergPartitioner.scala | 4/5 | Refactored partition logic to use new cuDF API contiguousSplitGroupsAndGenUniqKeys, eliminated intermediate sort/row-index steps, renamed makeValueIndices to makeInputIndices, and added explicit unique key table closure for memory optimization |
Confidence score: 4/5
- This PR is safe to merge with low risk; the refactoring simplifies the code and improves resource management.
- Score reflects that the latest changes correctly address all previous resource management concerns by moving `getColumnarBatch()` inside `withRetryNoSplit` and passing `numInputCols` as a parameter, and the new API consolidates multiple operations safely within `withResource` blocks.
- No files require special attention; resource management is now properly scoped throughout the partition method.
1 file reviewed, 1 comment
```scala
val allCols = new Array[CudfColumnVector](totalColCount)
withRetryNoSplit(spillableInput) { scb =>
  // make table: [key columns, input columns]
  val keysAndInputTable = makeKeysAndInputTable(scb)
```
logic: the `keysAndInputTable` variable is assigned outside `withResource` but used inside the closure on lines 115-116; if an exception occurs between lines 110 and 114, the table won't be properly closed
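One way to close that window is to hand the table to `withResource` in the same expression that creates it, so no unguarded gap exists. A sketch reusing the names from this diff; the exact `groupBy`/`contiguousSplitGroupsAndGenUniqKeys` signatures follow the usage shown elsewhere in this thread and are an assumption here:

```scala
// Sketch: no named intermediate between creation and the guard, so an
// exception anywhere inside still closes the table on the way out.
withRetryNoSplit(spillableInput) { scb =>
  withResource(makeKeysAndInputTable(scb)) { keysAndInputTable =>
    keysAndInputTable.groupBy(keyColIndices: _*)
      .contiguousSplitGroupsAndGenUniqKeys(inputColumnIndices: _*)
  }
}
```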
```scala
}

val numRows = input.numRows()
val inputColumnIndices = makeInputIndices(input.numCols())
```
@liurenjie1024 Unlike `keyColIndices`, we compute `inputColumnIndices` for each `partition` call. Could we get the input column count via `inputSchema.columns.size` or `dataSparkType.fields.size` instead? Is the following right?
`input.numCols() == inputSchema.columns.size == dataSparkType.fields.size`
Yes, it's correct.
LGTM! Just need to wait for the JNI part.
Greptile Overview
Greptile Summary
Refactors Iceberg partitioning to use new cuDF API contiguousSplitGroupsAndGenUniqKeys, simplifying the implementation by eliminating manual sorting, row index generation, and upper bound operations.
Major changes:
- Replaced multi-step partitioning (sort, groupBy, upperBound, gather, split) with a single `contiguousSplitGroupsAndGenUniqKeys` call
- Introduced `makeKeysAndInputTable` helper to create the combined key+input table
- Removed dependencies on `OrderByArg` and `Scalar`
- Simplified resource management flow in the main partition method

Critical issue found: `makeKeysAndInputTable` returns a `Table` with references to columns from already-closed source objects, creating dangling pointers to freed GPU memory.
Confidence Score: 0/5
- Critical memory management bug makes this PR unsafe to merge.
- The `makeKeysAndInputTable` method returns a Table with references to already-freed GPU columns, which will cause memory corruption, segfaults, or undefined behavior when the table is used.
- iceberg/src/main/scala/com/nvidia/spark/rapids/iceberg/GpuIcebergPartitioner.scala requires immediate attention; the `makeKeysAndInputTable` method must be fixed before merge.
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| iceberg/src/main/scala/com/nvidia/spark/rapids/iceberg/GpuIcebergPartitioner.scala | 1/5 | Refactored to use new API contiguousSplitGroupsAndGenUniqKeys but introduced critical memory management bug in makeKeysAndInputTable where returned Table references freed columns |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client
    participant GpuIcebergPartitioner
    participant makeKeysAndInputTable
    participant cuDF
    participant splitRet
    Client->>GpuIcebergPartitioner: partition(input)
    GpuIcebergPartitioner->>GpuIcebergPartitioner: create SpillableColumnarBatch
    GpuIcebergPartitioner->>makeKeysAndInputTable: makeKeysAndInputTable(spillableInput)
    makeKeysAndInputTable->>makeKeysAndInputTable: compute key columns via partitionExprs
    makeKeysAndInputTable->>makeKeysAndInputTable: combine [key cols, input cols] into Table
    makeKeysAndInputTable-->>GpuIcebergPartitioner: keysAndInputTable
    GpuIcebergPartitioner->>cuDF: keysAndInputTable.groupBy(keyColIndices)
    cuDF->>cuDF: contiguousSplitGroupsAndGenUniqKeys(inputColumnIndices)
    cuDF-->>GpuIcebergPartitioner: splitRet
    GpuIcebergPartitioner->>splitRet: getUniqKeyTable()
    GpuIcebergPartitioner->>GpuIcebergPartitioner: toPartitionKeys(uniqKeyTable)
    GpuIcebergPartitioner->>splitRet: closeUniqKeyTable()
    GpuIcebergPartitioner->>splitRet: getGroups()
    GpuIcebergPartitioner->>GpuIcebergPartitioner: zip partitionKeys with groups
    GpuIcebergPartitioner-->>Client: Seq[ColumnarBatchWithPartition]
```
1 file reviewed, 1 comment
```scala
private def makeKeysAndInputTable(spillableInput: SpillableColumnarBatch): Table = {
  withResource(spillableInput.getColumnarBatch()) { inputBatch =>
    // compute keys columns
    val keyCols = partitionExprs.safeMap(_.columnarEval(inputBatch))

    // combine keys columns and input columns into a new table
    withResource(keyCols) { _ =>
      withResource(GpuColumnVector.from(inputBatch)) { inputTable =>
        val numCols = keyCols.size + inputTable.getNumberOfColumns
        val cols = new Array[CudfColumnVector](numCols)
        for (i <- keyCols.indices) {
          cols(i) = keyCols(i).getBase
        }
        for (i <- 0 until inputTable.getNumberOfColumns) {
          cols(i + keyCols.size) = inputTable.getColumn(i)
        }
        new Table(cols:_*)
      }
    }
  }
}
```
logic: the `Table` created on line 83 holds column references from `keyCols` and `inputTable`, but those source objects are closed by their `withResource` wrappers (lines 73, 74) before the table is returned, so the returned table contains dangling references to freed GPU memory.
The columns need `incRefCount()` before being added to the new table, or the table construction needs to happen before the source resources are closed.
Fixes #13679.
Use new API to do Iceberg partition.
Signed-off-by: Chong Gao [email protected]
Depends on `contiguousSplitGroupsAndGenUniqKeys` rapidsai/cudf#20391

Checklists