[improve][io] support array schema for JDBC postgres connector #24549

Open: wants to merge 17 commits into base: master

freeznet
Contributor

Fixes #xyz

Main Issue: #xyz

PIP: #xyz

Motivation

Currently, the JDBC PostgreSQL connector in Apache Pulsar doesn't support array schema types, limiting its ability to handle complex data structures. Many real-world applications require storing array data in PostgreSQL databases (e.g., tags, categories, multi-valued attributes).

Without this support, users have to:

  • Manually serialize arrays into strings or JSON
  • Use workarounds that don't leverage PostgreSQL's native array capabilities
  • Lose type safety and query performance benefits of native array columns

This PR addresses these limitations by adding comprehensive array schema support for the PostgreSQL JDBC connector.

Modifications

  1. Core PostgreSQL Array Implementation (PostgresJdbcAutoSchemaSink.java):

    • Added array type conversion logic for all common PostgreSQL array types (INTEGER[], TEXT[], BOOLEAN[], NUMERIC[], REAL[], BIGINT[], etc.)
    • Implemented proper validation and error handling for array elements
    • Added type mapping between Avro array schemas and PostgreSQL array types
    • Enforced a single, consistent element type across all elements of an array
  2. Framework Enhancement (BaseJdbcAutoSchemaSink.java):

    • Extended the base JDBC sink to support array handling
    • Added abstract method handleArrayValue for database-specific implementations
    • Maintained strict backward compatibility - existing functionality remains unchanged
  3. Database-Specific Support (SqliteJdbcAutoSchemaSink.java):

    • Added an explicit error message for SQLite stating that arrays are not supported
    • Gives users clear guidance when they attempt to use arrays with an unsupported database
  4. Testing Infrastructure:

    • Created comprehensive test utilities (PostgresArrayTestUtils.java) for generating test array data
    • Added full integration test suite (PostgresJdbcArrayIntegrationTest.java) covering:
      • All supported array types
      • Empty arrays
      • Arrays with null elements
      • Boundary value testing
      • End-to-end data flow validation
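
To make the shape of the modifications above concrete, here is a minimal sketch of how a database-specific array hook could look. It is not the code from this PR: only the method name handleArrayValue comes from the description, while the class names, the method signature, and the Avro-to-PostgreSQL type table are assumptions chosen for illustration. The sketch validates elements and maps the element type without touching a database.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical base class; only the method name handleArrayValue comes from the PR text. */
abstract class ArraySinkSketch {
    /** Validates a decoded array and returns the elements to bind; the signature is assumed. */
    abstract Object[] handleArrayValue(String avroElementType, List<?> values);
}

/** Assumed PostgreSQL behaviour: map the element type, then check every element. */
final class PostgresArraySketch extends ArraySinkSketch {
    // Avro primitive name -> PostgreSQL element type name (a partial, illustrative mapping).
    private static final Map<String, String> PG_TYPES = Map.of(
            "int", "int4", "long", "int8", "float", "float4",
            "double", "float8", "boolean", "bool", "string", "text");

    // Avro primitive name -> Java class expected for each non-null element.
    private static final Map<String, Class<?>> JAVA_TYPES = Map.<String, Class<?>>of(
            "int", Integer.class, "long", Long.class, "float", Float.class,
            "double", Double.class, "boolean", Boolean.class, "string", String.class);

    static String postgresElementType(String avroElementType) {
        String pgType = PG_TYPES.get(avroElementType);
        if (pgType == null) {
            throw new IllegalArgumentException(
                    "Unsupported array element type: " + avroElementType);
        }
        return pgType;
    }

    @Override
    Object[] handleArrayValue(String avroElementType, List<?> values) {
        postgresElementType(avroElementType); // rejects unsupported element types early
        Class<?> expected = JAVA_TYPES.get(avroElementType);
        for (int i = 0; i < values.size(); i++) {
            Object element = values.get(i);
            // Null elements pass through; anything else must match the declared type.
            if (element != null && !expected.isInstance(element)) {
                throw new IllegalArgumentException("Element " + i + " is "
                        + element.getClass().getSimpleName()
                        + ", expected " + expected.getSimpleName());
            }
        }
        // A real sink would now bind the result with the standard JDBC calls
        // connection.createArrayOf(postgresElementType(avroElementType), elements)
        // and statement.setArray(parameterIndex, sqlArray).
        return values.toArray();
    }
}

/** Assumed SQLite behaviour: fail fast with a clear message instead of writing bad data. */
final class SqliteArraySketch extends ArraySinkSketch {
    @Override
    Object[] handleArrayValue(String avroElementType, List<?> values) {
        throw new UnsupportedOperationException(
                "Array types are not supported by the SQLite JDBC sink");
    }
}
```

In this sketch a mixed-type list such as an Integer followed by a String is rejected with an IllegalArgumentException, while null elements and empty lists are accepted, which mirrors the cases the integration tests are described as covering. Keeping the hook abstract in the base class forces each database sink to state its own array behaviour explicitly.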

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
    Documentation should be updated to include:
    • Supported array types for PostgreSQL JDBC connector
    • Configuration examples for array schemas
    • Limitations and best practices
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: freeznet#13

@github-actions github-actions bot added the doc-required Your PR changes impact docs and you will update later. label Jul 23, 2025
@freeznet freeznet self-assigned this Jul 23, 2025
Member

@lhotari lhotari left a comment

LGTM, good work @freeznet

@codecov-commenter

Codecov Report

Attention: Patch coverage is 78.19549% with 29 lines in your changes missing coverage. Please review.

Project coverage is 74.30%. Comparing base (bbc6224) to head (9fe8bfe).
Report is 1225 commits behind head on master.

Files with missing lines | Patch % | Lines
...che/pulsar/io/jdbc/PostgresJdbcAutoSchemaSink.java | 79.62% | 16 Missing and 6 partials ⚠️
.../apache/pulsar/io/jdbc/BaseJdbcAutoSchemaSink.java | 85.71% | 2 Missing and 1 partial ⚠️
...e/pulsar/io/jdbc/ClickHouseJdbcAutoSchemaSink.java | 0.00% | 1 Missing ⚠️
...ache/pulsar/io/jdbc/MariadbJdbcAutoSchemaSink.java | 0.00% | 1 Missing ⚠️
...che/pulsar/io/jdbc/OpenMLDBJdbcAutoSchemaSink.java | 0.00% | 1 Missing ⚠️
...pache/pulsar/io/jdbc/SqliteJdbcAutoSchemaSink.java | 0.00% | 1 Missing ⚠️
Additional details and impacted files

@@             Coverage Diff              @@
##             master   #24549      +/-   ##
============================================
+ Coverage     73.57%   74.30%   +0.73%     
- Complexity    32624    32630       +6     
============================================
  Files          1877     1874       -3     
  Lines        139502   146362    +6860     
  Branches      15299    16796    +1497     
============================================
+ Hits         102638   108754    +6116     
- Misses        28908    28967      +59     
- Partials       7956     8641     +685     
Flag | Coverage Δ
inttests | 26.64% <0.00%> (+2.05%) ⬆️
systests | 23.32% <0.00%> (-1.00%) ⬇️
unittests | 73.80% <80.00%> (+0.95%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
...e/pulsar/io/jdbc/ClickHouseJdbcAutoSchemaSink.java | 0.00% <0.00%> (ø)
...ache/pulsar/io/jdbc/MariadbJdbcAutoSchemaSink.java | 0.00% <0.00%> (ø)
...che/pulsar/io/jdbc/OpenMLDBJdbcAutoSchemaSink.java | 0.00% <0.00%> (ø)
...pache/pulsar/io/jdbc/SqliteJdbcAutoSchemaSink.java | 76.92% <0.00%> (-6.42%) ⬇️
.../apache/pulsar/io/jdbc/BaseJdbcAutoSchemaSink.java | 75.75% <85.71%> (+2.90%) ⬆️
...che/pulsar/io/jdbc/PostgresJdbcAutoSchemaSink.java | 72.50% <79.62%> (+72.50%) ⬆️

... and 1106 files with indirect coverage changes

@codelipenghui
Contributor

/pulsarbot run-failure-checks

@freeznet
Contributor Author

/pulsarbot run-failure-checks

Labels
doc-required Your PR changes impact docs and you will update later. ready-to-test