As Glue limits comments to 255 characters, we may need to truncate them #174

Merged
merged 3 commits into from
May 23, 2025

Conversation

manabery
Contributor

This is a PR to merge the original pull request (#38).

Changes added in #38:

  • When importing a Hive Metastore into the Glue Data Catalog, truncate table column comments to 255 characters to meet the Glue limit.

Changes added in this PR:

Tests:

  • Tests are written in this comment

  • Additional test for partition key:

Steps:

  1. Create a partitioned table with comments on a Hive 3.1.3 Metastore:
hive> CREATE TABLE
    > comment_test_partitioned (
    >     long_comment int COMMENT "Set up an AWS Glue ETL job which extracts metadata from your Hive metastore (MySQL) and loads it into your AWS Glue Data Catalog. This method requires an AWS Glue connection to the Hive metastore as a JDBC source. An ETL script is provided to extract metadata from the Hive metastore and write it to AWS Glue Data Catalog.",
    >     short_comment int COMMENT "aws-glue-samples",
    >     none_comment int
    > )
    > PARTITIONED BY (
    >     pk int COMMENT "Set up an AWS Glue ETL job which extracts metadata from your Hive metastore (MySQL) and loads it into your AWS Glue Data Catalog. This method requires an AWS Glue connection to the Hive metastore as a JDBC source. An ETL script is provided to extract metadata from the Hive metastore and write it to AWS Glue Data Catalog."
    > );
  2. Run the job in from-jdbc mode and import the table into the Glue Data Catalog.

Results:
With only the change in #38, the job failed with the following message (validation failed at table.partitionKeys.1.member.comment):

An error occurred while calling o1954.pyWriteDynamicFrame. 1 validation error detected: Value 'Set up an AWS Glue ETL job which extracts metadata from your Hive metastore (MySQL) and loads it into your AWS Glue Data Catalog. This method requires an AWS Glue connection to the Hive metastore as a JDBC source. An ETL script is provided to extract metadata from the Hive metastore and write it to AWS Glue Data Catalog.' at 'table.partitionKeys.1.member.comment' failed to satisfy constraint: Member must have length less than or equal to 255 (Service: Glue, Status Code: 400, Request ID:

With this change, the job succeeded and the comment on the partition key is also truncated.
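
For readers who want to see the idea in isolation, here is a minimal sketch of the truncation, not the code merged in this PR. It assumes a hypothetical plain-dict table layout with `columns` and `partitionKeys` lists (the key names mirror the error path above), and the helper names `truncate_field_comment` and `truncate_table_comments` are made up for illustration.

```python
# Limit quoted in the Glue validation error above.
GLUE_COMMENT_MAX_LENGTH = 255


def truncate_field_comment(field, max_length=GLUE_COMMENT_MAX_LENGTH):
    """Return a copy of a column/partition-key dict with its comment cut to max_length.

    Fields without a comment (or with a None comment) are returned unchanged.
    """
    updated = dict(field)
    comment = updated.get("comment")
    if comment is not None:
        updated["comment"] = comment[:max_length]
    return updated


def truncate_table_comments(table, max_length=GLUE_COMMENT_MAX_LENGTH):
    """Return a copy of a table dict with comments truncated on both
    regular columns and partition keys (hypothetical dict layout)."""
    result = dict(table)
    for key in ("columns", "partitionKeys"):
        result[key] = [
            truncate_field_comment(field, max_length)
            for field in table.get(key, [])
        ]
    return result
```

Handling `partitionKeys` in the same loop as `columns` is the point of the sketch: truncating only the regular columns leaves the partition key comment over the limit, which matches the failure reported above.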

@moomindani moomindani merged commit 5b61691 into aws-samples:master May 23, 2025
@moomindani
Contributor

Thank you for your contribution!

@manabery manabery deleted the pr_38_merge branch May 23, 2025 12:03