As Glue limits comments to 255 characters, we may need to truncate them. #38
Investigation
The maximum length of a column comment in the Hive Metastore is 256 characters, while the Glue Data Catalog allows only 255 characters per comment. Comments therefore need to be truncated, as @mikklepp pointed out.
Current behavior
Create a table on Hive Metastore v3.1.3. Hive automatically truncates longer comments to 256 characters.
The migration job from the Hive Metastore to Glue Data Catalog failed because the comment is 256 characters.
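The one-character mismatch described above can be illustrated with a small check. This is only a sketch: the constant names and the `exceeds_glue_limit` helper are hypothetical and not part of the migration script; the limits are the ones stated in this PR.

```python
# Limits as stated in this PR; the names are illustrative only.
HIVE_COLUMN_COMMENT_MAX = 256
GLUE_COMMENT_MAX = 255


def exceeds_glue_limit(comment, limit=GLUE_COMMENT_MAX):
    """Return True if the comment is a string longer than the limit."""
    return isinstance(comment, str) and len(comment) > limit


# A comment that Hive has already cut to its own maximum length.
hive_comment = "x" * HIVE_COLUMN_COMMENT_MAX

print(exceeds_glue_limit(hive_comment))                     # True
print(exceeds_glue_limit(hive_comment[:GLUE_COMMENT_MAX]))  # False
print(exceeds_glue_limit(None))                             # False
```

A comment that Hive stores at its full length is still one character too long for Glue, which is why slicing to 255 characters is enough.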
Test for from-jdbc mode
Steps
def transform_ms_columns(self, ms_columns):
    def extract_row(row):
        def truncate(x):
            # Slice only sliceable values (e.g. strings); None passes through unchanged.
            return x[:255] if hasattr(x, "__getitem__") else x

        return (
            row['COLUMN_NAME'],
            row['TYPE_NAME'],
            truncate(row['COMMENT'])
        )

    return self.transform_df_with_idx(
        df=ms_columns,
        id_col="CD_ID",
        idx="INTEGER_IDX",
        payloads_column_name="columns",
        payload_type=StructType(
            [
                StructField(name="name", dataType=StringType()),
                StructField(name="type", dataType=StringType()),
                StructField(name="comment", dataType=StringType()),
            ]
        ),
        payload_func=extract_row,
    )
Result
Test for from-metastore mode
Steps
$ spark-submit \
    --jars $MYSQL_JAR_PATH \
    /home/hadoop/hive_metastore_migration.py \
    --mode from-metastore \
    --jdbc-url jdbc:mysql://**:3306 \
    --jdbc-user hive \
    --jdbc-password ** \
    --output-path s3://path/
Result
The output JSON file contains the truncated comment.
Note
Hive accepts comments of up to 4,000 characters for partition keys, and this implementation doesn't truncate comments on partition columns. We need to add the same change in transform_ms_partition_keys() as well.
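A minimal sketch of what the same change could look like for partition keys, shown on plain dictionaries so it runs without Spark. The row keys `PKEY_NAME`, `PKEY_TYPE` and `PKEY_COMMENT` are assumed to match the Hive Metastore `PARTITION_KEYS` table, and `extract_partition_key_row` is a hypothetical name; the actual body of transform_ms_partition_keys() is not shown in this PR and may differ.

```python
GLUE_COMMENT_MAX = 255  # Glue comment limit as stated in this PR


def truncate(x, limit=GLUE_COMMENT_MAX):
    # Same guard as in transform_ms_columns: slice only sliceable values.
    return x[:limit] if hasattr(x, "__getitem__") else x


def extract_partition_key_row(row):
    # Assumed column names from the PARTITION_KEYS table.
    return (
        row["PKEY_NAME"],
        row["PKEY_TYPE"],
        truncate(row["PKEY_COMMENT"]),
    )


# Stand-in rows: one with a 4,000-character comment, one with no comment.
long_row = {"PKEY_NAME": "dt", "PKEY_TYPE": "string", "PKEY_COMMENT": "c" * 4000}
null_row = {"PKEY_NAME": "region", "PKEY_TYPE": "string", "PKEY_COMMENT": None}

name, type_name, comment = extract_partition_key_row(long_row)
print(len(comment))                            # 255
print(extract_partition_key_row(null_row)[2])  # None
```

Sharing one `truncate` helper between the column and partition-key transforms would keep the limit in a single place.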