-
Notifications
You must be signed in to change notification settings - Fork 4.8k
HIVE-29066: PARTITION_NAME_WHITELIST_PATTERN is not honouring session level configuration #5943
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
bbf81bd
to
307e7df
Compare
The test failure is flaky as reported in https://issues.apache.org/jira/browse/HIVE-29061, will re-trigger if there are any review commets. |
Can you please help with the review @deniskuzZ @okumin ? |
...ne-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
Outdated
Show resolved
Hide resolved
Also, in HS2 DynamicPartitionCtx#whiteListPattern field exist per object. I am seeing the issue in cluster as well. (when i set the config in hive-site.xml) Unfortunately, its not happening in q file. Can you suggest should I make similar changes in DynamicPartitionCtx and org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:1129? |
1411a40
to
8e20862
Compare
The reason why session level
If we are using
instead of getMetaConf In q files, as HMS runs is embedded the conf object "might be" shared. |
8e20862
to
3afea9d
Compare
@@ -1241,6 +1227,8 @@ public void process(Object row, int tag) throws HiveException { | |||
} catch (SerDeException e) { | |||
closeWriters(true); | |||
throw new HiveException(e); | |||
} catch (MetaException ex) { | |||
throw new RuntimeException(ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we throw HiveException? maybe squash with above SerDeException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} catch (SerDeException | MetaException e) {
closeWriters(true);
throw new HiveException(e);
}
Doing so can result in error in q file in dynamic_partitions_with_whitelist.q
< Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(boolean, java.util.List)" because "this.fpaths" is null
---
> Caused by: java.lang.RuntimeException: MetaException(message:Partition value 'val_129' contains a character not matched by whitelist pattern '[^9]*'. Configure with hive.metastore.partition.name.whitelist.pattern for dynamic partitioning otherwise use metaconf:metastore.partition.name.whitelist.pattern)
37a38,39
Earlier it was HiveFatalException
, let's contiue that what do you think?
} catch (SerDeException e) {
closeWriters(true);
throw new HiveException(e);
} catch (MetaException e) {
throw new HiveFatalException(e);
}
CC @deniskuzZ
import org.apache.hadoop.hive.ql.metadata.HiveFatalException; | ||
import org.apache.hadoop.hive.ql.metadata.HiveStorageHandler; | ||
import org.apache.hadoop.hive.ql.metadata.HiveUtils; | ||
import org.apache.hadoop.hive.ql.metadata.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please avoid wildcard imports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies, Editor's doing.
"' contains a character " + "not matched by whitelist pattern '" + | ||
partitionValidationPattern.toString() + "'. " + "(configure with " + | ||
MetastoreConf.ConfVars.PARTITION_NAME_WHITELIST_PATTERN.getVarname() + ")"); | ||
public static Pattern getPartitionValidationRegex(Configuration conf) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can it be private?
public static boolean partitionNameHasValidCharacters(List<String> partVals, | ||
Pattern partitionValidationPattern) { | ||
return getPartitionValWithInvalidCharacter(partVals, partitionValidationPattern) == null; | ||
public static boolean partitionNameHasValidCharacters(List<String> partVals, Configuration conf) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, can it be private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is referenced in HMSHandler#partition_name_has_valid_characters()
but we can make getPartitionValWithInvalidCharacter
private.
|
||
return null; | ||
public static String getPartitionValWithInvalidCharacter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should move this code to metastore-common
module (MetaStoreUtils), otherwise, we are coupling hive-exec with metastore-server
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack. Let me check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deniskuzZ, I have a question, can you please help me clear a doubt for me?
- FileSinkOperator#process() calling MetaStoreServerUtils.validatePartitionNameCharacters (Line 1120) is also coupling right? Yes/No 😅
- I believe its Yes, then I'm inclined toward having duplicate methods for partition regex validation in HS2 as earlier it was.
Another question:
As FileSinkOperator runs inside TezTask for dynamic partitioning (insert overwrite query) is that why it is working? as necessary java classes are serialized via kryo to tez task because without thrift, the communication shouldn't be possible b/w ql
module and metastore-server
? Am I missing something?
3afea9d
to
3115932
Compare
@@ -91,7 +91,7 @@ public DynamicPartitionCtx(List<String> partColNames, String defaultPartName, | |||
this.spPath = null; | |||
String confVal; | |||
try { | |||
confVal = Hive.get().getMetaConf(ConfVars.PARTITION_NAME_WHITELIST_PATTERN.getHiveName()); | |||
confVal = Hive.get().getMetaConf(ConfVars.PARTITION_NAME_WHITELIST_PATTERN.getVarname()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is enough for fixing dynamic partition. Reasoning:
hivename: hive.metastore.partition.name.whitelist.pattern
varname: metastore.partition.name.whitelist.pattern
As users do set metaconf:metastore.partition.name.whitelist.pattern=[^9]*
, then varname is used not the hivename. I believe it was never working earlier also for session level.
NOTE: This also means that in hive-site.xml as well this is now going to be picked instead of hivename.
<property>
<name>metastore.partition.name.whitelist.pattern</name>
<value>[^9]*</value>
</property>
This shouldn't be an issue because:
- in HiveConf this config is deprecated.
- If users still uses hive.metastore.partition.name.whitelist.pattern, then also MoveTask will fail (add_partitions_req) which is indireclty the expected behaviour
Error: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. MetaException(message:Partition value '09' contains a character not matched by whitelist pattern '[^9]*'. Configure with metastore.partition.name.whitelist.pattern); Query ID: raghav_20250729215001_82d33e22-04db-4be3-8b01-189504089bf6 (state=08S01,code=40000)
- If users uses hive.metastore.partition.name.whitelist.pattern for HMS Side operation like alter then it will also work becuase of
MetastoreConf#getVar()
which check for hivename as failsafe. That's why q file are not required to change.
@@ -17,4 +16,5 @@ load data local inpath '../../data/files/bmj/000000_0' INTO TABLE source_table p | |||
-- If the directory is not empty the hook will throw an error, instead the error should come from the metastore | |||
-- This shows that no dynamic partitions were created and left behind or had directories created | |||
|
|||
SET metaconf:metastore.partition.name.whitelist.pattern=[^9]*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for moving this line below is, the metaconf will be resetted after any meta-operation i.e create etc. My assumption is, metconf should always be a line before the config is consumed in the query. the question can be tracked here.
… level configuration
3115932
to
9194b78
Compare
|
What changes were proposed in this pull request?
Check HIVE-29066, Ensure that the partitionValidationPattern is picked from the metaconf instead of HMSHandler Object.
Why are the changes needed?
To honour, modified session level metaconf.
Does this PR introduce any user-facing change?
NO
How was this patch tested?
q file is attached.