Skip to content

HIVE-29066: PARTITION_NAME_WHITELIST_PATTERN is not honouring session level configuration #5943

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Aggarwal-Raghav
Copy link
Contributor

What changes were proposed in this pull request?

Check HIVE-29066, Ensure that the partitionValidationPattern is picked from the metaconf instead of HMSHandler Object.

Why are the changes needed?

To honour, modified session level metaconf.

Does this PR introduce any user-facing change?

NO

How was this patch tested?

q file is attached.

@Aggarwal-Raghav
Copy link
Contributor Author

The test failure is flaky as reported in https://issues.apache.org/jira/browse/HIVE-29061, will re-trigger if there are any review commets.

@Aggarwal-Raghav
Copy link
Contributor Author

Can you please help with the review @deniskuzZ @okumin ?

@Aggarwal-Raghav
Copy link
Contributor Author

Also, in HS2 DynamicPartitionCtx#whiteListPattern field exist per object. I am seeing the issue in cluster as well.
Testing (when setting the conf in beeline session)
Screenshot 2025-07-12 at 8 26 10 PM

(when i set the config in hive-site.xml)
Screenshot 2025-07-12 at 8 27 52 PM

Unfortunately, its not happening in q file. Can you suggest should I make similar changes in DynamicPartitionCtx and org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:1129?

@Aggarwal-Raghav
Copy link
Contributor Author

The reason why session level hive.metastore.partition.name.whitelist.pattern is because of this line:

confVal = Hive.get().getMetaConf(ConfVars.PARTITION_NAME_WHITELIST_PATTERN.getHiveName());

If we are using hive.metastore.partition.name.whitelist.pattern then it should look in:

confVal = Hive.get().getConf().get(ConfVars.PARTITION_NAME_WHITELIST_PATTERN.getHiveName());

instead of getMetaConf

In q files, as HMS runs is embedded the conf object "might be" shared.

@Aggarwal-Raghav Aggarwal-Raghav changed the title HIVE-29066: PARTITION_NAME_WHITELIST_PATTERN is not honouring session level metaconf [WIP] HIVE-29066: PARTITION_NAME_WHITELIST_PATTERN is not honouring session level metaconf Jul 13, 2025
@Aggarwal-Raghav Aggarwal-Raghav changed the title [WIP] HIVE-29066: PARTITION_NAME_WHITELIST_PATTERN is not honouring session level metaconf HIVE-29066: PARTITION_NAME_WHITELIST_PATTERN is not honouring session level metaconf Jul 13, 2025
@Aggarwal-Raghav Aggarwal-Raghav changed the title HIVE-29066: PARTITION_NAME_WHITELIST_PATTERN is not honouring session level metaconf HIVE-29066: PARTITION_NAME_WHITELIST_PATTERN is not honouring session level configuration Jul 23, 2025
@@ -1241,6 +1227,8 @@ public void process(Object row, int tag) throws HiveException {
} catch (SerDeException e) {
closeWriters(true);
throw new HiveException(e);
} catch (MetaException ex) {
throw new RuntimeException(ex);
Copy link
Member

@deniskuzZ deniskuzZ Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we throw HiveException? maybe squash with above SerDeException

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

} catch (SerDeException | MetaException e) {
      closeWriters(true);
      throw new HiveException(e);
    }

Doing so can result in error in q file in dynamic_partitions_with_whitelist.q

< Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(boolean, java.util.List)" because "this.fpaths" is null
---
> Caused by: java.lang.RuntimeException: MetaException(message:Partition value 'val_129' contains a character not matched by whitelist pattern '[^9]*'. Configure with hive.metastore.partition.name.whitelist.pattern for dynamic partitioning otherwise use metaconf:metastore.partition.name.whitelist.pattern)
37a38,39

Earlier it was HiveFatalException, let's contiue that what do you think?

 } catch (SerDeException e) {
      closeWriters(true);
      throw new HiveException(e);
    } catch (MetaException e) {
      throw new HiveFatalException(e);
    }

CC @deniskuzZ

import org.apache.hadoop.hive.ql.metadata.HiveFatalException;
import org.apache.hadoop.hive.ql.metadata.HiveStorageHandler;
import org.apache.hadoop.hive.ql.metadata.HiveUtils;
import org.apache.hadoop.hive.ql.metadata.*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please avoid wildcard imports

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, Editor's doing.

"' contains a character " + "not matched by whitelist pattern '" +
partitionValidationPattern.toString() + "'. " + "(configure with " +
MetastoreConf.ConfVars.PARTITION_NAME_WHITELIST_PATTERN.getVarname() + ")");
public static Pattern getPartitionValidationRegex(Configuration conf) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can it be private?

public static boolean partitionNameHasValidCharacters(List<String> partVals,
Pattern partitionValidationPattern) {
return getPartitionValWithInvalidCharacter(partVals, partitionValidationPattern) == null;
public static boolean partitionNameHasValidCharacters(List<String> partVals, Configuration conf) {
Copy link
Member

@deniskuzZ deniskuzZ Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, can it be private?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is referenced in HMSHandler#partition_name_has_valid_characters() but we can make getPartitionValWithInvalidCharacter private.


return null;
public static String getPartitionValWithInvalidCharacter(
Copy link
Member

@deniskuzZ deniskuzZ Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should move this code to metastore-common module (MetaStoreUtils), otherwise, we are coupling hive-exec with metastore-server

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Let me check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deniskuzZ, I have a question, can you please help me clear a doubt for me?

  1. FileSinkOperator#process() calling MetaStoreServerUtils.validatePartitionNameCharacters (Line 1120) is also coupling right? Yes/No 😅
  2. I believe its Yes, then I'm inclined toward having duplicate methods for partition regex validation in HS2 as earlier it was.

Another question:
As FileSinkOperator runs inside TezTask for dynamic partitioning (insert overwrite query) is that why it is working? as necessary java classes are serialized via kryo to tez task because without thrift, the communication shouldn't be possible b/w ql module and metastore-server? Am I missing something?

@@ -91,7 +91,7 @@ public DynamicPartitionCtx(List<String> partColNames, String defaultPartName,
this.spPath = null;
String confVal;
try {
confVal = Hive.get().getMetaConf(ConfVars.PARTITION_NAME_WHITELIST_PATTERN.getHiveName());
confVal = Hive.get().getMetaConf(ConfVars.PARTITION_NAME_WHITELIST_PATTERN.getVarname());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is enough for fixing dynamic partition. Reasoning:

hivename: hive.metastore.partition.name.whitelist.pattern
varname: metastore.partition.name.whitelist.pattern

As users do set metaconf:metastore.partition.name.whitelist.pattern=[^9]*, then varname is used not the hivename. I believe it was never working earlier also for session level.

NOTE: This also means that in hive-site.xml as well this is now going to be picked instead of hivename.

  <property>
    <name>metastore.partition.name.whitelist.pattern</name>
    <value>[^9]*</value>
  </property>

This shouldn't be an issue because:

  1. in HiveConf this config is deprecated.
  2. If users still uses hive.metastore.partition.name.whitelist.pattern, then also MoveTask will fail (add_partitions_req) which is indireclty the expected behaviour
Error: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. MetaException(message:Partition value '09' contains a character not matched by whitelist pattern '[^9]*'. Configure with metastore.partition.name.whitelist.pattern); Query ID: raghav_20250729215001_82d33e22-04db-4be3-8b01-189504089bf6 (state=08S01,code=40000)
  1. If users uses hive.metastore.partition.name.whitelist.pattern for HMS Side operation like alter then it will also work becuase of MetastoreConf#getVar() which check for hivename as failsafe. That's why q file are not required to change.

@@ -17,4 +16,5 @@ load data local inpath '../../data/files/bmj/000000_0' INTO TABLE source_table p
-- If the directory is not empty the hook will throw an error, instead the error should come from the metastore
-- This shows that no dynamic partitions were created and left behind or had directories created

SET metaconf:metastore.partition.name.whitelist.pattern=[^9]*;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for moving this line below is, the metaconf will be resetted after any meta-operation i.e create etc. My assumption is, metconf should always be a line before the config is consumed in the query. the question can be tracked here.

Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants