-
Notifications
You must be signed in to change notification settings - Fork 828
Add an option to provide parameters by JSON file for hive metastore migration utility #175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
* Resolved the warning message with yaml.load() : https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation
…on.py can be loaded from Glue 5.0 job, which doesn't have PyYAML by default
- Set `--direction` to `to_metastore`. | ||
- Set `--mode` to `to_metastore`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm, there was not the argument --direction
and we are going to update README to match the implementation, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, --direction
option has not existed in the code and it corrected README.
options = get_options(parser, args) | ||
|
||
if options["mode"] == FROM_METASTORE: | ||
if options.get("config_file") is not None: | ||
# parse json config file if provided | ||
config_file_path = options["config_file"] | ||
logger.info(f"config_file provided. Parsing arguments from {config_file_path}") | ||
with open(config_file_path, 'r') as json_file_stream: | ||
config_options = json.load(json_file_stream) | ||
|
||
# merge options. command line options are prioritized. | ||
for key in config_options: | ||
if not options.get(key): | ||
options[key] = config_options[key] | ||
elif options[key] is None: | ||
options[key] = config_options[key] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do like this, for example, if both config_file
and mode
are provided, the mode
is replaced by the mode specified in the config file. Is my understanding correct? Is this intended behavior? To me, if we explicitly set arguments, it's natural to prioritize the explicit argument over the config file.
In any of those decisions, we will need to explain how it is prioritized in README.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, if a same argument is provided from both on command line and via config_file
, the value in the config file is ignored. options
variable is created at line 1600, and command line arguments are read at the time. Arguments in the config files are added into options
variable only when corresponding arguments are not provided from command line. This is a test case 3 scenario.
I will update README to explain this behavior.
…in the config file and command line
This is a PR to merge #13
Changes made in #13
from_metastore
andto_metastore
options. This prevents JDBC password exposure on spark-submit command.Changes added in this PR
Tests
Testing with a config file
Case 1: run with expected config file
Result: Succeeded and exported metastore to S3.
Case 2: run with a config file and lack mandatory parameter
jdbc_password
)Result: failed the command with the following message as expected.
Case 3: run with a config file and command line parameter
--database-prefix
Result: successfully metastore was exported, and --database-prefix was overriden by command line config.
Testing without a config file (regression)
case 4: run with command line parameters
Result: the job succeeded.
case 5: run with from-s3 mode
Result: Succeeded (importing updated hive_metastore_migration.py didn't affect the behavior)
case 6: run with to-s3 mode
Result: Succeeded (importing updated hive_metastore_migration.py didn't affect the behavior)