
kelle (Contributor) commented Jul 16, 2025

Improve the load_astrodb function so that it looks for reference_tables and felis_schema in their expected locations. This should reduce the complexity of ingest scripts and take advantage of the standardization provided by the template.

  • add more defaults and fewer options
  • improve error handling

Inspired by SIMPLE-AstroDB/SIMPLE-db#627

kelle marked this pull request as draft July 16, 2025 02:24
kelle (Contributor, Author) commented Jul 16, 2025

I think this is in good shape, but I would like to write some tests.

kelle (Contributor, Author) commented Jul 16, 2025

I think we should move the reference_tables out of __init__.py and into a lookup_tables.py module, which contains a LOOKUP_TABLES variable, to avoid confusion with the references listed in the Publications table.
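A minimal sketch of such a lookup_tables.py module, assuming it only needs to expose a plain list of table names. Publications comes from this thread; Telescopes and Instruments are illustrative placeholders, not a confirmed list.

```python
# lookup_tables.py (hypothetical module layout)
# Names of the lookup tables, kept out of __init__.py so they are not
# confused with the individual references stored in the Publications table.
# Entries other than "Publications" are illustrative assumptions.
LOOKUP_TABLES = [
    "Publications",
    "Telescopes",
    "Instruments",
]
```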

kelle (Contributor, Author) commented Jul 22, 2025

@dr-rodriguez agrees we should rename it to LOOKUP_TABLES, which can be done in this PR for the load_astrodb function.

kelle (Contributor, Author) commented Jul 22, 2025

Make more subfunctions, e.g., for loading the LOOKUP_TABLES variable.

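import importlib.util

# Import the database's __init__.py by file path and read its REFERENCE_TABLES list
spec = importlib.util.spec_from_file_location(db_name, init_path)
db_module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(db_module)
REFERENCE_TABLES = db_module.REFERENCE_TABLES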
Collaborator:

We need to add a message (logger.info, or even a warning) that alerts the user this was done on their behalf.

return db


def _rebuild_db(db_path, db_file, data_path, reference_tables, felis_schema):
Collaborator:

We may want to set felis_schema = None as a default here to avoid issues with line 122 below (which checks whether it is None).

kelle (Contributor, Author) commented Sep 2, 2025

It would be nice if the load_astrodb function name made it more obvious that this function 1) reads from JSON files if recreatedb = True and 2) creates a SQLite file. Typing this made me realize that we need two different functions:

LOAD_JSON = function to generate a SQLite file from JSON files; equivalent of recreatedb = True.
READ_astrodb = reads from an existing SQLite file; equivalent of recreatedb = False.

kelle (Contributor, Author) commented Sep 9, 2025

While chatting with David:
db = build_db_from_json(db_file)
db = read_db_from_file(db_file)

json_path, felis_schema, and the reference table paths should be optional parameters with defaults that match how the template repo is organized.

  • make new functions

  • call these functions from load_astrodb, but emit a logger statement saying that load_astrodb is deprecated

  • move the LOOKUP_TABLES variable into schema/lookup_tables.yaml
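A runnable sketch of the plan above using only the standard library. build_db_from_json and read_db_from_file here are simplified stand-ins that return sqlite3 connections rather than the toolkit's database objects, the JSON ingestion step is omitted, and the load_astrodb signature shown is a simplified assumption.

```python
import logging
import sqlite3
from pathlib import Path

logger = logging.getLogger(__name__)


def build_db_from_json(db_file, data_path="data"):
    """Stand-in for the proposed JSON -> SQLite builder (recreatedb = True).

    A real version would ingest the JSON files under data_path; this sketch
    only recreates an empty SQLite file so the control flow can run.
    """
    db_file = Path(db_file)
    if db_file.exists():
        db_file.unlink()  # start from a clean file, as a rebuild would
    return sqlite3.connect(db_file)


def read_db_from_file(db_file):
    """Stand-in for the proposed reader of an existing SQLite file (recreatedb = False)."""
    db_file = Path(db_file)
    if not db_file.is_file():
        # sqlite3.connect would silently create an empty file, so check first.
        raise FileNotFoundError(f"Database file not found: {db_file}")
    return sqlite3.connect(db_file)


def load_astrodb(db_file, recreatedb=True, data_path="data"):
    """Deprecated wrapper that forwards to the two new functions."""
    logger.warning(
        "load_astrodb is deprecated; use build_db_from_json or read_db_from_file instead."
    )
    if recreatedb:
        return build_db_from_json(db_file, data_path=data_path)
    return read_db_from_file(db_file)
```

A logger message matches what was discussed; an alternative is warnings.warn(..., DeprecationWarning, stacklevel=2), which lets test suites and linters detect the deprecated call automatically.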

dr-rodriguez (Collaborator) commented:

We could consider using configparser to build up a config file: https://docs.python.org/3/library/configparser.html
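A sketch of the configparser approach, assuming an INI-style file. configparser stores every value as a string, so a list of lookup tables has to be encoded, for example as comma-separated names, and split by hand. The section name, keys, and the Telescopes entry are illustrative assumptions.

```python
import configparser

# Hypothetical INI version of the database settings.
EXAMPLE_INI = """
[database]
db_name = name.sqlite
db_data_path = data
felis_path = schema/felis.yaml
lookup_tables = Publications, Telescopes
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE_INI)
settings = config["database"]

# Values are plain strings, so the list has to be split manually.
lookup_tables = [name.strip() for name in settings["lookup_tables"].split(",")]
db_name = settings["db_name"]
```

For an actual file, config.read("database.ini") replaces read_string; note that configparser lowercases option names by default.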

kelle (Contributor, Author) commented Sep 9, 2025

database.toml:

[database_settings]
lookup_tables = ["Publications", ...]
db_data_path = "data"
felis_path = "schema/felis.yaml"
db_name = "name.sqlite"

Load it with configparser or tomllib: https://docs.python.org/3/library/tomllib.html#module-tomllib

kelle changed the base branch from main to improve_load_astrodb September 9, 2025 20:36
kelle marked this pull request as ready for review September 9, 2025 20:39
kelle merged commit 833e5a9 into astrodbtoolkit:improve_load_astrodb Sep 9, 2025
4 checks passed

2 participants