Skip to content

Reduce the number of tables needed, #2392

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

mgazza
Copy link
Collaborator

@mgazza mgazza commented May 25, 2025

Introduce migrations and reduce the number of tables needed.

@springfall2008
Copy link
Owner

Not sure I understand this to be honest, e.g. how the query only selects the latest state value quickly or how the cleanup works.

@springfall2008
Copy link
Owner

But I probably wouldn't include migrations, I doubt anyone is using the DB apart from us right now, just start a new DB. If it was included the files would have to be added to the download list etc.

INSERT INTO state (entity_id, state, attributes, updated_at)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT(entity_id) DO UPDATE SET
state=excluded.state,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any reference to excluded

Copy link
Collaborator Author

@mgazza mgazza May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excluded is a special table alias in SQLite that refers to the row that was proposed for insertion but caused a conflict. So in:

ON CONFLICT(entity_id) DO UPDATE SET
    state = excluded.state

…it means we update the existing row's state to the state value from the conflicting row. You can see more in the SQLite docs under UPSERT — the excluded. keyword is described in the "ON CONFLICT DO UPDATE" section.

@mgazza
Copy link
Collaborator Author

mgazza commented May 26, 2025

Not sure I understand this to be honest, e.g. how the query only selects the latest state value quickly or how the cleanup works.

The query for latest state uses ORDER BY timestamp DESC LIMIT 1, so we always get the most recent entry per key. Cleanup is handled by a background thread that periodically purges old rows beyond the retention window (db_days) using a DELETE where clause based on timestamp. I can add inline comments to clarify both of these in the code if that would help.

@mgazza
Copy link
Collaborator Author

mgazza commented May 26, 2025

But I probably wouldn't include migrations, I doubt anyone is using the DB apart from us right now, just start a new DB. If it was included the files would have to be added to the download list etc.

happy to drop the migration logic for now and just require a clean DB. I’ll remove the migration scripts and simplify accordingly unless we want to support older data later on.

@springfall2008
Copy link
Owner

I remember now why you want a states table also, as the larger data items you need to keep in the latest state but you don't want to keep the history for them. How do you handle this with one table?

@mgazza
Copy link
Collaborator Author

mgazza commented Jun 2, 2025

I remember now why you want a states table also, as the larger data items you need to keep in the latest state but you don't want to keep the history for them. How do you handle this with one table?

Thats a good point, having the state history and the entity(with the latest data) is fine.

mgazza added 2 commits June 2, 2025 17:43
unifies entity identification across instances by replacing the
autoincrement-based 'entities' lookup table with a consistent 64-bit
SHA256-derived hash of the entity_id. Simplifies schema design by
removing the entities and states tables, using 'latest' and 'history'
tables indexed by entity_hash instead. Improves lookup performance,
avoids join logic, and ensures cross-system consistency. Also adds
indexes on entity_hash and datetime to optimize query performance.
@mgazza
Copy link
Collaborator Author

mgazza commented Jun 2, 2025

re-worked to address your comments. Added a test that passed before and after my change. I can PR the test on its own if you like (I probably should)

@mgazza
Copy link
Collaborator Author

mgazza commented Jun 2, 2025

Question: for history, do we need to keep history for columns other than value?

@springfall2008
Copy link
Owner

Question: for history, do we need to keep history for columns other than value?

Home assistant keeps the history of all attributes, this mirrors the same thing but it removes 'big' attributes first

@mgazza mgazza closed this Aug 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants