This project routinely and automatically cross-pollinates data between Wikidata and the Civic Tech Field Guide (CTFG).
Primarily, it should be seen as a two-way ELT (extract, load, transform) or sync job, but it can also gradually expand its own scope, like a web crawler.
Not only does it put the two databases in conversation with each other, it also puts them in conversation with humans. Specifically, it will sometimes submit suggested changes for human review. Once humans confirm or refine those suggestions (for example, by confirming entity resolution), this will likely lead to new points of integration (e.g. syncing attributes of the entity between the two databases).
- Extract: Pull a sample of CTFG records.
- Extract: Search Wikidata for IDs to match any Listings that don't have one yet.
- Load: Update CTFG with (possibly multiple) matching Wikidata IDs.
- Load: For well-matched Wikidata IDs, pull entire record into special field of CTFG.
- Transform: Present suggested updates in new field(s) of CTFG DB.
- Transform: Make confident updates to Wikidata.
- Transform (Bonus): List lower confidence Wikidata edits in special field in CTFG.
- Don't rush (let data converge slowly over time).
- Keep business logic out of this low-level integration (e.g. let CTFG manually or automatically accept suggestions within their own DB).
- Keep incoming and outgoing fields separate from each other and the fields of record (supporting the above).
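
As a rough illustration of the matching steps and the field-separation principle above, here is a minimal Python sketch. It assumes Wikidata's `wbsearchentities` API action for name search and an Airtable-style record shape (`id` plus `fields`). The field names `Wikidata ID` and `Wikidata ID Candidates` are hypothetical placeholders, not the actual CTFG schema, and the helper names are illustrative only.

```python
from __future__ import annotations

from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

# Hypothetical field names; the real CTFG Airtable schema may differ.
FIELD_OF_RECORD = "Wikidata ID"             # confirmed by a human, never written here
FIELD_CANDIDATES = "Wikidata ID Candidates"  # incoming: possible matches for review


def build_search_url(name: str, language: str = "en", limit: int = 5) -> str:
    """Build a wbsearchentities request URL for a listing name."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def candidate_ids(response: dict) -> list[str]:
    """Extract candidate item IDs from a parsed wbsearchentities response."""
    return [hit["id"] for hit in response.get("search", [])]


def build_incoming_update(record: dict, response: dict) -> dict | None:
    """Return an update that touches only the incoming candidates field.

    Returns None when the listing already has a confirmed ID or when the
    search produced no candidates, so nothing is written in those cases.
    """
    if record.get("fields", {}).get(FIELD_OF_RECORD):
        return None
    ids = candidate_ids(response)
    if not ids:
        return None
    return {"id": record["id"], "fields": {FIELD_CANDIDATES: ", ".join(ids)}}
```

The sketch never writes to the field of record: possibly multiple candidates land in a separate incoming field, and accepting one of them is left to CTFG. Fetching the URL and sending the update to Airtable are omitted on purpose, since they depend on the client libraries the project actually uses.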
CTFG DB Engine: Airtable
Open questions:
- How do we make sure Wikidata IDs are not duplicated in Airtable?
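
One possible way to approach the duplicate question is to check for repeated IDs after each run rather than enforcing uniqueness in Airtable itself. The sketch below assumes records have already been fetched as Airtable-style dicts (`id` plus `fields`) and that the confirmed ID lives in a field named `Wikidata ID`. Both the field name and the function name are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical field name; adjust to the actual CTFG schema.
FIELD_WIKIDATA_ID = "Wikidata ID"


def find_duplicate_wikidata_ids(records: list[dict]) -> dict[str, list[str]]:
    """Map each Wikidata ID used by more than one record to those record IDs."""
    by_qid: dict[str, list[str]] = defaultdict(list)
    for record in records:
        qid = record.get("fields", {}).get(FIELD_WIKIDATA_ID)
        if qid:
            # Normalize whitespace and case so "q5 " and "Q5" count as the same ID.
            by_qid[qid.strip().upper()].append(record["id"])
    return {qid: ids for qid, ids in by_qid.items() if len(ids) > 1}
```

An empty result means no duplicates were found. A non-empty result could be surfaced for human review instead of being resolved automatically, in keeping with the principle of leaving business logic to CTFG.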
The regular run might deploy some things (like basic fields), but consult ./manual_deployments.md to see what needs to be deployed manually.