This solution illustrates how to implement privacy requirements for medical records through near real-time identification and redaction of PII and PHI in a data stream, using natural language processing (NLP) techniques.
Data generated by monitoring devices and human annotations is inspected in transit for privacy-sensitive content using Amazon Comprehend Medical. Any PII or PHI detected is redacted, and the data is then delivered to a data lake for further processing.
- Patients enrolled in the study carry continuous monitors for glucose and insulin concentration in the bloodstream, location, and body temperature
- General physicians routinely evaluate patients’ conditions. Annotations are recorded in the patients’ files
- Data is ingested, and PII and PHI are identified and redacted
- Data scientists access the anonymised records for further investigations
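The identify-and-redact step can be sketched in Python as follows. This is a minimal illustration, not the code shipped in this repository: it assumes the boto3 `comprehendmedical` client and its `detect_phi` call, and it assumes the returned entities do not overlap. The function names, the `[TYPE]` placeholder format, and the `min_score` threshold are hypothetical choices made for the example.

```python
from typing import Iterable, Mapping


def redact(text: str, entities: Iterable[Mapping], min_score: float = 0.5) -> str:
    """Replace each detected span with a [TYPE] placeholder.

    `entities` follows the shape of the "Entities" list returned by
    Comprehend Medical (BeginOffset, EndOffset, Type, Score).
    `min_score` is a hypothetical confidence threshold for this sketch.
    Assumes the entity spans do not overlap.
    """
    kept = [e for e in entities if e.get("Score", 1.0) >= min_score]
    # Work from the end of the text so earlier offsets remain valid.
    for e in sorted(kept, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[: e["BeginOffset"]] + f"[{e['Type']}]" + text[e["EndOffset"]:]
    return text


def redact_with_comprehend_medical(text: str, region: str = "us-east-1") -> str:
    """Detect PHI with Amazon Comprehend Medical and redact it (needs AWS credentials)."""
    import boto3  # imported lazily so redact() can be used without AWS access

    client = boto3.client("comprehendmedical", region_name=region)
    response = client.detect_phi(Text=text)
    return redact(text, response["Entities"])
```

Keeping `redact` separate from the AWS call makes it possible to exercise the redaction logic with hand-written entities, without network access.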
The solution is designed to be distributed as a mono-repository containing all the components (both software and infrastructure).
- S3 bucket to store the Terraform state file
- DynamoDB table to lock the Terraform state file
https://developer.hashicorp.com/terraform/language/settings/backends/s3
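If the two resources above do not exist yet, they could be created with a short boto3 script along these lines. This is a sketch under assumptions: the function names and the on-demand billing mode are choices made for the example, and the bucket name, table name, and region are placeholders that should match what is configured in the backend file. The one fixed requirement is the table's partition key, which the Terraform S3 backend expects to be a string attribute named `LockID`.

```python
def backend_requests(bucket: str, table: str, region: str) -> dict:
    """Build the request parameters for the state bucket and the lock table."""
    bucket_args = {"Bucket": bucket}
    # S3 rejects an explicit LocationConstraint for us-east-1.
    if region != "us-east-1":
        bucket_args["CreateBucketConfiguration"] = {"LocationConstraint": region}
    table_args = {
        "TableName": table,
        # The Terraform S3 backend locks state through a string key named LockID.
        "AttributeDefinitions": [{"AttributeName": "LockID", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "LockID", "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }
    return {"bucket": bucket_args, "table": table_args}


def create_backend_resources(bucket: str, table: str, region: str) -> None:
    """Create both resources (needs AWS credentials with the right permissions)."""
    import boto3  # imported lazily so backend_requests() works without AWS access

    req = backend_requests(bucket, table, region)
    boto3.client("s3", region_name=region).create_bucket(**req["bucket"])
    boto3.client("dynamodb", region_name=region).create_table(**req["table"])
```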
Before executing the commands below, change the references in the files
src/terraform/_variables.tf
and
src/terraform/_backend.tf
to match your account configuration.
tfenv use 1.2.8
cd src/terraform
terraform init
terraform apply
All the data included in this repository has been synthetically generated.