evidlabel/did

DID (De-ID) Pseudonymizer

A CLI tool that pseudonymizes Markdown, plain-text, TeX, and BibTeX files using spaCy-based entity detection and an automatically generated YAML configuration.

Features

  • Detects names, emails, addresses, phone numbers, and CPR numbers using Presidio with spaCy
  • Groups name and number variants using rapidfuzz
  • Extracts entities to generate a YAML config (did ex)
  • Anonymizes text using the YAML config (did an), preserving the file format
  • Supports English (en) and Danish (da)
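To illustrate what grouping name variants by fuzzy similarity can look like, here is a minimal sketch. It is not the tool's implementation: it swaps in the standard library's difflib for rapidfuzz so that it runs without extra dependencies, and the function names, the greedy strategy, and the threshold of 80 are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity score between 0 and 100.

    difflib is used here as a stand-in for rapidfuzz (assumption).
    """
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_variants(names: list[str], threshold: float = 80.0) -> list[list[str]]:
    """Greedily group names: each name joins the first group whose
    first member is at least `threshold` similar, else starts a new group.

    The threshold value is hypothetical, not taken from the tool.
    """
    groups: list[list[str]] = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            # No existing group was close enough.
            groups.append([name])
    return groups


names = ["Anders Jensen", "Anders Jenson", "anders jensen", "Mette Holm"]
groups = group_variants(names)
print(groups)
```

With these inputs the three spellings of the first name land in one group and the unrelated name in another. A greedy pass like this depends on input order; a real implementation may cluster differently.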

Installation

uv sync

Quick Usage

Extract entities:

uv run did ex -f input.txt -c config.yaml

Anonymize:

uv run did an -f input.txt -c config.yaml -o output.txt
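Conceptually, the anonymize step replaces every known variant of an entity with one stable pseudonym. The sketch below shows that idea in plain Python. The mapping structure, the placeholder names such as `<PERSON_1>`, and the `pseudonymize` function are hypothetical; the actual YAML schema written by did ex is not shown here and may differ.

```python
import re

# Hypothetical mapping: pseudonym -> list of variants found in the text.
# This only mirrors the idea of a config; it is not the tool's YAML schema.
mapping = {
    "<PERSON_1>": ["Anders Jensen", "A. Jensen"],
    "<EMAIL_1>": ["anders@example.com"],
}


def pseudonymize(text: str, mapping: dict[str, list[str]]) -> str:
    """Replace each listed variant with its pseudonym (case-sensitive)."""
    # Invert the mapping so each variant points at its pseudonym.
    lookup = {v: p for p, variants in mapping.items() for v in variants}
    if not lookup:
        return text
    # Try longer variants first so a short variant cannot pre-empt a longer one.
    ordered = sorted(lookup, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in ordered))
    return pattern.sub(lambda m: lookup[m.group(0)], text)


text = "Anders Jensen (anders@example.com) wrote to A. Jensen's editor."
result = pseudonymize(text, mapping)
print(result)
```

Because only the matched spans are substituted, surrounding markup such as Markdown or TeX commands is left untouched, which is one simple way to preserve the file format.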

For details, see the documentation.
