AI Brand Protection Analyst Agent

A semantic brand protection agent powered by Google's Gemini 2.5 Pro model. This tool helps detect fraudulent, malicious, or brand-abusing domains across the internet using advanced LLM-based semantic analysis and customizable analyst personas.


Motivation

After years working with threat intelligence and brand protection products, I've reverse-engineered some truly "creative" evaluation techniques.
One of the most memorable: a major platform that hard-cuts domain feeds, discarding any domain that doesn't start or end with the brand name.

This shortcut is often necessary to reduce the flood of irrelevant full-word matches, but for short brand names like "tui" or "otto" it comes at a serious cost: thousands of potentially threatening domains are silently lost every week.

Domains like secure-tui-login[.]com, my-tui-booking[.]net or nl-ottoshop[.]nl never reach analyst desks because they fail a basic syntax check.
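To make the difference concrete, here is a minimal Python sketch that contrasts the start-or-end rule described above with a plain substring check. The function names, and the choice to compare against the domain minus its final label, are illustrative assumptions; this is not code from this project or from any vendor.

```python
def name_part(domain: str) -> str:
    """Return the domain without its final label (a rough stand-in for the TLD)."""
    domain = domain.replace("[.]", ".").lower()  # undo defanging
    return domain.rsplit(".", 1)[0]


def passes_hard_cut(domain: str, brand: str) -> bool:
    """Start-or-end rule: keep only names that begin or end with the brand."""
    name = name_part(domain)
    return name.startswith(brand) or name.endswith(brand)


def contains_brand(domain: str, brand: str) -> bool:
    """Plain substring check: keep any name that contains the brand."""
    return brand in name_part(domain)


examples = [
    ("secure-tui-login[.]com", "tui"),
    ("my-tui-booking[.]net", "tui"),
    ("nl-ottoshop[.]nl", "otto"),
]
for domain, brand in examples:
    # Each example fails the hard cut but still contains the brand name.
    print(domain, passes_hard_cut(domain, brand), contains_brand(domain, brand))
```

All three example domains are rejected by the start-or-end rule even though each contains the brand name, which is why they would never reach an analyst.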

This project was born from the need to:

  • think semantically, not just syntactically,
  • detect threats beyond keyword / dictionary matches,
  • empower analysts with LLM-based reasoning,
  • recognize internationalized and multilingual domain patterns, since registrations in localized variations are often missed by static dictionary-based systems.

Features

  • Semantic Threat Detection — Detects impersonation, phishing, and domain abuse based on brand context on a large scale.
  • Analyst Modes — Choose between junior, senior, and expert AI analyst personas.
  • Batch Processing — Efficiently handles large domain datasets via Gemini API.
  • Structured Output — Saves results with confidence scores, risk levels, and explanations.
  • Flexible API Key Handling — Use command-line, environment variables, .env, or secure prompt.
  • Example Analysis Results:

[Image: Analyst Results]


Installation

  1. Clone the repo:
      git clone https://github.com/PAST2212/brand-protection-analyst-agent.git
      cd brand-protection-analyst-agent
  2. Install dependencies:
      pip install -r requirements.txt
  3. Generate a free API key for the Gemini 2.5 Pro model at https://aistudio.google.com/apikey

  4. Add your Gemini API key using one of the following options:

  • Option 1: via .env file
      echo "GEMINI_API_KEY=your_actual_api_key_here" > .env
  • Option 2: via environment variable
      export GEMINI_API_KEY=your_actual_api_key_here
  • Option 3: pass via CLI (--api-key)
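The key sources listed above could be combined roughly as follows. This is a minimal sketch, not the project's actual implementation: the helper names `read_dotenv_key` and `resolve_api_key` are hypothetical, and the precedence (CLI value, then environment variable, then .env file, then interactive prompt) is an assumption.

```python
import getpass
import os
from pathlib import Path
from typing import Callable, Mapping, Optional


def read_dotenv_key(path: Path, name: str = "GEMINI_API_KEY") -> Optional[str]:
    """Return the value of `name` from a simple KEY=value file, if present."""
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key.strip() == name:
            return value.strip().strip('"').strip("'") or None
    return None


def resolve_api_key(
    cli_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    dotenv_path: Path = Path(".env"),
    prompt: Callable[[str], str] = getpass.getpass,
) -> str:
    """Try each source in turn; the precedence used here is an assumption."""
    key = (
        cli_key
        or env.get("GEMINI_API_KEY")
        or read_dotenv_key(dotenv_path)
        or prompt("Gemini API key: ")  # hidden input, last resort
    )
    if not key:
        raise ValueError("No Gemini API key provided")
    return key
```

Passing the environment mapping and the prompt function as parameters keeps the lookup easy to test without touching real credentials.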

Updating

cd brand-protection-analyst-agent
git pull

If you encounter a merge error, discard your local changes and pull again (this deletes any uncommitted edits to tracked files):

git reset --hard
git pull

Usage

List all available options:

python main.py --help

Examples

# Basic analysis
python main.py --domains tui.txt --brand-name "tui"

# With custom output
python main.py --domains tui.txt --brand-name "tui" --output tui_analysis.csv

# Advanced analysis
python main.py --domains tui.txt --brand-name "tui" \
  --company-name "TUI AG" \
  --industry "Travel & Tourism" \
  --description "TUI AG (trading as TUI Group) is a German multinational leisure, travel and tourism company; it is the largest such company in the world. It fully or partially owns several travel agencies, hotel chains, cruise lines and retail shops as well as five European airlines. TUI is an acronym for Touristik Union International (Tourism Union International). It is headquartered in Hanover, Germany" \
  --batch-size 500 \
  --analyst junior \
  --output tui_results.csv

Default Values

Argument        Default
--batch-size    200
--analyst       senior
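As a rough illustration of what --batch-size controls, the sketch below splits a domain list into chunks of at most 200 entries, the documented default. The `batched` helper is hypothetical; how the tool actually groups domains before calling the Gemini API is not shown in this README.

```python
from typing import Iterator, List, Sequence


def batched(domains: Sequence[str], batch_size: int = 200) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` domains."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(domains), batch_size):
        yield list(domains[start:start + batch_size])


# Placeholder domains, used only to show the resulting batch sizes.
domains = [f"example-{i}.com" for i in range(450)]
print([len(batch) for batch in batched(domains)])  # prints [200, 200, 50]
```

Larger batches mean fewer API requests but more domains per prompt, so the best value likely depends on the rate limits of your API key.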

Analyst Modes

Mode      Description
junior    Entry-level analyst. Consistent, rule-based, deterministic pattern detection.
senior    Balanced reasoning. Slightly creative with nuanced evaluation. (default)
expert    Advanced threat detection. High pattern recognition and semantic flexibility.

Domain File and Output Files

Input Domain Files

  1. Place domain files in data/ folder
  2. Use .txt format with one domain per line
  3. Example:
    data/
    ├── tui.txt
    ├── otto.txt
    └── gea.txt
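A file in this format can be read with a few lines of Python. This is a minimal sketch under stated assumptions: the `load_domains` helper is hypothetical, and lowercasing and de-duplicating entries are choices made here, not documented behavior of the tool.

```python
from pathlib import Path
from typing import List


def load_domains(path: Path) -> List[str]:
    """Read one domain per line, skipping blank lines and duplicates."""
    seen = set()
    domains = []
    for line in path.read_text(encoding="utf-8").splitlines():
        domain = line.strip().lower()
        if domain and domain not in seen:
            seen.add(domain)
            domains.append(domain)
    return domains
```

For example, `load_domains(Path("data/tui.txt"))` would return the domains in file order, ready to be passed on for analysis.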
    

Output Files

All output files are stored in the data/ folder and include:

  • *_threats.csv — Identified threat domains
  • *_filtered.csv — Domains considered safe
  • *_complete.json — Full analysis report

Each .csv contains:

  • Domain
  • Confidence score
  • Relevance
  • Risk level
  • Explanation
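The CSV output can be post-processed with Python's standard csv module. The sketch below filters rows by risk level; the column header `risk_level` and the value `high` are assumptions, so check the header row of your own output file and adjust the `column` argument accordingly.

```python
import csv
from pathlib import Path
from typing import Dict, List


def rows_with_risk(
    path: Path, level: str, column: str = "risk_level"
) -> List[Dict[str, str]]:
    """Return the rows whose `column` matches `level`, ignoring case."""
    with path.open(newline="", encoding="utf-8") as handle:
        return [
            row
            for row in csv.DictReader(handle)
            if (row.get(column) or "").strip().lower() == level.lower()
        ]
```

A call such as `rows_with_risk(Path("data/tui_results_threats.csv"), "high")` would then return only the rows flagged with that level, assuming the file uses that header and value.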

API Rate Limits

Requests are subject to the Gemini API rate limits; see Google's Gemini rate limits documentation for the current values.


Notes

  • Uses Google Gemini 2.5 Pro
  • Requires Python 3.10+
  • Input and output files are handled in the data/ folder
  • Only domains that contain the brand name as a full-text match are considered
  • For input data, use your current domain monitoring provider or another source such as my other project, domainthreat
  • Some example results are saved in data/tui_results_threats.csv

TODO

  • IDN support
  • Multimodal processing
  • Additional evaluation features

Author

Patrick Steinhoff
LinkedIn


Disclaimer

This tool is intended for research, legal and security analysis purposes only.
