Official implementation for the paper "Watermark Smoothing Attacks against Language Models" accepted at EMNLP 2025 Findings.

changhongyan123/watermark_smoothing

Watermark Smoothing Attacks against Language Models

This repository contains the official implementation for the paper "Watermark Smoothing Attacks against Language Models" accepted at EMNLP 2025 Findings.

Installation

Install the required dependencies:

pip install -r requirements.txt

Note: Some watermarking methods (XSIR, UPV, SIR) require additional model downloads. Please refer to the MarkLLM framework documentation for detailed instructions.

Quick Start

Running Experiments

Generate results for OPT models:

bash run_opt.sh

Generate results for LLaMA models:

bash run_llama.sh

Repository Structure

├── attack_processor.py         # Core implementation of the watermark smoothing attack
├── main.py                     # Main execution script
├── config/                     # Configuration files for different watermarking methods
│   ├── DIP.json
│   ├── EWD.json
│   ├── EXP.json
│   ├── KGW.json
│   ├── SIR.json
│   ├── SWEET.json
│   └── ...
├── watermark/                  # Watermarking implementations (based on MarkLLM)
│   ├── auto_watermark.py
│   ├── base.py
│   ├── dip/
│   ├── exp/
│   ├── kgw/
│   └── ...
├── evaluation/                 # Evaluation pipelines and tools
│   ├── pipelines/
│   ├── tools/
│   └── examples/
├── dataset/                    # Dataset handling
├── utils/                      # Utility functions
└── visualize/                  # Visualization tools
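
The config/ directory holds one JSON file per watermarking method. As a minimal sketch, assuming each file is named after its method (as the listing above suggests), a config could be looked up by method name as follows. The helper name `load_method_config` is hypothetical and not part of this repository, and the keys inside each file are not documented here, so the sketch simply returns the parsed JSON.

```python
import json
from pathlib import Path


def load_method_config(method: str, config_dir: str = "config") -> dict:
    """Hypothetical helper: load config/<method>.json and return it as a dict.

    Assumes the file name matches the method name (e.g. "KGW" -> KGW.json).
    """
    path = Path(config_dir) / f"{method}.json"
    if not path.is_file():
        # List the configs that do exist to make a typo easy to spot.
        available = sorted(p.stem for p in Path(config_dir).glob("*.json"))
        raise FileNotFoundError(
            f"No config for {method!r} in {config_dir!r}; available: {available}"
        )
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

For example, `load_method_config("KGW")` would read config/KGW.json when run from the repository root.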

Acknowledgments

This codebase extends the MarkLLM framework (Apache License 2.0) with the following key modifications:

  • Added an attack_processor parameter to the watermarking methods
  • Integrated smoothing attack into the generation pipeline
  • Enhanced evaluation metrics for attack assessment
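
The idea of passing an attack processor into generation can be illustrated with a small, dependency-free sketch. Everything below is an assumption made for illustration: the function names, the hook signature, and the interpolation rule are hypothetical and do not reproduce the code in attack_processor.py or the exact smoothing rule from the paper. The sketch only shows how an optional hook can post-process the watermarked model's logits, here by blending them with a reference model's distribution before a token is chosen.

```python
import math
from typing import Callable, List, Optional

Logits = List[float]
Processor = Callable[[Logits], Logits]


def softmax(logits: Logits) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_interpolating_processor(
    reference_logits_fn: Callable[[], Logits], weight: float
) -> Processor:
    """Hypothetical attack processor (illustrative stand-in, not the paper's rule).

    Mixes the watermarked distribution with a reference distribution:
    mixed = (1 - weight) * p_watermarked + weight * p_reference.
    """
    def processor(watermarked_logits: Logits) -> Logits:
        p_wm = softmax(watermarked_logits)
        p_ref = softmax(reference_logits_fn())
        mixed = [(1 - weight) * a + weight * b for a, b in zip(p_wm, p_ref)]
        # Return log-probabilities so the result can be used like logits.
        return [math.log(p) for p in mixed]
    return processor


def next_token(
    watermarked_logits: Logits, attack_processor: Optional[Processor] = None
) -> int:
    """One greedy decoding step with an optional attack hook."""
    logits = (
        attack_processor(watermarked_logits)
        if attack_processor is not None
        else watermarked_logits
    )
    return max(range(len(logits)), key=logits.__getitem__)
```

With toy logits `[0.0, 2.0, 0.0]` for the watermarked model (token 1 boosted) and `[3.0, 0.0, 0.0]` for the reference model, a weight of 0.9 shifts the greedy choice from token 1 to token 0, while omitting the processor leaves generation unchanged. Making the hook optional keeps the original watermarking behavior as the default.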

We thank the MarkLLM authors for providing this foundation.
