Skip to content

Potential Timing Side-Channel Vulnerability in vLLM’s Chunk-Based Prefix Caching

Low
russellb published GHSA-4qjh-9fv9-r85r May 28, 2025

Package

pip vllm (pip)

Affected versions

<0.9.0

Patched versions

0.9.0

Description

Dear vLLM Team,

We are a security research group, and while we are impressed with vLLM’s innovative design, particularly the efficiency of its PageAttention mechanism and chunk-based token management, we have identified a potential security vulnerability. This issue arises from the prefix caching mechanism, which may expose the system to a timing side-channel attack.

Description

When a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). Our tests revealed that the timing differences caused by matching chunks are significant enough to be recognized and exploited.

For instance, if the victim has submitted a sensitive prompt or if a valuable system prompt has been cached, an attacker sharing the same backend could attempt to guess the victim's input. By measuring the TTFT based on prefix matches, the attacker could verify if their guess is correct, leading to potential leakage of private information.

Unlike token-by-token sharing mechanisms, vLLM’s chunk-based approach (PageAttention) processes tokens in larger units (chunks). In our tests, with chunk_size=2, the timing differences became noticeable enough to allow attackers to infer whether portions of their input match the victim's prompt at the chunk level.

Environment

  • GPU: NVIDIA A100 (40G)
  • CUDA: 11.8
  • PyTorch: 2.3.1
  • OS: Ubuntu 18.04
  • vLLM: v0.5.1
    Configuration: We launched vLLM using the default settings and adjusted chunk_size=2 to evaluate the TTFT.

Leakage

We conducted our tests using LLaMA2-70B-GPTQ on a single device. We analyzed the timing differences when prompts shared prefixes of 2 chunks, and plotted the corresponding ROC curves. Our results suggest that timing differences can be reliably used to distinguish prefix matches, demonstrating a potential side-channel vulnerability.
roc_curves_combined_block_2

Results

In our experiment, we analyzed the response time differences between cache hits and misses in vLLM's PageAttention mechanism. Using ROC curve analysis to assess the distinguishability of these timing differences, we observed the following results:

  • With a 1-token prefix, the ROC curve yielded an AUC value of 0.571, indicating that even with a short prefix, an attacker can reasonably distinguish between cache hits and misses based on response times.
  • When the prefix length increases to 8 tokens, the AUC value rises significantly to 0.99, showing that the attacker can almost perfectly identify cache hits with a longer prefix.

Fixes

Severity

Low

CVSS overall score

This score calculates overall vulnerability severity from 0 to 10 and is based on the Common Vulnerability Scoring System (CVSS).
/ 10

CVSS v3 base metrics

Attack vector
Network
Attack complexity
High
Privileges required
Low
User interaction
Required
Scope
Unchanged
Confidentiality
Low
Integrity
None
Availability
None

CVSS v3 base metrics

Attack vector: More severe the more the remote (logically and physically) an attacker can be in order to exploit the vulnerability.
Attack complexity: More severe for the least complex attacks.
Privileges required: More severe if no privileges are required.
User interaction: More severe when no user interaction is required.
Scope: More severe when a scope change occurs, e.g. one vulnerable component impacts resources in components beyond its security scope.
Confidentiality: More severe when loss of data confidentiality is highest, measuring the level of data access available to an unauthorized user.
Integrity: More severe when loss of data integrity is the highest, measuring the consequence of data modification possible by an unauthorized user.
Availability: More severe when the loss of impacted component availability is highest.
CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:N/A:N

CVE ID

CVE-2025-46570

Weaknesses

No CWEs

Credits