Skip to content

vLLM has a Weakness in MultiModalHasher Image Hashing Implementation

Moderate severity GitHub Reviewed Published May 28, 2025 in vllm-project/vllm • Updated May 29, 2025

Package

pip vllm (pip)

Affected versions

>= 0.7.0, < 0.9.0

Patched versions

0.9.0

Description

Summary

In the file vllm/multimodal/hasher.py, the MultiModalHasher class has a security and data integrity issue in its image hashing method. Currently, it serializes PIL.Image.Image objects using only obj.tobytes(), which returns only the raw pixel data, without including metadata such as the image’s shape (width, height, mode). As a result, two images of different sizes (e.g., 30x100 and 100x30) with the same pixel byte sequence could generate the same hash value. This may lead to hash collisions, incorrect cache hits, and even data leakage or security risks.

Details

  • Affected file: vllm/multimodal/hasher.py
  • Affected method: MultiModalHasher.serialize_item
    https://github.com/vllm-project/vllm/blob/9420a1fc30af1a632bbc2c66eb8668f3af41f026/vllm/multimodal/hasher.py#L34-L35
  • Current behavior: For Image.Image instances, only obj.tobytes() is used for hashing.
  • Problem description: obj.tobytes() does not include the image’s width, height, or mode metadata.
  • Impact: Two images with the same pixel byte sequence but different sizes could be regarded as the same image by the cache and hashing system, which may result in:
    • Incorrect cache hits, leading to abnormal responses
    • Deliberate construction of images with different meanings but the same hash value

Recommendation

In the serialize_item method, serialization of Image.Image objects should include not only pixel data, but also all critical metadata—such as dimensions (size), color mode (mode), format, and especially the info dictionary. The info dictionary is particularly important in palette-based images (e.g., mode 'P'), where the palette itself is stored in info. Ignoring info can result in hash collisions between visually distinct images with the same pixel bytes but different palettes or metadata. This can lead to incorrect cache hits or even data leakage.

Summary:
Serializing only the raw pixel data is insecure. Always include all image metadata (size, mode, format, info) in the hash calculation to prevent collisions, especially in cases like palette-based images.

Impact for other modalities
For the influence of other modalities, since the video modality is transformed into a multi-dimensional array containing the length, width, time, etc. of the video, the same problem exists due to the incorrect sequence of numpy as well.

For audio, since the momo function is not enabled in librosa.load, the loaded audio is automatically encoded into single channels by librosa and returns a one-dimensional array of numpy, thus keeping the structure of numpy fixed and not affected by this issue.

Fixes

References

@russellb russellb published to vllm-project/vllm May 28, 2025
Published to the GitHub Advisory Database May 28, 2025
Reviewed May 28, 2025
Published by the National Vulnerability Database May 29, 2025
Last updated May 29, 2025

Severity

Moderate

CVSS overall score

This score calculates overall vulnerability severity from 0 to 10 and is based on the Common Vulnerability Scoring System (CVSS).
/ 10

CVSS v3 base metrics

Attack vector
Network
Attack complexity
High
Privileges required
Low
User interaction
None
Scope
Unchanged
Confidentiality
Low
Integrity
None
Availability
Low

CVSS v3 base metrics

Attack vector: More severe the more the remote (logically and physically) an attacker can be in order to exploit the vulnerability.
Attack complexity: More severe for the least complex attacks.
Privileges required: More severe if no privileges are required.
User interaction: More severe when no user interaction is required.
Scope: More severe when a scope change occurs, e.g. one vulnerable component impacts resources in components beyond its security scope.
Confidentiality: More severe when loss of data confidentiality is highest, measuring the level of data access available to an unauthorized user.
Integrity: More severe when loss of data integrity is the highest, measuring the consequence of data modification possible by an unauthorized user.
Availability: More severe when the loss of impacted component availability is highest.
CVSS:3.1/AV:N/AC:H/PR:L/UI:N/S:U/C:L/I:N/A:L

EPSS score

Exploit Prediction Scoring System (EPSS)

This score estimates the probability of this vulnerability being exploited within the next 30 days. Data provided by FIRST.
(17th percentile)

CVE ID

CVE-2025-46722

GHSA ID

GHSA-c65p-x677-fgj6

Source code

Credits

Loading Checking history
See something to contribute? Suggest improvements for this vulnerability.