intelgaoxiong
Contributor

@intelgaoxiong intelgaoxiong commented Aug 11, 2025

Details:

This PR implements prefix caching in NPUW.
Existing inference engines discard a request's KV cache once processing completes, so the cache cannot be reused across calls and execution slows down significantly.
https://arxiv.org/pdf/2312.07104 proposes a technique for automatically reusing the KV cache across multiple generation calls.
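
To illustrate the general idea of reusing KV cache across calls, here is a minimal Python sketch. It is not the NPUW implementation from this PR and not necessarily the data structure used in the cited paper. It assumes a block-hash lookup, which is one common way to do prefix caching. The names `PrefixCache`, `block_hashes`, and `BLOCK_SIZE` are hypothetical, and the stored "KV blocks" are placeholder objects rather than real tensors.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple

BLOCK_SIZE = 4  # hypothetical block size, kept small for illustration


def block_hashes(tokens: Sequence[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash each full block of tokens, chaining in the previous block's hash
    so that a block's key depends on the entire prefix before it."""
    hashes: List[str] = []
    prev = ""
    full = len(tokens) - len(tokens) % block_size  # ignore the trailing partial block
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        payload = prev + "|" + ",".join(str(t) for t in block)
        prev = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        hashes.append(prev)
    return hashes


class PrefixCache:
    """Toy prefix cache: maps chained block hashes to stored KV blocks."""

    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self._store: Dict[str, object] = {}

    def insert(self, tokens: Sequence[int], kv_blocks: Sequence[object]) -> None:
        """Store one KV block per full token block of a finished request."""
        for h, kv in zip(block_hashes(tokens, self.block_size), kv_blocks):
            self._store.setdefault(h, kv)

    def match(self, tokens: Sequence[int]) -> Tuple[int, List[object]]:
        """Return how many leading tokens are covered by cached blocks,
        together with those blocks, stopping at the first miss."""
        blocks: List[object] = []
        for h in block_hashes(tokens, self.block_size):
            if h not in self._store:
                break
            blocks.append(self._store[h])
        return len(blocks) * self.block_size, blocks
```

In this sketch, a new request calls `match` on its prompt tokens, restores the returned blocks, and runs prefill only on the remaining tokens. Once it finishes, it calls `insert` so that later requests sharing the same prefix can skip that work.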

Tickets:

@github-actions github-actions bot added category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin labels Aug 11, 2025
@intelgaoxiong intelgaoxiong force-pushed the xiong/prefix_caching branch 4 times, most recently from a7a59db to 48f060e Compare August 13, 2025 07:03
@intelgaoxiong
Contributor Author

build_jenkins

@intelgaoxiong intelgaoxiong marked this pull request as ready for review August 13, 2025 09:11
@intelgaoxiong intelgaoxiong requested review from a team as code owners August 13, 2025 09:11
@intelgaoxiong intelgaoxiong requested a review from dmatveev August 13, 2025 09:12
@intelgaoxiong intelgaoxiong force-pushed the xiong/prefix_caching branch 5 times, most recently from 719eac4 to c9ee5f6 Compare August 22, 2025 02:08
Signed-off-by: intelgaoxiong <[email protected]>
Labels
category: build OpenVINO cmake script / infra category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin

4 participants