[OpenVINO backend] Adding support for OpenVINO backend & support inference for Mistral & Gemma & GPT2 #2350


Open
wants to merge 16 commits into base: master

Conversation


@Mohamed-Ashraf273 Mohamed-Ashraf273 commented Aug 2, 2025

Description of the change

As part of my GSoC25 project to support inference with the OpenVINO backend, this PR adds inference support for the Gemma, Mistral, and GPT-2 pipelines. Minimal usage example:

import os
os.environ["KERAS_BACKEND"] = "openvino"  # must be set before importing keras_hub
import keras_hub

model = keras_hub.models.GPT2CausalLM.from_preset(
    "gpt2_large_en", dtype="float32"
)
model.summary()
output = model.generate("Keras is ", max_length=20)
print("Generated text:", output)

Reference

https://docs.openvino.ai/2025/index.html
https://keras.io/api/
https://keras.io/keras_hub/

Colab Notebook

Checklist

  • I have added all the necessary unit tests for my change.
  • I have verified that my change does not break existing code and works with all backends (TensorFlow, JAX, and PyTorch).
  • My PR is based on the latest changes of the main branch (if unsure, rebase the code).
  • I have followed the Keras Hub Model contribution guidelines in making these changes.
  • I have followed the Keras Hub API design guidelines in making these changes.
  • I have signed the Contributor License Agreement.

@github-actions github-actions bot added the Gemma Gemma model specific issues label Aug 2, 2025
@Mohamed-Ashraf273
Author

@fchollet
@mattdangerw
@divyashreepathihalli
I'd appreciate any help reviewing this PR so we can get it merged. Thanks in advance!

@Mohamed-Ashraf273 Mohamed-Ashraf273 changed the title [OpenVINO backend] support inference for Mistral & Gemma & GPT2 using… [OpenVINO backend] support inference for Mistral & Gemma & GPT2 using OpenVINO backend Aug 2, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @Mohamed-Ashraf273, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I've implemented support for running inference with Gemma, Mistral, and GPT-2 models on the OpenVINO backend. This work is part of my GSoC25 project and aims to broaden the compatibility of Keras Hub models with various hardware platforms. The changes involve integrating OpenVINO's inference capabilities directly into the CausalLM models, introducing new utility functions for OpenVINO-specific operations, and updating the testing infrastructure to properly handle the OpenVINO backend's characteristics, including skipping tests that involve training or have known incompatibilities.

Highlights

  • OpenVINO Backend Integration: I've added comprehensive support for running inference with Gemma, Mistral, and GPT-2 models using the OpenVINO backend. This significantly expands the deployment options for these models within Keras Hub.
  • New OpenVINO Utility Module: To facilitate OpenVINO integration, I've introduced a new utility module, openvino_utils.py, which handles critical aspects like input parameterization, structured output processing, and intelligent model reusability checks for optimized performance.
  • CausalLM Model Adaptations for OpenVINO: I've updated the core CausalLM model's make_generate_function to dynamically switch to OpenVINO-optimized inference when the OpenVINO backend is active. This includes handling specific ops.slice and ops.reshape behaviors required by OpenVINO for certain models like Gemma and GPT-2.
  • Enhanced Test Skipping and Compatibility for OpenVINO: To ensure test suite compatibility and efficiency with the OpenVINO backend, I've implemented a new requires_trainable_backend pytest marker. This allows tests that rely on training functionalities (not yet supported by OpenVINO) to be skipped appropriately. Additionally, I've introduced exclusion lists (openvino_excluded_concrete_tests.txt and openvino_excluded_tests.txt) to manage specific test skips for OpenVINO.
  • Sampler Logic Adjustments for OpenVINO: The sampler.py module has been modified to ensure its ops.while_loop constructs are compatible with OpenVINO's requirements, specifically by explicitly passing the mask parameter in the loop's condition and body functions.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for OpenVINO backend inference for several models, which is a significant enhancement. The changes include modifications to the testing infrastructure to handle backend-specific tests, updates to the core model generation logic, and the addition of new OpenVINO utility functions and tests. Overall, the implementation is solid, but I've identified a critical bug in the CausalLM class that could lead to an AttributeError, along with several opportunities for code simplification, efficiency improvements, and increased robustness. Please see my detailed comments below.

@Mohamed-Ashraf273 Mohamed-Ashraf273 force-pushed the supporting_gemma_inference_with_ov_backend branch from 458c345 to 8eda2ab Compare August 2, 2025 19:38
@Mohamed-Ashraf273
Copy link
Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for the OpenVINO backend for Gemma, Mistral, and GPT-2 models. The changes are extensive and well-structured, including:

  • Adding OpenVINO-specific logic in CausalLM for model compilation and inference.
  • Introducing a new openvino_utils.py module with helper functions and corresponding tests.
  • Updating conftest.py to selectively skip tests that are not compatible with the OpenVINO backend, using a new marker and an exclusion list.
  • Adding openvino to the project dependencies.

The implementation is solid, but I have a few suggestions to improve robustness and maintainability, such as using more specific exception handling, avoiding hardcoded device names for OpenVINO, and cleaning up test exclusion files. Overall, this is a great contribution to extend backend support.

@Mohamed-Ashraf273 Mohamed-Ashraf273 force-pushed the supporting_gemma_inference_with_ov_backend branch from 7875897 to ab31c72 Compare August 2, 2025 19:57
@Mohamed-Ashraf273 Mohamed-Ashraf273 force-pushed the supporting_gemma_inference_with_ov_backend branch from 9657b98 to 91b478f Compare August 3, 2025 11:55
@Mohamed-Ashraf273
Copy link
Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for the OpenVINO backend for Gemma, Mistral, and GPT-2 models. The changes are comprehensive, touching the test configuration to skip non-compatible tests, adding core OpenVINO inference logic to CausalLM, and including necessary workarounds in model implementations. My review focuses on improving the correctness and maintainability of the new OpenVINO integration. I've identified a potential issue with input handling for the OpenVINO model, a recommendation to improve exception handling, and pointed out some opportunities for code cleanup.

@Mohamed-Ashraf273 Mohamed-Ashraf273 changed the title [OpenVINO backend] support inference for Mistral & Gemma & GPT2 using OpenVINO backend [OpenVINO backend] Adding support for OpenVINO backend & support inference for Mistral & Gemma & GPT2 Aug 3, 2025
@Mohamed-Ashraf273
Copy link
Author

@mattdangerw
@divyashreepathihalli
@fchollet
Could you please take a look?

@Mohamed-Ashraf273 Mohamed-Ashraf273 force-pushed the supporting_gemma_inference_with_ov_backend branch 5 times, most recently from bc0afe5 to fa11864 Compare August 4, 2025 12:31
@Mohamed-Ashraf273 Mohamed-Ashraf273 force-pushed the supporting_gemma_inference_with_ov_backend branch from fa11864 to 8baea81 Compare August 4, 2025 12:31
@Mohamed-Ashraf273 Mohamed-Ashraf273 force-pushed the supporting_gemma_inference_with_ov_backend branch from 4edc62c to c186f33 Compare August 13, 2025 20:03
Member

@mattdangerw mattdangerw left a comment


Thank you! Just a high level pass for now. The biggest thing is thinking of a more maintainable testing plan. More details in comments below.

@@ -48,6 +48,10 @@ jobs:
run: |
pip install -r requirements.txt --progress-bar off
pip install --no-deps -e "." --progress-bar off
if [[ "${{ matrix.backend }}" == "openvino" ]]; then
pip uninstall -y keras
pip install git+https://github.com/keras-team/keras.git --upgrade --force-reinstall --progress-bar off
Member


What do we need from keras that's not on a stable version? We already test against keras-nightly for some tests; we shouldn't pull directly from git as well.

Author

@Mohamed-Ashraf273 Mohamed-Ashraf273 Aug 14, 2025


I removed the keras-stable test for OpenVINO and now run only keras-nightly, as some recently supported operations are only available in the main branch of Keras.

@@ -132,6 +132,83 @@ def make_generate_function(self):
return self.generate_function

self.generate_function = self.generate_step
if keras.config.backend() == "openvino":
import openvino as ov
Member


Ideally this branch of the switch would be a little less involved. Consider making a util file and breaking out any sub-functions here that aren't directly related to generation itself (generic OpenVINO utils for compiling), and probably add some testing there as well that only runs on the OpenVINO backend.

Author

@Mohamed-Ashraf273 Mohamed-Ashraf273 Aug 14, 2025


Done!
I made a separate openvino_utils.py file with tests.

@@ -0,0 +1,24 @@
AnchorGeneratorTest::test_anchor_generator0
Member


This file and the next aren't going to be easily maintainable. Two ideas on how to do this better

  • Simple: whitelist what we test on OpenVINO for now based on regex selectors against test names, so we test everything that looks like test_score_loss or test_early_stopping, etc., across all models.
  • More complex: somehow detect and skip any test on the OpenVINO backend that hits fit, if we can find a not-overly-complex way to do this, and make every other test (any that does not require gradient descent) pass.

Probably the former is more doable for now. Maybe we do it in conftest.py, with a short list of selectors that we test with OpenVINO.

Either way, we do not want to add a new checklist item that people developing models need to remember, and any list like this would be one.

Author


Done!
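The whitelist approach might look roughly like this in conftest.py. This is a sketch under assumptions: the selector strings and function names are illustrative (borrowed from the reviewer's suggestion above), not the list the PR actually ships.

```python
import re

import pytest

# Short list of regex selectors: on the openvino backend only tests whose
# names match one of these run; everything else is skipped automatically,
# so model contributors never maintain a per-model exclusion file.
OPENVINO_TEST_SELECTORS = [
    r"test_causal_lm_basics",
    r"test_score_loss",
    r"test_generate",
]

def matches_whitelist(test_name):
    return any(re.search(p, test_name) for p in OPENVINO_TEST_SELECTORS)

def pytest_collection_modifyitems(config, items):
    import keras

    if keras.config.backend() != "openvino":
        return
    skip = pytest.mark.skip(reason="not whitelisted for the openvino backend")
    for item in items:
        if not matches_whitelist(item.name):
            item.add_marker(skip)
```

Because the skip is applied at collection time, adding a new model requires no changes here unless its tests need a new selector.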

@@ -145,6 +145,9 @@ def next(prompt, cache, index):
cache_update_index = index - 1
batch_size = ops.shape(prompt)[0]
prompt = ops.slice(prompt, [0, cache_update_index], [batch_size, 1])
if keras.config.backend() == "openvino":
Member


Can we fix this in ops.slice in core Keras somehow? Otherwise we'd need to port this to every model doing this slice (probably basically every generative model). If we can patch the Keras op, let's do that.

If we can't for some reason, we'd probably want to add something like slice_sequence_at_index in tensor_utils.py, include this workaround there, and update every occurrence of this slice call. But updating Keras is much simpler, since this does seem like a backend-specific issue with OpenVINO.

Author

@Mohamed-Ashraf273 Mohamed-Ashraf273 Aug 14, 2025


Done!
Solved in this PR: keras-team/keras#21580
Once it’s merged, I’ll remove the openvino check.

@Mohamed-Ashraf273 Mohamed-Ashraf273 force-pushed the supporting_gemma_inference_with_ov_backend branch 5 times, most recently from c8fae97 to cdb947f Compare August 14, 2025 20:48
@Mohamed-Ashraf273
Author

Mohamed-Ashraf273 commented Aug 14, 2025

@mattdangerw
I've addressed all your feedback, and the PR is ready for review.
Thanks for your help!

@Mohamed-Ashraf273 Mohamed-Ashraf273 force-pushed the supporting_gemma_inference_with_ov_backend branch 4 times, most recently from ecf6e45 to ea37ac5 Compare August 15, 2025 11:44
@Mohamed-Ashraf273 Mohamed-Ashraf273 force-pushed the supporting_gemma_inference_with_ov_backend branch 7 times, most recently from 1b742c5 to 110b9c3 Compare August 19, 2025 14:07
@Mohamed-Ashraf273
Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for the OpenVINO backend for inference with Gemma, Mistral, and GPT-2 models. The changes are comprehensive, including modifications to the test configuration to selectively skip unsupported tests, adjustments in the CausalLM and Sampler implementations for OpenVINO compatibility, and the addition of a new openvino_utils.py module with helper functions for inference and test management. I've found a couple of areas for improvement. There's a bug in how the test whitelist file is parsed, as it doesn't handle comments correctly. I've also suggested a refactoring in the TrainingMethodDetector class for better performance and code style. Overall, this is a great contribution to extend Keras Hub's backend support.

@Mohamed-Ashraf273 Mohamed-Ashraf273 force-pushed the supporting_gemma_inference_with_ov_backend branch from ef7b03d to 3ed2fea Compare August 19, 2025 17:06
@Mohamed-Ashraf273
Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for the OpenVINO backend for inference with several models like Gemma, Mistral, and GPT-2. The changes are extensive, including a new OpenVINO utilities module with helpers for model compilation, inference, and test management. The test configuration is updated to intelligently skip tests that are not compatible with the OpenVINO backend. The core model generation logic is also adapted to use the OpenVINO inference path when the backend is active.

My review focuses on improving performance in the test suite and enhancing code quality in the new test files. I've identified an inefficiency in how OpenVINO-specific tests are configured and a code style issue regarding local imports in the new test file. Overall, this is a solid contribution that significantly expands the backend support of Keras Hub.

@Mohamed-Ashraf273 Mohamed-Ashraf273 force-pushed the supporting_gemma_inference_with_ov_backend branch from 3ed2fea to 9bac18f Compare August 19, 2025 17:40
Labels
Gemma Gemma model specific issues
4 participants