Add HGNetV2 to KerasHub #2293


Open
wants to merge 3 commits into master

Conversation

harshaljanjani
Collaborator

Description of the Change

HGNetV2 is an end-to-end image classification model implemented in KerasHub as a building block toward supporting D-FINE. A number of D-FINE's presets depend on derivatives of HGNetV2Backbone, and this model puts the required infrastructure in place to serve them. Its addition unlocks the follow-on integration work for D-FINE on KerasHub. Concurrently, I am exploring the integration paradigm for D-FINE with the HGNetV2 layers.

Closes the first half of #2271

Reference

Please read Page 15/18, Section A.1.1 of the D-FINE paper, and the HF config files to verify this point. The "backbone": null argument in the HuggingFace configuration translates to an HGNetV2 backbone.

Colab Notebook

Usage and Numerics Matching Colab

Checklist

  • I have added all the necessary unit tests for my change.
  • I have verified that my change does not break existing code and works with all backends (TensorFlow, JAX, and PyTorch).
  • My PR is based on the latest changes of the main branch (if unsure, rebase the code).
  • I have followed the Keras Hub Model contribution guidelines in making these changes.
  • I have followed the Keras Hub API design guidelines in making these changes.
  • I have signed the Contributor License Agreement.

@harshaljanjani harshaljanjani self-assigned this Jun 9, 2025
@harshaljanjani harshaljanjani marked this pull request as ready for review June 12, 2025 15:08
@harshaljanjani
Collaborator Author

@mattdangerw @divyashreepathihalli We should be able to wrap up this PR soon. I’ve made considerable progress on D-FINE that I’m eager to push to GitHub, but the volume is substantial, and I want to avoid making this PR unmanageable. With that in mind, I’ve completed the functionality tests and numerics matching in the Colab notebook linked in the PR description, and I’ve also written a standalone example from the user’s perspective, as you requested last time, to wrap up this model.
I do have a few nits and some bugs to fix around tolerance in D-FINE, but nothing that affects the current functionality of HGNetV2. Additionally, I’ve written the weight conversion script for D-FINE, and I haven’t needed to make any changes to the HGNetV2 code I’ve developed here; it’s fully compatible with the D-FINE code, so I hope we’re good to go. Thanks!

@mattdangerw mattdangerw (Member) left a comment


Thanks! Left some thoughts on the exposed API.

Keep in mind a key goal here is to be able to hot-swap one classifier model for another in a high-level task without changing the high-level fine-tuning code. I think there are a few spots we can do that better here (see the inline comments).
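The hot-swap goal above can be sketched with plain-Python stand-ins. The class names and constructor signature below are illustrative assumptions, not the actual KerasHub API:

```python
# Minimal sketch of the "hot-swappable classifier" goal: any task model
# exposing the same constructor signature can be dropped into the same
# high-level fine-tuning code. These classes are illustrative stand-ins.

class ResNetImageClassifier:
    def __init__(self, backbone=None, num_classes=None):
        self.backbone = backbone
        self.num_classes = num_classes

class HGNetV2ImageClassifier:
    # Same signature: no extra required args beyond num_classes.
    def __init__(self, backbone=None, num_classes=None):
        self.backbone = backbone
        self.num_classes = num_classes

def build_task(classifier_cls, num_classes):
    # High-level code stays identical regardless of which classifier
    # class is passed in.
    return classifier_cls(backbone=None, num_classes=num_classes)

for cls in (ResNetImageClassifier, HGNetV2ImageClassifier):
    task = build_task(cls, num_classes=10)
    print(type(task).__name__, task.num_classes)
```

If a classifier requires an extra positional argument (like `head_filters` below), this swap breaks, which is what the inline comments are getting at.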

class HGNetV2Backbone(Backbone):
"""This class represents a Keras Backbone of the HGNetV2 model.

This class implements an HGNetV2 backbone architecture.

Can we add a little more detail here?

This class implements an HGNetV2 (High Performance GPU Net) backbone architecture, a convnet architecture for high performance image classification.

Or something like that, not actually sure what the best brief description would be

stage_mid_channels,
stage_out_channels,
stage_num_blocks,
stage_numb_of_layers,

numb_of_layers -> num_layers

use_learnable_affine_block: bool, whether to use learnable affine
transformations.
num_channels: int, the number of channels in the input image.
stage_in_channels: list of ints, the input channels for each stage.

I wonder if we could do something more like this? https://github.com/keras-team/keras-hub/blob/master/keras_hub/src/models/xception/xception_backbone.py#L25C9-L28 list of lists here to keep the number of args to this down? what do you think?

I also think we use the term filters more often than channels for args like this
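The list-of-lists idea from the Xception backbone can be sketched in plain Python. The field order and the `stackwise_stage_config` name are assumptions for illustration, not the final API:

```python
# Hypothetical sketch of collapsing parallel per-stage arguments into one
# list-of-lists argument, in the spirit of the Xception backbone config.

per_stage_args = {
    "stage_in_channels":  [16, 64, 128],
    "stage_mid_channels": [16, 32, 64],
    "stage_out_channels": [64, 128, 256],
    "stage_num_blocks":   [1, 1, 2],
}

# One entry per stage: (in, mid, out, num_blocks).
stackwise_stage_config = list(
    zip(*(per_stage_args[k] for k in per_stage_args))
)
print(stackwise_stage_config)
# A single stage unpacks back into named values:
in_ch, mid_ch, out_ch, num_blocks = stackwise_stage_config[0]
```

This keeps the constructor down to one stage argument while preserving all the per-stage information.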

hidden_sizes: list of ints, the sizes of the hidden layers.
stem_channels: list of ints, the channels for the stem part.
hidden_act: str, the activation function for hidden layers.
use_learnable_affine_block: bool, whether to use learnable affine

what is used if this is false?

stage_num_blocks: list of ints, the number of blocks in each stage.
stage_numb_of_layers: list of ints, the number of layers in each block.
stage_downsample: list of bools, whether to downsample in each stage.
stage_light_block: list of bools, whether to use light blocks in each

what's a light block?



@keras_hub_export("keras_hub.models.HGNetV2ImageClassifier")
class HGNetV2ImageClassifier(ImageClassifier):

Add a docstring, this will be public.

activation=None,
dropout=0.0,
head_dtype=None,
use_learnable_affine_block_head=False,

Can we just follow the backbone for this? Do we ever want to disagree with the backbone here in practical usage? We could always add later.

backbone,
preprocessor,
num_classes,
head_filters,

Probably we should add a default here. Can we do head_filters=None and read a value from the backbone that's a good default?

All our other classifiers can be instantiated from a preset with a random head by just passing num_classes. If we require another arg, then you couldn't sub this in for other classifier models easily.
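The suggested `head_filters=None` default could be sketched like this; `hidden_sizes` is taken from the backbone args discussed above, while the helper and dummy class are purely illustrative:

```python
# Sketch of defaulting head_filters from the backbone so the classifier
# stays constructible with just num_classes. DummyBackbone stands in for
# the real HGNetV2Backbone here.

class DummyBackbone:
    hidden_sizes = [128, 512, 1024, 2048]

def resolve_head_filters(backbone, head_filters=None):
    # Fall back to the last hidden size, a reasonable default for a
    # classification head sitting on the final feature map.
    if head_filters is None:
        head_filters = backbone.hidden_sizes[-1]
    return head_filters

print(resolve_head_filters(DummyBackbone()))       # backbone default: 2048
print(resolve_head_filters(DummyBackbone(), 256))  # explicit override: 256
```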



@keras_hub_export("keras_hub.layers.HGNetV2ImageConverter")
class HGNetV2ImageConverter(PreprocessingLayer):

Let's try to base this directly off the main image converter class. It's ok if we don't do the exact same resizing and cropping as upstream; that is something end users will be able to configure anyway (you could always chain the image converter with other keras image preprocessing layers).

mean and std will be covered by offset and scale in ImageConverter, you just need to convert the scalars. The ResizeThenCrop we don't support, but we can discuss separately whether that should be part of the image converter, or if we just allow users to do that with ResizeThenCrop directly.
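Converting per-channel mean/std normalization into scale/offset can be sketched as follows, assuming the converter applies `pixel * scale + offset` to raw 0-255 inputs; the mean/std values below are the common ImageNet statistics, used here only as an example:

```python
# (x / 255 - mean) / std  ==  x * scale + offset
# with scale = 1 / (255 * std) and offset = -mean / std.

mean = [0.485, 0.456, 0.406]  # example ImageNet per-channel means
std = [0.229, 0.224, 0.225]   # example ImageNet per-channel stds

scale = [1.0 / (255.0 * s) for s in std]
offset = [-m / s for m, s in zip(mean, std)]

# Spot-check one channel against the original normalization:
x = 128.0
lhs = (x / 255.0 - mean[0]) / std[0]
rhs = x * scale[0] + offset[0]
assert abs(lhs - rhs) < 1e-9
```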

@@ -0,0 +1,58 @@
# Metadata for loading pretrained model weights.
backbone_presets = {
"hgnetv2_b4.ssld_stage2_ft_in1k": {

do we have dots in other preset names? i've never seen it. self-consistency is more important than consistency with our source for these models, so probably go dot to underscore.

also, please take a look at other preset names in keras-hub and try to be as consistent as possible. i think we just use imagenet instead of in1k, for example?
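Applying both suggestions to the preset name shown above would look something like this; the resulting name is only a suggestion, not a confirmed keras-hub preset id:

```python
# Illustrative renaming of the upstream preset id to keras-hub style:
# dots -> underscores, "in1k" -> "imagenet".
upstream = "hgnetv2_b4.ssld_stage2_ft_in1k"
keras_hub_name = upstream.replace(".", "_").replace("in1k", "imagenet")
print(keras_hub_name)  # hgnetv2_b4_ssld_stage2_ft_imagenet
```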
