Add HGNetV2 to KerasHub #2293
Conversation
@mattdangerw @divyashreepathihalli We should be able to wrap up this PR soon. I've made considerable progress on D-FINE that I'm eager to push to GitHub, but the change volume is large, and I want to avoid making this PR unmanageable. With that in mind, I've completed the functionality tests and the numerics matching in the Colab Notebook linked in the PR description, and I've also written a standalone example from the user's perspective, as you requested last time, to wrap up this model.
Thanks! Left some thoughts on the exposed API.
Keep in mind that a key goal with these models is being able to hot swap one classifier model for another in a high-level task without changing the high-level fine-tuning code. I think there are a few spots where we can do that better here (in the inline comments).
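As a minimal sketch of that hot-swap goal from the user's side (the preset name is a placeholder, not a released preset, and the data is random stand-in input):

import numpy as np
import keras_hub

# Placeholder data standing in for a real labeled dataset.
images = np.random.uniform(0, 255, size=(8, 224, 224, 3))
labels = np.random.randint(0, 10, size=(8,))

# The high-level fine-tuning code stays identical no matter which
# classifier family backs the preset; only the preset string changes.
classifier = keras_hub.models.ImageClassifier.from_preset(
    "hgnetv2_b4_imagenet",  # hypothetical preset name
    num_classes=10,
)
classifier.fit(x=images, y=labels, batch_size=4)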
class HGNetV2Backbone(Backbone):
    """This class represents a Keras Backbone of the HGNetV2 model.

    This class implements an HGNetV2 backbone architecture.
Can we add a little more detail here?
This class implements an HGNetV2 (High Performance GPU Net) backbone architecture, a convnet architecture for high performance image classification.
Or something like that, not actually sure what the best brief description would be
        stage_mid_channels,
        stage_out_channels,
        stage_num_blocks,
        stage_numb_of_layers,
numb_of_layers -> num_layers
        use_learnable_affine_block: bool, whether to use learnable affine
            transformations.
        num_channels: int, the number of channels in the input image.
        stage_in_channels: list of ints, the input channels for each stage.
I wonder if we could do something more like this? https://github.com/keras-team/keras-hub/blob/master/keras_hub/src/models/xception/xception_backbone.py#L25C9-L28 A list of lists here, to keep the number of args to this down? What do you think?
I also think we use the term filters more often than channels for args like this.
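For illustration, one hypothetical shape such a consolidated argument could take (the name and the numbers are made up, loosely mirroring Xception's stackwise-style args):

# Hypothetical: a single list-of-lists argument replacing the parallel
# stage_in_channels / stage_mid_channels / stage_out_channels /
# stage_num_blocks / stage_numb_of_layers lists. Each inner list
# describes one stage; the values here are illustrative only.
stackwise_stage_filters = [
    # [in_filters, mid_filters, out_filters, num_blocks, num_layers]
    [48, 48, 128, 1, 6],
    [128, 96, 512, 1, 6],
    [512, 192, 1024, 3, 6],
    [1024, 384, 2048, 1, 6],
]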
        hidden_sizes: list of ints, the sizes of the hidden layers.
        stem_channels: list of ints, the channels for the stem part.
        hidden_act: str, the activation function for hidden layers.
        use_learnable_affine_block: bool, whether to use learnable affine
what is used if this is false?
        stage_num_blocks: list of ints, the number of blocks in each stage.
        stage_numb_of_layers: list of ints, the number of layers in each block.
        stage_downsample: list of bools, whether to downsample in each stage.
        stage_light_block: list of bools, whether to use light blocks in each
what's a light block?
@keras_hub_export("keras_hub.models.HGNetV2ImageClassifier")
class HGNetV2ImageClassifier(ImageClassifier):
Add a docstring, this will be public.
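As a rough sketch of the shape such a docstring could take, following the usual KerasHub task style (wording and the listed args are illustrative, not the PR's final text):

@keras_hub_export("keras_hub.models.HGNetV2ImageClassifier")
class HGNetV2ImageClassifier(ImageClassifier):
    """HGNetV2 image classification task.

    `HGNetV2ImageClassifier` wraps a `HGNetV2Backbone` and a
    classification head to map images to a fixed set of labels.

    Args:
        backbone: A `HGNetV2Backbone` instance.
        num_classes: int. The number of classes to predict.
        preprocessor: `None` or a matching preprocessor layer. If `None`,
            inputs should be preprocessed before calling the model.
    """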
        activation=None,
        dropout=0.0,
        head_dtype=None,
        use_learnable_affine_block_head=False,
Can we just follow the backbone for this? Do we ever want to disagree with the backbone here in practical usage? We could always add later.
        backbone,
        preprocessor,
        num_classes,
        head_filters,
Probably we should add a default here. Can we do head_filters=None and read a value from the backbone that's a good default? All our other classifiers are instantiable from a preset with a random head by just passing num_classes. If we require another arg, then you couldn't sub this in for other classifier models easily.
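A minimal sketch of that defaulting pattern (falling back to the backbone's last hidden size is an assumption here, not necessarily the right default):

class HGNetV2ImageClassifier(ImageClassifier):
    def __init__(
        self,
        backbone,
        num_classes,
        preprocessor=None,
        head_filters=None,
        **kwargs,
    ):
        # Assumption: the last entry of the backbone's hidden_sizes is a
        # reasonable head width, so presets load with just num_classes
        # like other KerasHub classifiers.
        if head_filters is None:
            head_filters = backbone.hidden_sizes[-1]
        ...  # rest of the existing constructor unchanged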
@keras_hub_export("keras_hub.layers.HGNetV2ImageConverter")
class HGNetV2ImageConverter(PreprocessingLayer):
Let's try to base this directly off the main image converter class. It's ok if we don't do the exact same resizing and cropping as upstream; that is something end users will be able to configure anyway (you could always chain the image converter with other Keras image preprocessing layers). mean and std will be covered by offset and scale in ImageConverter, you just need to convert the scalars. The ResizeThenCrop we don't support, but we can discuss separately whether that should be part of the image converter, or if we just allow users to do that with ResizeThenCrop directly.
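For reference, converting the usual divide-by-255-then-normalize preprocessing into ImageConverter's scale/offset form looks like this (assuming the standard ImageNet constants upstream; the image size is illustrative):

import keras_hub

# (x / 255 - mean) / std  ==  x * scale + offset, per channel, with
# scale = 1 / (255 * std) and offset = -mean / std.
mean = [0.485, 0.456, 0.406]  # standard ImageNet mean
std = [0.229, 0.224, 0.225]  # standard ImageNet std

scale = [1.0 / (255.0 * s) for s in std]
offset = [-m / s for m, s in zip(mean, std)]

converter = keras_hub.layers.ImageConverter(
    image_size=(224, 224),
    scale=scale,
    offset=offset,
)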
# Metadata for loading pretrained model weights.
backbone_presets = {
    "hgnetv2_b4.ssld_stage2_ft_in1k": {
do we have dots in other preset names? i've never seen it. self consistency is more important than consistency with our source for these models, so probably go dot to underscore.
also, please take a look at other preset names in keras-hub and try to be as consistent as possible. i think we just use imagenet instead of in1k, for example?
Description of the Change
This PR adds HGNetV2, an end-to-end image classification model, to KerasHub as a building block toward supporting D-FINE. Several of D-FINE's presets depend on derivatives of HGNetV2Backbone, and this model puts the required infrastructure in place to serve them. Its addition unlocks the follow-on integration work for D-FINE on KerasHub. Concurrently, I am exploring the integration paradigm for D-FINE with the HGNetV2 layers.
Closes the first half of #2271
Reference
Please read Page 15/18, Section A.1.1 of the D-FINE paper, and the HF config files to verify this point. The "backbone": null argument in the HuggingFace configuration translates to an HGNetV2 backbone.
Colab Notebook
Usage and Numerics Matching Colab
Checklist