Description
Running the example gives a number of parameters and a model performance that are inconsistent with what the example displays.
It seems that the global average pooling layer needs `data_format="channels_first"` to reach the same number of parameters and the same accuracy as the displayed console log (tried on Google Colab).
But then no pooling over the sequence takes place; the layer just removes the feature dimension, so maybe another layer should be used.
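To illustrate why the parameter count changes, here is a minimal pure-Python sketch of the shape arithmetic, not the Keras implementation. It assumes the input to the pooling layer is laid out as `(batch, steps, features)` and that a dense layer follows it. The helper names and the sizes (200 steps, 32 features, 16 units) are hypothetical and not taken from the example.

```python
def gap1d_output_shape(input_shape, data_format="channels_last"):
    """Output shape of a 1D global average pooling over a 3D input."""
    batch, axis1, axis2 = input_shape
    if data_format == "channels_last":
        # Input read as (batch, steps, features): the steps axis is averaged away.
        return (batch, axis2)
    if data_format == "channels_first":
        # Input read as (batch, features, steps): the last axis is averaged away.
        return (batch, axis1)
    raise ValueError(f"unknown data_format: {data_format!r}")


def dense_param_count(input_dim, units):
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units


# Hypothetical sizes, chosen only for illustration.
steps, features, units = 200, 32, 16
input_shape = (None, steps, features)

shape_last = gap1d_output_shape(input_shape, "channels_last")
shape_first = gap1d_output_shape(input_shape, "channels_first")

params_last = dense_param_count(shape_last[1], units)
params_first = dense_param_count(shape_first[1], units)

print(shape_last, params_last)
print(shape_first, params_first)
```

Under these assumptions, `channels_first` averages over the feature axis of a `(batch, steps, features)` tensor rather than over the steps, so the following dense layer sees one input per step and its parameter count changes accordingly.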