This README explains how we achieved strong performance under the Kaggle competition constraint of a maximum of 30 layers, and how to use and extend this notebook efficiently.
- Task: Multiclass ship type classification from 32×32 RGB images (13 classes).
- Compute: Kaggle TPU/GPU/CPU with automatic fallback.
- Key constraint: Maximum of 30 layers in the model.
- Compact but expressive CNN under 30 layers
  - Three convolutional blocks with BatchNorm and Dropout to stabilize and regularize training.
  - Wide convolutional layers (base width of 256 filters) to increase representational capacity without adding depth.
  - Final Dense head with BatchNorm + Dropout before the softmax output.
  - L2 regularization, weight decay (AdamW), label smoothing, and gradient clipping for extra stability.
- Progressive training using multiple pre-generated datasets
  - Start with a lightly augmented dataset ("base"), then train on stronger augmentations ("mild", "strong").
  - Preserves useful features early, then improves robustness and generalization in later phases.
- Balanced data via targeted augmentation
  - For each class, upsample with on-the-fly Keras preprocessing layers to reach a target count per class (median/max strategies).
  - Scale the dataset size (scale_factor) per augmentation setting to control training signal and batch diversity.
- Careful training control
  - Early stopping with patience and ReduceLROnPlateau to converge without overfitting.
  - Checkpoint the best model by validation accuracy.
  - Stratified validation split at each phase for fair evaluation.
- Efficient input pipeline (see the sketch after this list)
  - Batch size scaled by the number of TPU replicas when available; AUTOTUNE prefetching.
  - Memory checks to keep multiple augmented datasets feasible.
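A minimal sketch of the hardware fallback and input pipeline, assuming TensorFlow/Keras on Kaggle; `BASE_BATCH_SIZE` and the `prepare` helper are illustrative names, not the notebook's exact code:

```python
import tensorflow as tf

# Detect a TPU if one is attached; otherwise fall back to the default GPU/CPU strategy.
try:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
except (ValueError, tf.errors.NotFoundError):
    strategy = tf.distribute.get_strategy()

BASE_BATCH_SIZE = 64  # illustrative per-replica batch size
batch_size = BASE_BATCH_SIZE * strategy.num_replicas_in_sync

def prepare(ds: tf.data.Dataset) -> tf.data.Dataset:
    # Datasets from image_dataset_from_directory are already batched; add prefetching.
    return ds.prefetch(tf.data.AUTOTUNE)
```

Scaling the batch size by `strategy.num_replicas_in_sync` keeps the per-core batch constant when a multi-core TPU is detected.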
We use a Sequential CNN designed to stay comfortably under the 30-layer cap while maintaining capacity:
- Block 1: [Conv, BN, Conv, BN, MaxPool, Dropout]
- Block 2: [Conv, BN, Conv, BN, MaxPool, Dropout]
- Block 3: [Conv, BN, Conv, BN, MaxPool, Dropout]
- Head: [Flatten, Dense(256), BN, Dropout, Dense(13, softmax)]
Key hyperparameters
- Base filters: 256, doubling per block (256 -> 512 -> 1024 effective conv widths across blocks).
- Regularization: L2 2e-5; Dropout 0.2 in conv blocks (0.4 after block 3), 0.5 before final layer.
- Loss: Categorical cross-entropy with label_smoothing=0.1.
- Optimizer: AdamW (lr 1e-3 initially, weight_decay 2e-4, clipnorm 1.0). LRs adjusted per phase.
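As a concrete illustration of the description above (not necessarily the notebook's exact code), a model with these blocks and hyperparameters could be assembled as follows:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

NUM_CLASSES = 13
L2 = regularizers.l2(2e-5)

def add_conv_block(model, filters, dropout):
    # Two Conv+BN pairs, then pooling and dropout (6 layers per block).
    for _ in range(2):
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu",
                                kernel_regularizer=L2))
        model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling2D())
    model.add(layers.Dropout(dropout))

model = keras.Sequential([keras.Input(shape=(32, 32, 3))])
add_conv_block(model, 256, 0.2)   # Block 1
add_conv_block(model, 512, 0.2)   # Block 2
add_conv_block(model, 1024, 0.4)  # Block 3
model.add(layers.Flatten())
model.add(layers.Dense(256, activation="relu", kernel_regularizer=L2))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(NUM_CLASSES, activation="softmax"))

model.compile(
    optimizer=keras.optimizers.AdamW(learning_rate=1e-3, weight_decay=2e-4, clipnorm=1.0),
    loss=keras.losses.CategoricalCrossentropy(label_smoothing=0.1),
    metrics=["accuracy"],
)
```

Counting every Conv, BatchNorm, MaxPool, Dropout, Flatten, and Dense layer, this comes to 23 layers, comfortably under the 30-layer cap.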
Why this works under depth limits
- For 32×32 inputs, depth beyond ~25 layers often yields diminishing returns; width, BatchNorm, and regularization are more impactful.
- Label smoothing and weight decay improve calibration and robustness, especially with strong augmentations.
Phases (example used):
- Phase 1 — dataset: "base", epochs: 50, lr: 0.001
- Phase 2 — dataset: "mild", epochs: 100, lr: 0.005
- Phase 3 — dataset: "strong", epochs: 100, lr: 0.005
At each phase:
- Stratified 80/20 train/val split for the current dataset.
- EarlyStopping (patience 20) + ReduceLROnPlateau (factor 0.5, patience 5).
- Checkpoint best weights by val_accuracy to avoid overfitting regressions.
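A hedged sketch of the per-phase loop, assuming the balanced datasets are held in memory as NumPy arrays in a hypothetical `datasets` dict keyed by name, and reusing `model` and `batch_size` from the sketches above:

```python
import numpy as np
from tensorflow import keras
from sklearn.model_selection import train_test_split

PHASES = [
    {"dataset": "base",   "epochs": 50,  "lr": 0.001},
    {"dataset": "mild",   "epochs": 100, "lr": 0.005},
    {"dataset": "strong", "epochs": 100, "lr": 0.005},
]

for phase in PHASES:
    x, y = datasets[phase["dataset"]]  # hypothetical dict of (images, one-hot labels)

    # Stratified 80/20 split based on integer class indices.
    x_tr, x_val, y_tr, y_val = train_test_split(
        x, y, test_size=0.2, stratify=np.argmax(y, axis=1), random_state=42)

    # Reset the learning rate for the new phase/distribution.
    model.optimizer.learning_rate.assign(phase["lr"])

    callbacks = [
        keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=20,
                                      restore_best_weights=True),
        keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5),
        keras.callbacks.ModelCheckpoint("best_model.keras", monitor="val_accuracy",
                                        save_best_only=True),
    ]
    model.fit(x_tr, y_tr, validation_data=(x_val, y_val),
              epochs=phase["epochs"], batch_size=batch_size, callbacks=callbacks)
```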
Rationale
- Start easy (less augmentation) to learn stable features quickly.
- Increase augmentation strength to improve invariance and generalization.
- Reset LR per phase to re-accelerate learning on the new distribution.
- Source: the ships32 folder extracted from the Kaggle dataset archive.
- Loading: keras.utils.image_dataset_from_directory (shuffled, batch_size adapted to the hardware).
- Normalization: pixel values scaled to [0, 1] (x / 255.0) when assembling the normalized datasets.
- Class balancing: For each class, target a per-class count derived from median or max of class frequencies, then synthesize missing samples with:
  - RandomFlip (horizontal/vertical), RandomRotation, RandomZoom, RandomTranslation
  - Optional: RandomContrast, RandomBrightness, GaussianNoise
- Dataset scaling: scale_factor per augmentation config (e.g., 1.0, 1.4, 1.1) to tune total training signal.
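A minimal sketch of the balancing step, assuming the images are a NumPy array and the labels are integer class indices; `balance_classes` and the augmentation strengths are illustrative:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Augmentation pipeline built from Keras preprocessing layers (illustrative settings).
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
    layers.RandomTranslation(0.1, 0.1),
])

def balance_classes(images, labels, strategy="median"):
    """Upsample each class to a target count by augmenting randomly picked samples."""
    counts = np.bincount(labels)
    target = int(np.median(counts)) if strategy == "median" else int(counts.max())
    out_x, out_y = [images], [labels]
    for cls, count in enumerate(counts):
        missing = target - count
        if missing <= 0:
            continue
        cls_images = images[labels == cls].astype("float32")
        picks = cls_images[np.random.randint(0, len(cls_images), size=missing)]
        out_x.append(augment(picks, training=True).numpy())
        out_y.append(np.full(missing, cls, dtype=labels.dtype))
    return np.concatenate(out_x), np.concatenate(out_y)
```

With `strategy="max"`, every class is upsampled to the size of the largest class instead of the median.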
Tips
- Keep augmentations realistic for small 32×32 images; overly strong transforms can erase the discriminative signal.
- Use moderate translation/rotation/zoom and enable flips if class semantics allow it.
- Stratified split (20% validation) at each phase, using label indices for the split.
- Monitored metrics: val_accuracy (primary), val_loss (secondary).
- Plot training curves with clear phase boundaries to diagnose transitions.
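An illustrative plotting helper, assuming `histories` is the list of `History` objects returned by `model.fit` for each phase:

```python
import matplotlib.pyplot as plt

def plot_phases(histories, metric="val_accuracy"):
    # Concatenate the per-phase curves and draw a dashed line at each phase boundary.
    values, boundaries = [], []
    for history in histories:
        values.extend(history.history[metric])
        boundaries.append(len(values))
    plt.plot(values, label=metric)
    for b in boundaries[:-1]:
        plt.axvline(b, linestyle="--", color="grey")
    plt.xlabel("epoch (cumulative across phases)")
    plt.ylabel(metric)
    plt.legend()
    plt.show()
```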
- Set NumPy and TensorFlow seeds before data generation and training (see the sketch after this list).
- Log the augmentation configs and per-phase LR/epochs.
- Pin the final best model (best_model.keras) and the exact seed that produced it.
- Keep the same stratified split strategy for comparable validation numbers.
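A minimal seeding helper; the seed value here is a placeholder for the one actually pinned alongside best_model.keras:

```python
import random
import numpy as np
import tensorflow as tf

SEED = 42  # placeholder; record the exact seed used for the submitted model

def set_seeds(seed: int = SEED) -> None:
    # Seed Python, NumPy, and TensorFlow before data generation and training.
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)
```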
- Lucas Duport [email protected]
- Flavien Geoffray [email protected]