This project demonstrates an image captioning system built on a CNN-RNN architecture that generates descriptive captions for images. The model is implemented with TensorFlow and Keras, and a Streamlit app provides an interactive user experience.
The goal of this project is to create accurate and meaningful captions for images by using a dual-network approach, combining Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).
- Flickr8k Dataset
- Captions provided as a text file with mappings from image IDs to captions.
- Feature Extraction: VGG16 extracts feature vectors from images.
- Caption Preprocessing: Clean and tokenize captions, adding start and end tokens.
- Model Architecture: Combines image and text processing paths with Dense, Embedding, and LSTM layers.
- Training: The model is trained with a data generator.
- Evaluation: BLEU scores are calculated to evaluate performance.
- Image Features: The training images are preprocessed by extracting high-level feature vectors with a pre-trained CNN such as InceptionV3 (see the feature-extraction sketch below).
- Caption Processing: Text captions are cleaned, tokenized, and converted to integer sequences, with start and end tokens added so the generated captions have a consistent format (see the caption-preprocessing sketch below).
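A minimal sketch of the feature-extraction step, assuming the VGG16 backbone named in the workflow above (the InceptionV3 backbone mentioned in the architecture section can be swapped in the same way, with a different feature dimension). The paths and output filename are placeholders.

```python
import os
import pickle
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# Drop the final classification layer and keep the 4096-d fc2 output as the image feature.
base = VGG16(weights="imagenet")
extractor = Model(inputs=base.inputs, outputs=base.layers[-2].output)

def extract_features(image_dir):
    """Map each image ID to its feature vector."""
    features = {}
    for name in os.listdir(image_dir):
        img = load_img(os.path.join(image_dir, name), target_size=(224, 224))
        arr = preprocess_input(np.expand_dims(img_to_array(img), axis=0))
        features[os.path.splitext(name)[0]] = extractor.predict(arr, verbose=0)[0]
    return features

# Example usage (placeholder paths):
# features = extract_features("Flickr8k_Dataset/Images")
# pickle.dump(features, open("features.pkl", "wb"))
```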
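A companion sketch of the caption-preprocessing step: cleaning, wrapping each caption in start/end tokens, and fitting a Keras `Tokenizer`. The `startseq`/`endseq` markers are an assumed naming convention, and `captions` is assumed to map each image ID to its list of raw caption strings parsed from the captions file.

```python
import re
from tensorflow.keras.preprocessing.text import Tokenizer

def clean_caption(text):
    """Lowercase, strip punctuation and digits, drop single-letter words, add start/end tokens."""
    text = re.sub(r"[^a-z ]", "", text.lower())
    words = [w for w in text.split() if len(w) > 1]
    return "startseq " + " ".join(words) + " endseq"

def prepare_captions(captions):
    """Clean all captions and fit a tokenizer over the full vocabulary."""
    cleaned = {k: [clean_caption(c) for c in caps] for k, caps in captions.items()}
    all_captions = [c for caps in cleaned.values() for c in caps]
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(all_captions)
    vocab_size = len(tokenizer.word_index) + 1
    max_length = max(len(c.split()) for c in all_captions)
    return cleaned, tokenizer, vocab_size, max_length
```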
1. CNN Encoder
- Uses a pre-trained InceptionV3 model to extract features from images, providing a vectorized representation of visual content.
- The extracted features are reshaped to fit the requirements of the RNN decoder.
2. RNN Decoder
- The model’s RNN layer, specifically an LSTM, sequentially generates captions based on the CNN-encoded features.
- The decoder embeds the tokens generated so far as dense vectors and processes them, together with the encoded image features, to predict each next word.
3. Embedding and Sequence Generation
- Word embeddings are used to transform tokens into dense vectors, enabling the model to capture semantic relationships between words.
- Sequences of words are generated one at a time until an end token is produced, yielding a coherent caption (a model-definition sketch follows this list).
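A sketch of the merge-style encoder-decoder described above, assuming 4096-dimensional CNN features (VGG16 fc2; use 2048 for InceptionV3 pooled features) and illustrative layer sizes rather than the exact values used in this repo.

```python
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

def build_captioning_model(vocab_size, max_length, feature_dim=4096, units=256):
    # Image path: project the CNN feature vector into the decoder's hidden size.
    img_input = Input(shape=(feature_dim,), name="image_features")
    img = Dense(units, activation="relu")(Dropout(0.4)(img_input))

    # Text path: embed the partial caption and run it through an LSTM.
    txt_input = Input(shape=(max_length,), name="caption_sequence")
    txt = Embedding(vocab_size, units, mask_zero=True)(txt_input)
    txt = LSTM(units)(Dropout(0.4)(txt))

    # Merge both paths and predict a distribution over the next word.
    merged = Dense(units, activation="relu")(add([img, txt]))
    output = Dense(vocab_size, activation="softmax")(merged)
    return Model(inputs=[img_input, txt_input], outputs=output)
```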
- Checkpointing: Model checkpoints are saved periodically to allow resuming from the last saved point in case of interruptions.
- Image Preprocessing: Input images are resized and normalized before feature extraction so they match the format the CNN backbone expects.
- Loss and Metrics: The model is trained with sparse categorical cross-entropy on the next-word prediction task; caption quality is then assessed with BLEU scores (a training-setup sketch follows this list).
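A sketch of the training setup under the same assumptions: a Python generator yields `((image_features, partial_sequence), next_word)` batches, the loss is sparse categorical cross-entropy as stated above, and a `ModelCheckpoint` callback saves the best weights. `train_captions`, `features`, `tokenizer`, `vocab_size`, and `max_length` come from the earlier sketches; the filename, batch size, and step count are illustrative.

```python
import numpy as np
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.preprocessing.sequence import pad_sequences

def data_generator(captions, features, tokenizer, max_length, batch_size=64):
    """Yield ((image_features, padded_prefix), next_word_id) batches indefinitely."""
    X1, X2, y = [], [], []
    while True:
        for image_id, caps in captions.items():
            for cap in caps:
                seq = tokenizer.texts_to_sequences([cap])[0]
                for i in range(1, len(seq)):  # each prefix predicts the following word
                    X1.append(features[image_id])
                    X2.append(pad_sequences([seq[:i]], maxlen=max_length)[0])
                    y.append(seq[i])
                    if len(X1) == batch_size:
                        yield (np.array(X1), np.array(X2)), np.array(y)
                        X1, X2, y = [], [], []

model = build_captioning_model(vocab_size, max_length)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])

checkpoint = ModelCheckpoint("best_model.keras", monitor="loss",
                             save_best_only=True, verbose=1)
model.fit(data_generator(train_captions, features, tokenizer, max_length),
          epochs=20, steps_per_epoch=2000,  # illustrative: ~training pairs / batch size
          callbacks=[checkpoint], verbose=1)
```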
The trained model generates captions for new images by passing the image through the encoder (CNN) and using the decoder (RNN) to generate text sequentially. The system can process images uploaded through the Streamlit interface, displaying both generated and actual captions when available.
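A sketch of the sequential decoding loop described above, using greedy (argmax) selection at each step; it reuses `tokenizer` and `max_length` from the preprocessing sketch and assumes the `startseq`/`endseq` convention.

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_caption(model, tokenizer, photo_features, max_length):
    """Greedily decode one word at a time until the end token or the length limit."""
    index_to_word = {i: w for w, i in tokenizer.word_index.items()}
    text = "startseq"
    for _ in range(max_length):
        seq = pad_sequences(tokenizer.texts_to_sequences([text]), maxlen=max_length)
        probs = model.predict((photo_features.reshape(1, -1), seq), verbose=0)[0]
        word = index_to_word.get(int(np.argmax(probs)))
        if word is None or word == "endseq":
            break
        text += " " + word
    return text.replace("startseq", "").strip()
```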
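The workflow above also lists BLEU-based evaluation; a minimal sketch using NLTK's `corpus_bleu`, assuming `test_captions` is a held-out split of the cleaned caption dictionary.

```python
from nltk.translate.bleu_score import corpus_bleu

def evaluate_bleu(model, test_captions, features, tokenizer, max_length):
    """Score generated captions against all reference captions for each test image."""
    references, hypotheses = [], []
    for image_id, caps in test_captions.items():
        prediction = generate_caption(model, tokenizer, features[image_id], max_length)
        hypotheses.append(prediction.split())
        references.append([c.replace("startseq", "").replace("endseq", "").split()
                           for c in caps])
    print("BLEU-1:", corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0)))
    print("BLEU-2:", corpus_bleu(references, hypotheses, weights=(0.5, 0.5, 0, 0)))
```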
- Interactive User Interface: A Streamlit app for uploading images and generating captions in real time (a front-end sketch follows this list).
- Model Checkpointing: Saves model checkpoints during training to prevent data loss in case of interruptions.
- Generated vs. Actual Captions: Displays both generated and actual captions (if available) to assess model performance.
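A minimal sketch of the Streamlit front end described in the feature list; the artifact filenames and `max_length` value are placeholders, and `generate_caption()` is the greedy decoder from the inference sketch above.

```python
import pickle
import numpy as np
import streamlit as st
from PIL import Image
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model, load_model

st.title("Automatic Image Caption App")

# Trained artifacts (placeholder names) produced by the preprocessing and training steps.
model = load_model("best_model.keras")
tokenizer = pickle.load(open("tokenizer.pkl", "rb"))
max_length = 35  # illustrative; use the value computed during preprocessing

# Same VGG16 feature extractor as in the feature-extraction sketch.
base = VGG16(weights="imagenet")
extractor = Model(inputs=base.inputs, outputs=base.layers[-2].output)

uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    image = Image.open(uploaded).convert("RGB")
    st.image(image, caption="Uploaded image")
    arr = np.array(image.resize((224, 224)), dtype="float32")
    photo_features = extractor.predict(preprocess_input(np.expand_dims(arr, axis=0)),
                                       verbose=0)[0]
    st.write("Generated caption:",
             generate_caption(model, tokenizer, photo_features, max_length))
```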
You can access the live application here: Automatic Image Caption App
Potential improvements include:
- Exploring Transformer Models: Testing Transformer-based architectures to further improve caption quality and capture contextual nuances.
- Dataset Expansion: Leveraging larger and more diverse datasets to enhance vocabulary and generalization.
- Beam Search for Caption Generation: Implementing beam search during inference for more accurate caption generation.
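As a pointer for the beam-search item above, a toy sketch that keeps the top `beam_width` partial captions at each step instead of the single greedy choice (same assumed names as the earlier sketches; no length normalization):

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def beam_search_caption(model, tokenizer, photo_features, max_length, beam_width=3):
    """Keep the `beam_width` highest-scoring partial captions at every step."""
    index_to_word = {i: w for w, i in tokenizer.word_index.items()}
    beams = [(tokenizer.texts_to_sequences(["startseq"])[0], 0.0)]  # (tokens, log-prob)
    for _ in range(max_length):
        candidates = []
        for seq, score in beams:
            if index_to_word.get(seq[-1]) == "endseq":
                candidates.append((seq, score))  # finished beam carries over unchanged
                continue
            padded = pad_sequences([seq], maxlen=max_length)
            probs = model.predict((photo_features.reshape(1, -1), padded), verbose=0)[0]
            for idx in np.argsort(probs)[-beam_width:]:
                candidates.append((seq + [int(idx)], score + np.log(probs[idx] + 1e-12)))
        beams = sorted(candidates, key=lambda item: item[1], reverse=True)[:beam_width]
    words = [index_to_word.get(i, "") for i in beams[0][0]]
    return " ".join(w for w in words if w not in ("startseq", "endseq"))
```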
This project showcases the effectiveness of CNN-RNN architectures in generating descriptive captions for images. The integration of pre-trained image processing models and sequential RNN decoders enables a robust framework for generating meaningful captions that reflect image content accurately.