A comprehensive BSc-level introduction to Natural Language Processing through interactive Jupyter notebooks. This collection provides a progressive learning journey from classical N-gram models to modern transformer architectures, emphasizing intuition over mathematical formalism.
Simple N-gram Models
- Character and word-level N-gram construction
- Probabilistic text generation
- Count-based language modeling (see the sketch after this list)
- Visualization of N-gram frequencies
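As a flavor of the approach, here is a minimal character-bigram sketch (not the notebook's actual code), using the `shakespeare_sonnets.txt` corpus described below: it counts how often each character follows another, then generates text by sampling from those counts.

```python
import random
from collections import Counter, defaultdict

# Count character bigrams: how often each character follows another.
text = open("shakespeare_sonnets.txt", encoding="utf-8").read().lower()
counts = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    counts[prev][nxt] += 1

# Generate text by sampling each next character in proportion to how
# often it followed the current character in the corpus.
char, out = "t", ["t"]
for _ in range(100):
    followers = counts[char]
    char = random.choices(list(followers), weights=list(followers.values()))[0]
    out.append(char)
print("".join(out))
```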
Word Embeddings and Vector Spaces
- Word2Vec implementation (see the sketch after this list)
- Semantic relationships in vector space
- 3D visualization of word embeddings
- Similarity calculations and clustering
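For orientation, a minimal gensim sketch (assuming gensim ≥ 4, where the dimensionality parameter is `vector_size`; the query word `love` is just an illustrative example that happens to be frequent in the sonnets):

```python
from gensim.models import Word2Vec

# Rough tokenization: one "sentence" per non-empty line of the sonnets.
sentences = [line.lower().split()
             for line in open("shakespeare_sonnets.txt", encoding="utf-8")
             if line.strip()]

# Small skip-gram model; low dimensionality keeps 3D plotting simple.
model = Word2Vec(sentences, vector_size=50, window=5, min_count=2, sg=1)

# Words close in the vector space tend to be semantically related.
print(model.wv.most_similar("love", topn=5))
```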
Simple Neural Networks for NLP
- 2-layer neural network implementation (sketched after this list)
- Backpropagation for language modeling
- Training visualization and loss curves
- Perplexity metrics
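The sketch below shows the mechanics on a toy "successor" task rather than real text (so a good model can drive perplexity toward 1): a 2-layer network trained with hand-written backpropagation, with perplexity computed as the exponential of the average cross-entropy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: each token ID is always followed by the next ID.
vocab = 10
ids = rng.integers(0, vocab, 200)
X = np.eye(vocab)[ids]          # one-hot inputs
y = (ids + 1) % vocab           # targets: the "next" token

# 2-layer network: hidden tanh layer, then softmax over the vocabulary.
W1 = rng.normal(0, 0.1, (vocab, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.1, (32, vocab)); b2 = np.zeros(vocab)

for _ in range(200):
    h = np.tanh(X @ W1 + b1)                         # forward pass
    logits = h @ W2 + b2
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    loss = -np.log(p[np.arange(len(y)), y]).mean()   # cross-entropy

    d = p.copy()                                     # backpropagation
    d[np.arange(len(y)), y] -= 1
    d /= len(y)
    dW2, db2 = h.T @ d, d.sum(axis=0)
    dh = (d @ W2.T) * (1 - h ** 2)
    dW1, db1 = X.T @ dh, dh.sum(axis=0)
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= 0.5 * grad                          # gradient step

# Perplexity = exp(mean cross-entropy): 1 is perfect, vocab size is chance.
print("perplexity:", np.exp(loss))
```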
Comparing NLP Methods
- Performance benchmarks across approaches
- Perplexity comparisons (see the sketch after this list)
- Generation quality analysis
- Computational requirements
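Perplexity gives a single yardstick for such comparisons. A sketch with made-up per-token probabilities (the numbers below are purely illustrative, not measured results):

```python
import numpy as np

def perplexity(token_probs):
    """exp of the average negative log-probability assigned to the
    observed tokens; lower means the model was less 'surprised'."""
    return float(np.exp(-np.mean(np.log(token_probs))))

# Hypothetical probabilities two models assigned to the same four tokens.
ngram_probs  = [0.05, 0.20, 0.01, 0.10]
neural_probs = [0.15, 0.30, 0.05, 0.20]
print("n-gram :", perplexity(ngram_probs))
print("neural :", perplexity(neural_probs))
```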
Token's Journey Through a Transformer
- Step-by-step token processing
- Embedding and positional encoding (see the sketch after this list)
- Multi-head attention visualization
- Layer normalization effects
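For the positional-encoding step, a sketch of the standard sinusoidal scheme from "Attention Is All You Need" (the notebooks may use a different variant):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: even dimensions use sine,
    odd dimensions use cosine, at geometrically spaced frequencies."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

# A token's input to the first layer is its embedding plus this encoding.
print(positional_encoding(seq_len=8, d_model=16).shape)  # (8, 16)
```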
Transformers in 3D: Visual Journey
- 3-dimensional geometric interpretation
- Attention as angles in 3D space (see the sketch after this list)
- Layer normalization as sphere projection
- Interactive 3D visualizations
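A toy 3D sketch of the two geometric ideas: similarity as the angle between vectors, and normalization as projection onto a sphere. (True layer normalization also centers the vector first; the sphere picture is the simplified geometric reading.)

```python
import numpy as np

def angle_deg(u, v):
    """Angle between two vectors: small angles mean similar directions,
    the geometric reading of high attention scores."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def to_unit_sphere(x):
    """Rescale a vector to length 1, i.e. project it onto the unit sphere."""
    return x / np.linalg.norm(x)

u = np.array([1.0, 2.0, 0.5])
v = np.array([0.8, 1.9, 0.7])
print(angle_deg(u, v))                    # small angle: similar directions
print(np.linalg.norm(to_unit_sphere(u)))  # 1.0: on the sphere
```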
Simplified Transformer Implementation
- Minimal, educational implementation
- Core components breakdown
- Self-attention mechanism (sketched after this list)
- Feed-forward networks
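The heart of such an implementation is scaled dot-product self-attention. A single-head numpy sketch of the formula softmax(QKᵀ/√d)·V:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise query-key similarity
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V                       # weighted mixture of values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                  # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (4, 8)
```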
How Transformers Learn
- Training process visualization in 3D
- Gradient flow and weight updates (see the sketch after this list)
- Loss landscape navigation
- Attention pattern evolution
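The training mechanics are the same for any model. A minimal TensorFlow sketch of a single update step, using a stand-in Dense model rather than the transformer itself:

```python
import tensorflow as tf

# Stand-in model; the forward/backward mechanics match the transformer's.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(10),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(1e-3)

x = tf.random.normal((32, 8))                            # dummy batch
y = tf.random.uniform((32,), maxval=10, dtype=tf.int32)  # dummy labels

with tf.GradientTape() as tape:
    logits = model(x)             # forward pass
    loss = loss_fn(y, logits)     # scalar loss

# Gradients flow from the loss back to every trainable weight...
grads = tape.gradient(loss, model.trainable_variables)
# ...and the optimizer moves the weights a small step downhill.
optimizer.apply_gradients(zip(grads, model.trainable_variables))
print(float(loss))
```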
- Progressive Learning Path: Each module builds on previous concepts
- Interactive Visualizations: All notebooks include interactive plots and animations
- Minimal Mathematics: Focus on intuition over formulas
- Self-Contained: Each notebook can run independently
- Shakespeare Dataset: Classic text for all examples
- 3D Visualizations: Unique geometric interpretations for deep understanding
```
# Core dependencies
numpy
matplotlib
plotly
tensorflow
scikit-learn
gensim
pandas
```
All modules use Shakespeare's sonnets (`shakespeare_sonnets.txt`) as the primary dataset, providing:
- Rich vocabulary (~5000 unique words)
- Poetic structure for interesting patterns
- Cultural familiarity
- Sufficient size for meaningful models (a quick corpus check is sketched below)
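A quick way to inspect these corpus statistics yourself (assuming the file sits next to the notebooks):

```python
import re

# Tokenize crudely and report corpus size and vocabulary size.
text = open("shakespeare_sonnets.txt", encoding="utf-8").read().lower()
words = re.findall(r"[a-z']+", text)
print(f"{len(words)} tokens, {len(set(words))} unique words")
```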
By completing these modules, students will understand:
- Evolution from count-based to neural language models
- How word embeddings capture semantic meaning
- Transformer architecture components and their functions
- Attention mechanisms and their geometric interpretation
- Training dynamics of modern language models
Each notebook is designed to be run sequentially within its module. Start with Module 1 (NLP_Ngrams) and progress through to Module 8 (NLP_TransformersTraining) for the complete learning experience.
```bash
# Example: Running the first module
cd NLP_Ngrams
jupyter notebook 1_simple_ngrams.ipynb
```
These materials emphasize:
- Visual Learning: Complex concepts through visualizations
- Hands-On Experience: Interactive code examples
- Intuitive Understanding: Geometric and visual interpretations
- Practical Implementation: Working code over theory
Joerg Osterrieder
ASE Summer School 2025
Educational use permitted. Please cite when using these materials.
Created for the ASE Summer School 2025, these materials represent a modern approach to teaching NLP concepts through visualization and interaction rather than mathematical formalism.