- Introduction
- Motivation
- Dataset
- Methodology
- Project Structure
- Notebooks Overview and Links
- Dual BERT Encoder Architecture Diagram
This project, developed as part of our NLP module, focuses on the automatic evaluation of essays written for IELTS (International English Language Testing System) Writing Tasks. The primary goals are to predict essay scores across various criteria and to categorize essays based on their respective prompts.
- Core Objective: We aim to develop a system that can automatically score IELTS essays and assign them to categories based on the essay prompt.
- Driving Force: Our motivation is to provide students preparing for the IELTS test with a tool to get preliminary feedback on their essays, helping them identify areas for improvement.
- Key Questions: We seek to answer: "How good is my essay?" and "Does my essay align with others that address the same prompt?"
We use the pre-existing IELTS Writing Task 2 Evaluation dataset from Hugging Face. For each prompt, the dataset provides the corresponding essay response along with evaluation scores broken down into:
- Task Achievement
- Coherence and Cohesion
- Lexical Resource
- Grammatical Range and Accuracy
These individual scores contribute to an Overall Band Score.
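As a quick illustration of how the sub-scores relate to the overall band (the rounding rule below is the publicly documented IELTS convention — average the four criteria and round to the nearest half band, with averages ending in .25 or .75 rounding up — not something read from the dataset itself):

```python
import math

def overall_band(ta, cc, lr, gra):
    """Average the four criterion scores and round to the nearest
    half band; averages ending in .25/.75 round up (IELTS convention)."""
    avg = (ta + cc + lr + gra) / 4
    return math.floor(avg * 2 + 0.5) / 2
```

For example, sub-scores of 6.5, 6.0, 6.0, 6.5 average to 6.25 and yield an overall band of 6.5.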
Our project approach is structured as follows:
- Data Cleaning: Preprocessing the raw essay data.
- Statistical Analysis: Performing exploratory data analysis to gain insights into the dataset.
- Baseline Model Training: Training conventional machine learning models to establish benchmark performance.
- BERT Model Training: Fine-tuning BERT-based transformer models for essay scoring.
- Clustering Implementation: Developing clustering mechanisms to group similar essays.
Everything is runnable with Python 3.10.
To install the requirements, run `pip install -r requirements.txt`.
We plan to train the following types of models:
- For Automated Essay Scoring (AES):
- Conventional Models (Baselines): Linear Regression, Logistic Regression, Support Vector Machines (SVM), K-Nearest Neighbors (KNN).
- Transformer-based Models: BERT, EuroBERT (for regression or classification tasks on scores).
- For Essay Clustering:
- K-Means Clustering
- Hierarchical Clustering
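The two model families can be sketched on toy data as follows (the essays and scores are invented, and TF-IDF is an illustrative feature choice, not necessarily the project's actual pipeline):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

essays = [
    "Technology improves access to education for students everywhere",
    "Classroom technology helps students engage with their education",
    "Governments should expand public transport to reduce congestion",
    "Better public transport means less congestion in large cities",
]
bands = [6.0, 6.5, 7.0, 5.5]  # invented Overall Band Scores

# Shared TF-IDF features for both the regressor and the clusterer
X = TfidfVectorizer().fit_transform(essays)

# Baseline AES: treat the band score as a regression target
reg = LinearRegression().fit(X, bands)
preds = reg.predict(X)

# Essay clustering: group essays, ideally recovering the prompt topics
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```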
To uncover initial insights and correlations within the data, we intend to perform the following statistical analyses:
- Investigate the correlation between text length and band score.
- Analyze the influence of word diversity on the evaluation criteria.
- Determine the vocabulary distribution for each essay prompt.
- Examine whether the demands on the writer differ across the sub-score categories (Task Achievement, Coherence, Lexical Resource, Grammar).
- Compare the perceived difficulty of different essay prompts.
- Explore the correlation between specific words/phrases and the Overall Band Score.
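The first two analyses can be sketched in a few lines (toy essays and invented scores; the type-token ratio stands in for word diversity, though the notebooks may use a different measure):

```python
import numpy as np

essays = [
    "the cat sat on the mat",
    "students often struggle with academic writing tasks",
    "governments should fund education because education benefits society",
    "modern technology has transformed how people communicate and learn worldwide",
]
bands = [5.0, 6.0, 6.5, 7.0]  # invented Overall Band Scores

lengths = [len(e.split()) for e in essays]
# Type-token ratio: unique words / total words, a simple diversity measure
ttrs = [len(set(e.split())) / len(e.split()) for e in essays]

# Pearson correlation of each feature with the Overall Band Score
r_length = np.corrcoef(lengths, bands)[0, 1]
r_diversity = np.corrcoef(ttrs, bands)[0, 1]
```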
This section provides direct links to the Jupyter notebooks used in this project.
- Data Cleaning - Initial data loading, cleaning, and preprocessing.
- Data Exploration - General exploration of the dataset features.
- Prompt Binning/Categorization - Analysis and grouping of essay prompts.
These notebooks correspond to the statistical analyses outlined in the methodology:
- Analysis 1 (Length vs. Score)
- Analysis 2 (Word Diversity vs. Criteria)
- Analysis 3 (Vocabulary per Prompt)
- Analysis 4 (Sub-score Demands)
- Analysis 5 (Prompt Difficulty)
- Analysis 6 (Specific Words vs. Score)
- Linear Regression - Baseline regression model
- Logistic Regression - Baseline classification model
- Support Vector Machines (SVM) - Baseline classification model
- K-Nearest Neighbors (KNN) - Baseline classification model
- BERT - Multiple BERT model architectures
- EuroBERT - A BERT-style model with a larger context window
- K-Means Clustering - Clustering of the essays and comparison with the clustered prompts
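The twin-encoder configuration in the results table can be sketched roughly as follows. Tiny randomly initialised `BertModel`s stand in for the fine-tuned checkpoints, and the single cross-attention layer plus linear regression head are assumptions about the architecture, not its confirmed details:

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

class TwinBertRegressor(nn.Module):
    """Prompt encoder + essay encoder, fused by cross-attention."""
    def __init__(self, hidden=64):
        super().__init__()
        cfg = BertConfig(hidden_size=hidden, num_hidden_layers=2,
                         num_attention_heads=4, intermediate_size=128,
                         vocab_size=1000)
        self.prompt_enc = BertModel(cfg)
        self.essay_enc = BertModel(cfg)
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads=4,
                                                batch_first=True)
        self.head = nn.Linear(hidden, 1)  # predicts the band score

    def forward(self, prompt_ids, essay_ids):
        p = self.prompt_enc(prompt_ids).last_hidden_state
        e = self.essay_enc(essay_ids).last_hidden_state
        # Essay tokens attend to the prompt representation
        fused, _ = self.cross_attn(query=e, key=p, value=p)
        return self.head(fused[:, 0]).squeeze(-1)  # CLS position

model = TwinBertRegressor()
score = model(torch.randint(0, 1000, (2, 8)),    # batch of 2 prompts
              torch.randint(0, 1000, (2, 16)))   # batch of 2 essays
```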
| Model | Training Type | Method/Configuration | Accuracy |
|---|---|---|---|
| Linear Regression | Regression | - | 53.30% |
| Logistic Regression | Classification | - | 59.91% |
| SVM | Classification | - | 59.47% |
| KNN | Classification | - | 64.0% |
| Basic BERT | Classification | Pooling hidden states | 27.75% |
| Basic BERT | Classification | CLS token | 32.60% |
| Basic BERT | Regression | CLS token | 57.50% |
| Twin BERT Encoder | Regression | CLS token appended | 61.23% |
| Twin BERT Encoder | Regression | Cross-attention | 80.83% |
| EuroBERT | Regression | - | 53.0% |
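For the regression rows, "Accuracy" presumably means the continuous prediction matches the true band after rounding to the nearest half band (this evaluation protocol is an assumption; the notebooks define the exact metric):

```python
def half_band_accuracy(preds, targets):
    """Fraction of predictions that hit the exact band after rounding
    to the nearest 0.5 (IELTS bands come in half-band steps)."""
    hits = [round(p * 2) / 2 == t for p, t in zip(preds, targets)]
    return sum(hits) / len(hits)
```

For instance, a predicted 6.31 rounds to 6.5 and counts as a hit against a true band of 6.5, while a predicted 6.1 rounds to 6.0 and counts as a miss.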
