F-Fer/automated_essay_scoring

Project: Automated IELTS Essay Evaluation and Categorization

Table of Contents

  1. Introduction
  2. Motivation
  3. Dataset
  4. Methodology
  5. Project Structure
  6. Notebooks Overview and Links
  7. Dual BERT Encoder Architecture Diagram

Introduction

This project, developed as part of our NLP module, focuses on the automatic evaluation of essays written for the IELTS (International English Language Testing System) Writing tasks. The primary goals are to predict an essay's score on each IELTS assessment criterion and to categorize essays by the prompt they respond to.

Motivation

  • Core Objective: We aim to develop a system that can automatically score IELTS essays and assign them to categories based on the essay prompt.
  • Driving Force: Our motivation is to provide students preparing for the IELTS test with a tool to get preliminary feedback on their essays, helping them identify areas for improvement.
  • Key Questions: We seek to answer: "How good is my essay?" and "Does my essay align with others that address the same prompt?"

Dataset

We use an existing dataset from Hugging Face: IELTS Writing Task 2 Evaluation. Each entry pairs an essay prompt with an essay written in response to it. The evaluation scores are broken down into:

  • Task Achievement
  • Coherence and Cohesion
  • Lexical Resource
  • Grammatical Range and Accuracy

These individual scores contribute to an Overall Band Score.
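The README does not say exactly how the four criterion scores are combined. As a minimal sketch, assuming the Overall Band Score is the unweighted mean of the four criteria rounded to the nearest half band with ties rounded up (an assumption, not confirmed by the dataset card), the aggregation could look like this. The function name `overall_band` is hypothetical.

```python
import math


def overall_band(task_achievement: float,
                 coherence_cohesion: float,
                 lexical_resource: float,
                 grammatical_range: float) -> float:
    """Hypothetical aggregation: mean of the four criteria, rounded to the nearest 0.5 (ties up)."""
    mean = (task_achievement + coherence_cohesion + lexical_resource + grammatical_range) / 4
    # Work in half-band units so that rounding to the nearest integer
    # is the same as rounding to the nearest 0.5 band.
    return math.floor(mean * 2 + 0.5) / 2


print(overall_band(6.0, 6.0, 6.0, 7.0))  # mean 6.25 -> 6.5
print(overall_band(6.0, 6.0, 6.0, 6.5))  # mean 6.125 -> 6.0
```

`math.floor(x + 0.5)` is used instead of the built-in `round` because Python's `round` applies banker's rounding (`round(12.5) == 12`), which would round a 6.25 mean down.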

Methodology

Overall Workflow

Our approach consists of five stages:

  1. Data Cleaning: Preprocessing the raw essay data.
  2. Statistical Analysis: Performing exploratory data analysis to gain insights into the dataset.
  3. Baseline Model Training: Training conventional machine learning models to establish benchmark performance.
  4. BERT Model Training: Fine-tuning BERT-based transformer models for essay scoring.
  5. Clustering Implementation: Developing clustering mechanisms to group similar essays.
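Stage 5 groups similar essays. As a minimal, dependency-free sketch, assuming bag-of-words features, L2-normalized vectors, and K-Means with a deterministic farthest-point initialization (all choices made here for illustration; the notebooks may instead use TF-IDF or BERT embeddings with scikit-learn's `KMeans`), clustering essays by topic could look like this. The helpers `unit_bow_vectors`, `sq_dist`, and `kmeans` are hypothetical.

```python
import math
from collections import Counter


def unit_bow_vectors(texts):
    """Hypothetical featurizer: lower-cased bag-of-words counts, L2-normalized."""
    counts = [Counter(t.lower().split()) for t in texts]
    vocab = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        vec = [float(c[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    # Deterministic farthest-point seeding: start at the first essay, then keep
    # adding the essay farthest from every centroid chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        centroids.append(list(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))))
    labels = []
    for _ in range(iterations):
        # Assignment step: each essay joins its nearest centroid.
        labels = [min(range(k), key=lambda i: sq_dist(v, centroids[i])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels


essays = [
    "technology improves education and online learning",
    "online learning and technology in education",
    "governments should fund public transport in cities",
    "public transport in cities needs government funding",
]
labels = kmeans(unit_bow_vectors(essays), k=2)
print(labels)  # [0, 0, 1, 1]
```

Hierarchical clustering would replace the iterative assignment/update loop with repeated merging of the two closest clusters, but would reuse the same feature vectors.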

Machine Learning Models

Everything runs with Python 3.10.

To install the requirements, run: pip install -r requirements.txt

We plan to train the following types of models:

  • For Automated Essay Scoring (AES):
    • Conventional Models (Baselines): Linear Regression, Logistic Regression, Support Vector Machines (SVM), k-Nearest Neighbors (KNN).
    • Transformer-based Models: BERT, EuroBERT (for regression or classification tasks on scores).
  • For Essay Clustering:
    • K-Means Clustering
    • Hierarchical Clustering
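KNN turned out to be the strongest conventional baseline in the performance table. As a minimal sketch, assuming raw bag-of-words counts, cosine similarity, and a majority vote over band labels (the notebooks likely use different features, for example TF-IDF with scikit-learn), a KNN scorer works like this. The function names and the toy training data are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(n * b[w] for w, n in a.items())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_band(essay, train_essays, train_bands, k=3):
    """Predict a band as the majority label among the k most similar training essays."""
    query = Counter(essay.lower().split())
    neighbors = sorted(
        zip(train_essays, train_bands),
        key=lambda pair: cosine(query, Counter(pair[0].lower().split())),
        reverse=True,
    )[:k]
    votes = Counter(band for _, band in neighbors)
    return votes.most_common(1)[0][0]


# Toy data for illustration only, not taken from the dataset.
train_essays = [
    "the essay argues clearly with strong examples",
    "clear argument strong examples good structure",
    "strong examples and a clear argument",
    "bad grammar short essay",
    "short essay few ideas",
    "few ideas bad structure",
]
train_bands = [7.0, 7.0, 7.0, 5.0, 5.0, 5.0]

print(knn_band("a clear argument with strong examples", train_essays, train_bands))  # 7.0
print(knn_band("short essay with bad grammar", train_essays, train_bands))  # 5.0
```

Treating bands as class labels (majority vote) matches the table's "Classification" framing for KNN; averaging the neighbors' bands instead would turn it into KNN regression.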

Statistical Analysis

To uncover initial insights and correlations within the data, we intend to perform the following statistical analyses:

  1. Investigate the correlation between text length and band score.
  2. Analyze the influence of word diversity on the evaluation criteria.
  3. Determine the vocabulary distribution for each essay prompt.
  4. Examine whether the demands on the writer differ across the sub-score categories (Task Achievement, Coherence and Cohesion, Lexical Resource, Grammatical Range and Accuracy).
  5. Compare the perceived difficulty of different essay prompts.
  6. Explore the correlation between specific words/phrases and the Overall Band Score.

Notebooks Overview and Links

This section provides direct links to the Jupyter notebooks used in this project.

1. Data Processing and Exploration

2. Statistical Analysis

These notebooks correspond to the statistical analyses outlined in the methodology:

3. Model Training and Evaluation

Performance

Model                Training Type   Method/Configuration   Accuracy
Linear Regression    Regression      -                      53.30%
Logistic Regression  Classification  -                      59.91%
SVM                  Classification  -                      59.47%
KNN                  Classification  -                      64.0%
Basic BERT           Classification  Pooled hidden states   27.75%
Basic BERT           Classification  CLS token              32.60%
Basic BERT           Regression      CLS token              57.50%
Twin BERT Encoder    Regression      CLS token appended     61.23%
Twin BERT Encoder    Regression      Cross-attention        80.83%
EuroBERT             Regression      -                      53.0%
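The table reports accuracy for regression models too, and the README does not say how their continuous outputs are scored. One plausible convention, assumed here and not confirmed by the source, is to round each prediction to the nearest half band (clipped to the IELTS range 0 to 9) and count exact matches. The helper names and values are hypothetical.

```python
import math


def to_half_band(score: float) -> float:
    # Round to the nearest 0.5 (ties up) and clip to the IELTS band range 0-9.
    return min(9.0, max(0.0, math.floor(score * 2 + 0.5) / 2))


def band_accuracy(predictions, targets):
    """Fraction of predictions that land on exactly the target band after rounding."""
    hits = sum(to_half_band(p) == t for p, t in zip(predictions, targets))
    return hits / len(targets)


preds = [6.3, 5.8, 7.2, 6.9]  # toy regression outputs
gold = [6.5, 6.0, 7.0, 6.0]   # toy gold bands
print(band_accuracy(preds, gold))  # 0.75
```

Under this convention a prediction of 6.9 for a 6.0 essay counts as fully wrong, so accuracy is a strict metric; mean absolute error would complement it.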

Dual BERT Encoder Architecture Diagram

[Figure: Dual BERT encoder architecture diagram]
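The strongest configuration in the performance table is the twin BERT encoder with cross-attention. As a minimal, framework-free sketch of the cross-attention step alone, assuming the essay's token embeddings act as queries and the prompt's token embeddings as keys and values (single head, no learned projections; the real model would use BERT outputs and trained weights), scaled dot-product attention computes softmax(Q K^T / sqrt(d)) V. The toy 2-dimensional embeddings and the mean-pooling head are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


essay_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy essay embeddings (queries)
prompt_tokens = [[1.0, 0.0], [0.0, 1.0]]             # toy prompt embeddings (keys and values)
attended = cross_attention(essay_tokens, prompt_tokens, prompt_tokens)

# Hypothetical regression head input: mean-pool the attended vectors into one feature vector.
pooled = [sum(col) / len(attended) for col in zip(*attended)]
print(pooled)
```

Each essay token ends up as a weighted mix of prompt tokens, so the pooled vector reflects how the essay relates to its prompt; a regression layer on top of such features would then predict the band.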

About

A transformer-based model for automatically grading essays written for the IELTS Writing section.
