Recent Large Language Models (LLMs) have achieved remarkable success in a wide range of tasks, including question answering, text generation, and reasoning. However, they often struggle with domain-specific tasks, such as biomedical question answering, unless they are extensively pre-trained on in-domain data.
Inspired by Self-RAG and building upon Self-BioRAG, we introduce BioThink, a framework that enhances LLMs for biomedical question answering through self-reflection, context grading, relevance assessment, and utility rating. BioThink uses a novel training approach with GRPO (Group Relative Policy Optimization) to fine-tune LLMs to generate structured outputs that include step-by-step reasoning, concise answers, and self-reflection tokens.
- Self-Reflective Generation: BioThink generates outputs in a structured format that includes:
  - Step-by-step reasoning (`<think>`)
  - Concise answer (`<answer>`)
  - Contextual relevance assessment (`<contextual-relevance>`)
  - Answer utility rating (`<answer-utility>`)
  - Groundness evaluation (`<groundness>`)
- Training with GRPO: We use Group Relative Policy Optimization (GRPO) to train the model, incorporating multiple reward functions to ensure:
- Correctness of the answer
- Accuracy of self-reflection tokens (utility, relevance, groundness)
- Proper XML structure and order of tags
- Faithfulness and relevancy of the answer
- Efficiency: The model is trained using QLoRA and Unsloth for efficient fine-tuning.
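The snippet below is a minimal sketch of what this QLoRA + Unsloth setup could look like. The base model name matches the model fine-tuned here, but the sequence length and LoRA hyperparameters are illustrative rather than the values used for training.

```python
from unsloth import FastLanguageModel

# Load the base model in 4-bit (QLoRA) with Unsloth.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-1.7B",
    max_seq_length=2048,   # illustrative; the actual training config may differ
    load_in_4bit=True,     # 4-bit base weights for QLoRA
)

# Attach LoRA adapters; only these low-rank matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                  # LoRA rank (illustrative)
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)
```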
The Self-BioRAG dataset is processed using the script `process_data.py`. This script extracts questions, answers, and context, and also prepares labels for the groundness, relevance, and utility tokens. The processed dataset is available at `avnlp/self_biorag_processed`.
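The processed dataset can be pulled straight from the Hugging Face Hub, as sketched below. The split name and column names are assumptions for illustration; inspect the dataset to see the actual schema produced by `process_data.py`.

```python
from datasets import load_dataset

# Download the processed Self-BioRAG data from the Hugging Face Hub.
dataset = load_dataset("avnlp/self_biorag_processed")

# Inspect one record; the exact columns (question, context, answer,
# groundness/relevance/utility labels) are defined by process_data.py.
example = dataset["train"][0]
print(example.keys())
```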
The model is trained using the script `train_rag.py`. The training process involves:
Structured Generation: The model is trained to generate outputs in the following format:

    <think>
    ... step-by-step reasoning ...
    </think>
    <answer>
    ... concise answer ...
    </answer>
    <contextual-relevance>
    [Relevant] or [Irrelevant]
    </contextual-relevance>
    <answer-utility>
    [Utility:5] or [Utility:4] or ... [Utility:1]
    </answer-utility>
    <groundness>
    [Fully supported] or [Partially supported] or [No support/Contradictory]
    </groundness>
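For downstream use, the tagged output can be parsed with a few regular expressions. The helper below is not part of the repository; it is a minimal, illustrative way to pull each section out of a generated completion.

```python
import re

def extract_tag(text: str, tag: str) -> str | None:
    """Return the content of the first <tag>...</tag> span, or None if it is missing."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None

completion = (
    "<think>The context names the enzyme deficient in PKU.</think>"
    "<answer>Phenylalanine hydroxylase</answer>"
    "<contextual-relevance>[Relevant]</contextual-relevance>"
    "<answer-utility>[Utility:5]</answer-utility>"
    "<groundness>[Fully supported]</groundness>"
)

parsed = {tag: extract_tag(completion, tag)
          for tag in ["think", "answer", "contextual-relevance",
                      "answer-utility", "groundness"]}
print(parsed["answer"])      # Phenylalanine hydroxylase
print(parsed["groundness"])  # [Fully supported]
```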
Reward Functions: The training uses GRPO with the following rewards:
- Correctness Reward: Measures answer correctness using DeepEval's GEval metric with a custom LLM-as-a-Judge instruction tailored for biomedical question answering.
- Utility Reward: Ensures the correct Utility token is generated.
- Relevance Reward: Ensures the correct Relevance token is generated.
- Groundness Reward: Ensures the correct Groundness token is generated.
- XML Structure Reward: Checks for the presence and proper opening/closing of all required tags.
- Structure Order Reward: Ensures the tags appear in the correct order and that no extra text is present outside the tags.
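As a rough illustration of how such rewards can be wired up, the sketch below implements a tag-presence-and-order check with the callable signature expected by TRL's GRPOTrainer (a batch of completions in, one scalar per completion out). It is an assumption-based example, not the implementation in `train_rag.py`.

```python
import re

REQUIRED_TAGS = ["think", "answer", "contextual-relevance",
                 "answer-utility", "groundness"]

def structure_order_reward(completions, **kwargs):
    """Reward 1.0 when every tag is present, properly closed, and in the prescribed order."""
    rewards = []
    for completion in completions:
        # Completions may be plain strings or chat-style message lists.
        text = completion if isinstance(completion, str) else completion[0]["content"]
        positions, ok = [], True
        for tag in REQUIRED_TAGS:
            match = re.search(rf"<{tag}>.*?</{tag}>", text, flags=re.DOTALL)
            if match is None:
                ok = False
                break
            positions.append(match.start())
        # The tags must also appear in the prescribed order.
        ok = ok and positions == sorted(positions)
        rewards.append(1.0 if ok else 0.0)
    return rewards
```

A list of such reward functions can then be passed to TRL's GRPOTrainer via its `reward_funcs` argument, with GRPO normalizing rewards within each group of sampled completions.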
We fine-tune the Qwen3-1.7B model using GRPO and QLoRA. The trained model is available on Hugging Face: `avnlp/BioThink-Qwen3-1.7B`.
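Assuming the LoRA adapter has been merged into the released checkpoint, the model can be used with plain transformers as sketched below. The prompt layout (context followed by question) is an assumption and may differ from the exact template used during training.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "avnlp/BioThink-Qwen3-1.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

context = "Phenylketonuria is caused by a deficiency of phenylalanine hydroxylase."
question = "Which enzyme is deficient in phenylketonuria?"
messages = [{"role": "user", "content": f"Context: {context}\nQuestion: {question}"}]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
# Print only the newly generated, tag-structured completion.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```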
The model is evaluated using the following metrics:
- XML Structure: Checks for the presence and proper opening/closing of the `<think>`, `<answer>`, `<contextual-relevance>`, `<answer-utility>`, and `<groundness>` tags.
- Utility: Checks that the correct utility token has been generated.
- Relevance: Checks that the correct relevance token has been generated.
- Groundness: Checks that the correct groundness token has been generated.
- Answer Correctness: Checks that the answer is correct using DeepEval's GEval metric with a custom instruction for LLM-as-a-Judge.
- Faithfulness: Checks that the answer is faithful to the provided context using DeepEval's Faithfulness LLM-as-a-Judge metric.
- Answer Relevancy: Checks that the answer is relevant to the original question using DeepEval's Answer Relevancy LLM-as-a-Judge metric.
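A minimal evaluation loop with DeepEval could look like the sketch below. The GEval criteria string and the test case contents are placeholders, not the custom biomedical judge instruction used for BioThink, and the metrics call an external LLM judge (OpenAI by default), so an API key is required.

```python
from deepeval.metrics import GEval, FaithfulnessMetric, AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

# Correctness judged with GEval; this criteria string is illustrative only.
correctness = GEval(
    name="Correctness",
    criteria="Determine whether the actual output answers the biomedical "
             "question correctly with respect to the expected output.",
    evaluation_params=[LLMTestCaseParams.INPUT,
                       LLMTestCaseParams.ACTUAL_OUTPUT,
                       LLMTestCaseParams.EXPECTED_OUTPUT],
)
faithfulness = FaithfulnessMetric()
answer_relevancy = AnswerRelevancyMetric()

# A toy test case; in practice these fields come from the evaluation dataset
# and the model's parsed <answer> section.
test_case = LLMTestCase(
    input="Which enzyme is deficient in phenylketonuria?",
    actual_output="Phenylalanine hydroxylase.",
    expected_output="Phenylalanine hydroxylase",
    retrieval_context=["Phenylketonuria results from deficiency of "
                       "phenylalanine hydroxylase (PAH)."],
)

for metric in (correctness, faithfulness, answer_relevancy):
    metric.measure(test_case)
    print(metric.__class__.__name__, metric.score)
```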