support multiple base models #22
base: classify
Conversation
Signed-off-by: Huamin Chen <[email protected]>
Pull Request Overview
This PR expands the codebase to support multiple base models for PII classification and multitask BERT fine-tuning. Previously limited to MiniLM, the system now supports various BERT variants including ModernBERT, DeBERTa v3, RoBERTa, ELECTRA, and others.
- Adds configurable model support with a command-line interface for model selection
- Implements GPU device detection and optimized per-model batch sizing (see the sketch after this list)
- Introduces comprehensive test framework for model accuracy evaluation
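
The changed files themselves aren't quoted in this thread, so here is a minimal sketch of what configurable model selection with device detection could look like. The `MODEL_CONFIGS` mapping, the checkpoint names, `pick_device`, and the `--batch-size` "0 means auto" sentinel are illustrative assumptions, not the PR's actual identifiers.

```python
# Hypothetical sketch -- names and checkpoints are illustrative,
# not taken from the PR's actual code.
import argparse
import torch

MODEL_CONFIGS = {
    "minilm": "sentence-transformers/all-MiniLM-L6-v2",
    "modernbert-base": "answerdotai/ModernBERT-base",
    "deberta-v3": "microsoft/deberta-v3-base",
    "roberta": "roberta-base",
}

def pick_device() -> torch.device:
    # Prefer CUDA, then Apple MPS, then fall back to CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", choices=MODEL_CONFIGS, default="minilm")
    parser.add_argument("--batch-size", type=int, default=0,
                        help="0 = auto-pick a size suited to the model")
    args = parser.parse_args()
    device = pick_device()
    print(f"Fine-tuning {MODEL_CONFIGS[args.model]} on {device}")

if __name__ == "__main__":
    main()
```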
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pii_model_fine_tuning/requirements.txt | New dependencies for multiple model support, including transformers, torch, and tiktoken |
| pii_model_fine_tuning/pii_bert_finetuning.py | Core refactoring to support multiple models, with device detection and tokenizer fallbacks |
| multitask_bert_fine_tuning/multitask_bert_training.py | Extended multitask training with model configurations and progress tracking |
| multitask_bert_fine_tuning/multitask_accuracy_test.py | New accuracy-testing framework for validating model performance across tasks |
Comments suppressed due to low confidence (3)
`pii_model_fine_tuning/requirements.txt:1`
- The version `2.6.0+cu124` appears to be non-existent: PyTorch 2.6.0 has not been released as of my knowledge cutoff. Consider using a stable version like `torch>=2.4.0+cu124` or `torch>=2.3.0+cu124`.

```text
torch>=2.6.0+cu124
```
`pii_model_fine_tuning/requirements.txt:13`
- Protobuf version 6.0.0 does not exist; the latest stable releases are in the 4.x and 5.x series. Consider using `protobuf>=4.21.0` or `protobuf>=5.26.0`.

```text
protobuf>=6.0.0
```
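
Applying both version comments to the quoted lines gives a fragment like the following; the version floors are the reviewer's suggested values, not pins verified against the PR.

```text
# Reviewer-suggested replacements for the flagged requirements lines
torch>=2.4.0+cu124   # instead of torch>=2.6.0+cu124
protobuf>=5.26.0     # instead of protobuf>=6.0.0
```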
```python
# Auto-optimize batch size based on model if using default
if batch_size == 0:  # Default batch size
```
The condition `if batch_size == 0` will never be true because the argparse default is 0, but the parameter default is 16. This logic should check against the actual argparse default or restructure the condition.
Suggested change:

```diff
-# Auto-optimize batch size based on model if using default
-if batch_size == 0:  # Default batch size
+# Auto-optimize batch size based on model if batch_size is set to 0 (default)
+if batch_size == 0:
```
```python
}

optimized_batch_size = optimal_batch_sizes.get(model_name, 12)
if optimized_batch_size != batch_size:
```
The condition `if optimized_batch_size != batch_size` will always be true when batch_size is 0 (the argparse default), making the optimization message misleading. The logic should compare against the intended default value.
Suggested change:

```diff
-if optimized_batch_size != batch_size:
+if batch_size != 0 and optimized_batch_size != batch_size:
```
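
Combining the two suggestions above, the surrounding logic would look roughly like this. Only `optimal_batch_sizes`, the `.get(model_name, 12)` fallback, and the `batch_size == 0` sentinel come from the quoted diff; the wrapper function and the concrete dict values are assumptions.

```python
# Sketch combining both suggested fixes; dict values are illustrative.
optimal_batch_sizes = {
    "modernbert-base": 16,
    "deberta-v3": 8,
    "minilm": 32,
}

def resolve_batch_size(model_name: str, batch_size: int) -> int:
    optimized = optimal_batch_sizes.get(model_name, 12)
    # Auto-optimize when batch_size is 0 (the argparse default acting
    # as an "auto" sentinel, per the first suggestion).
    if batch_size == 0:
        return optimized
    # The user chose a size explicitly; only then is a mismatch note
    # meaningful (the second suggestion's guard).
    if optimized != batch_size:
        print(f"Note: batch size {batch_size} differs from the "
              f"suggested {optimized} for {model_name}")
    return batch_size
```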
Signed-off-by: Huamin Chen <[email protected]>
Tested with modernbert-base and modernbert-large, deberta-v3, and minilm.