
Conversation


@rootfs (Member) commented on Jul 29, 2025

Tested with ModernBERT (base and large), DeBERTa v3, and MiniLM.

@rootfs requested a review from Copilot on July 29, 2025 at 23:52

rootfs and others added 2 commits July 29, 2025 21:14
@rootfs requested a review from Copilot on July 30, 2025 at 15:56

Copilot AI (Contributor) left a comment


Pull Request Overview

This PR expands the codebase to support multiple base models for PII classification and multitask BERT fine-tuning. Previously limited to MiniLM, the system now supports various BERT variants including ModernBERT, DeBERTa v3, RoBERTa, ELECTRA, and others.

  • Adds configurable model support with a command-line interface for model selection (see the sketch after this list)
  • Implements GPU device detection and optimized batch sizing per model
  • Introduces a comprehensive test framework for model accuracy evaluation
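
As a rough illustration of what configurable model selection with device detection could look like, here is a minimal sketch; the model names, CLI flags, and per-model batch sizes are assumptions for illustration, not the PR's actual code.

```python
# Sketch only: an illustrative model registry with CLI selection and device
# detection, assuming a Hugging Face Transformers + PyTorch setup.
import argparse

import torch

MODEL_CONFIGS = {
    # Hypothetical registry; the PR may use different keys and defaults.
    "modernbert-base": {"hf_name": "answerdotai/ModernBERT-base", "batch_size": 16},
    "deberta-v3-base": {"hf_name": "microsoft/deberta-v3-base", "batch_size": 12},
    "minilm": {"hf_name": "sentence-transformers/all-MiniLM-L12-v2", "batch_size": 32},
}

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="PII classifier fine-tuning (sketch)")
    parser.add_argument("--model", choices=sorted(MODEL_CONFIGS), default="minilm")
    # 0 is a sentinel meaning "use the per-model default batch size".
    parser.add_argument("--batch-size", type=int, default=0)
    return parser.parse_args()

def resolve_device() -> torch.device:
    """Prefer CUDA, then Apple Silicon (MPS), then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```

Keeping 0 as the CLI default ties into the batch-size sentinel discussed in the review comments below.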

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Reviewed files:

  • pii_model_fine_tuning/requirements.txt: new dependencies for multiple-model support, including transformers, torch, and tiktoken
  • pii_model_fine_tuning/pii_bert_finetuning.py: core refactoring to support multiple models, with device detection and tokenizer fallbacks (sketched below)
  • multitask_bert_fine_tuning/multitask_bert_training.py: extended multitask training with per-model configurations and progress tracking
  • multitask_bert_fine_tuning/multitask_accuracy_test.py: new accuracy-testing framework for validating model performance across tasks
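
The "tokenizer fallbacks" mentioned for pii_bert_finetuning.py could look roughly like the sketch below; it assumes the fast tokenizer may fail to load for some checkpoints (e.g. DeBERTa v3 without sentencepiece installed), and the function name is illustrative.

```python
from transformers import AutoTokenizer, PreTrainedTokenizerBase

def load_tokenizer_with_fallback(model_name: str) -> PreTrainedTokenizerBase:
    """Try the fast (Rust-based) tokenizer first, then fall back to the slow one."""
    try:
        return AutoTokenizer.from_pretrained(model_name, use_fast=True)
    except (ValueError, OSError):
        # Slow-tokenizer path; checkpoints like DeBERTa v3 need sentencepiece here,
        # which is presumably part of why requirements.txt gained new dependencies.
        return AutoTokenizer.from_pretrained(model_name, use_fast=False)
```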
Comments suppressed due to low confidence (3)

pii_model_fine_tuning/requirements.txt:1

  • The version 2.6.0+cu124 appears to be non-existent. PyTorch 2.6.0 has not been released as of my knowledge cutoff. Consider using a stable version like torch>=2.4.0+cu124 or torch>=2.3.0+cu124.
torch>=2.6.0+cu124

pii_model_fine_tuning/requirements.txt:13

  • Protobuf version 6.0.0 does not exist. The latest stable versions are in the 4.x or 5.x series. Consider using protobuf>=4.21.0 or protobuf>=5.26.0.
protobuf>=6.0.0

Comment on lines +536 to +537
# Auto-optimize batch size based on model if using default
if batch_size == 0: # Default batch size

Copilot AI Jul 30, 2025


The condition if batch_size == 0 will never be true because the argparse default is 0, but the parameter default is 16. This logic should check against the actual argparse default or restructure the condition.

Suggested change
- # Auto-optimize batch size based on model if using default
- if batch_size == 0: # Default batch size
+ # Auto-optimize batch size based on model if batch_size is set to 0 (default)
+ if batch_size == 0:
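
If 0 really is the argparse default, one way to make the sentinel unambiguous is to resolve it in a single helper, as in this sketch (names and values are assumptions, not the PR's code):

```python
# Illustrative per-model defaults; the PR's actual numbers may differ.
OPTIMAL_BATCH_SIZES = {"modernbert-base": 16, "deberta-v3-base": 12, "minilm": 32}

def resolve_batch_size(requested: int, model_name: str) -> int:
    """Return the batch size to use; 0 means 'auto-select per model'."""
    if requested == 0:  # the CLI default acts as the "auto" sentinel
        return OPTIMAL_BATCH_SIZES.get(model_name, 12)
    return requested
```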


}

optimized_batch_size = optimal_batch_sizes.get(model_name, 12)
if optimized_batch_size != batch_size:

Copilot AI Jul 30, 2025


The condition if optimized_batch_size != batch_size will always be true when batch_size is 0 (the argparse default), making the optimization message misleading. The logic should compare against the intended default value.

Suggested change
- if optimized_batch_size != batch_size:
+ if batch_size != 0 and optimized_batch_size != batch_size:
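
Folding in the suggested guard, the optimization message would only appear when the user explicitly requested a batch size that differs from the per-model recommendation. The sketch below is one possible interpretation with assumed names, not the PR's actual logic:

```python
def pick_batch_size(batch_size: int, model_name: str, optimal_batch_sizes: dict) -> int:
    """Apply the per-model default when batch_size is the 0 sentinel, and only
    report a mismatch when an explicit, non-zero value was requested."""
    optimized = optimal_batch_sizes.get(model_name, 12)
    if batch_size == 0:
        return optimized  # auto mode: nothing to report
    if optimized != batch_size:
        print(f"Note: {model_name} typically trains with batch size {optimized}; "
              f"keeping the requested {batch_size}.")
    return batch_size
```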

