Description
System Info
CPU architecture: x86_64
GPU type: NVIDIA A100-SXM4-40GB
CUDA Version: 12.7
Driver Version: 565.57.01
Who can help?
Hey, @byshiue!
I saw you responding to other encoder-model-related issues, so I hope you might be the right person for this question.
Thank you!
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Hello!
I am trying to use `TensorRT-LLM/examples/bert` to convert a RoBERTa model (`FacebookAI/roberta-base`) to a TRT-LLM engine. I am on the `v0.16.0` tag and am following the instructions from `TensorRT-LLM/examples/bert/README.md`.
- First, as a sanity check, I verify that converting a BERT model (`google-bert/bert-base-uncased`) passes the `run.py` Hugging Face comparison test (with intermediate checks), using the following commands:

  ```shell
  CUDA_VISIBLE_DEVICES=0 python /code/tensorrt_llm/examples/bert/convert_checkpoint.py --model=BertModel --model_dir=google-bert/bert-base-uncased --output_dir=trt_checkpoints/bert-base
  CUDA_VISIBLE_DEVICES=0 trtllm-build --checkpoint_dir trt_checkpoints/bert-base/ --output_dir engines/bert-base --remove_input_padding=disable --max_batch_size=128 --max_seq_len=512 --bert_attention_plugin=disable --context_fmha=disable --enable_debug_output
  CUDA_VISIBLE_DEVICES=0 python /code/tensorrt_llm/examples/bert/run.py --engine_dir engines/bert-base/ --hf_model_dir=google-bert/bert-base-uncased --run_hf_test --debug
  ```

  This results in both the final hidden outputs and the intermediate layer outputs passing the `torch.allclose` checks against the Hugging Face model outputs (as implemented in `run.py`).
- Next, I try to perform the same comparison with a `RobertaModel`:

  ```shell
  CUDA_VISIBLE_DEVICES=0 python /code/tensorrt_llm/examples/bert/convert_checkpoint.py --model=RobertaModel --model_dir=FacebookAI/roberta-base --output_dir=trt_checkpoints/roberta-base
  CUDA_VISIBLE_DEVICES=0 trtllm-build --checkpoint_dir trt_checkpoints/roberta-base/ --output_dir engines/roberta-base --remove_input_padding=disable --max_batch_size=128 --max_seq_len=512 --bert_attention_plugin=disable --context_fmha=disable --enable_debug_output
  CUDA_VISIBLE_DEVICES=0 python /code/tensorrt_llm/examples/bert/run.py --engine_dir engines/roberta-base/ --hf_model_dir=FacebookAI/roberta-base --run_hf_test --debug
  ```
Even though the final check passes (`RobertaModel result is all close to HF reference!`), I observe that the intermediate layer outputs do not match, starting from the 4th layer with the default tolerance of 1e-2 (or from the 0th encoder layer with a tighter tolerance of 1e-3). Here is the output of the default script:
```
Embedding are all close
BertEncoderLayer_0_output is close: True
BertEncoderLayer_1_output is close: True
BertEncoderLayer_2_output is close: True
BertEncoderLayer_3_output is close: True
BertEncoderLayer_4_output is close: False
BertEncoderLayer_5_output is close: True
BertEncoderLayer_6_output is close: False
BertEncoderLayer_7_output is close: False
BertEncoderLayer_8_output is close: False
BertEncoderLayer_9_output is close: False
BertEncoderLayer_10_output is close: False
```
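For context on why the same outputs pass at 1e-2 but fail at 1e-3: `torch.allclose` checks `|a - b| <= atol + rtol * |b|` elementwise. A minimal pure-NumPy sketch of that criterion (with synthetic values, not the actual layer outputs) shows how the tolerance alone flips pass/fail:

```python
import numpy as np

def is_close(a: np.ndarray, b: np.ndarray, rtol: float, atol: float = 1e-8) -> bool:
    """Elementwise |a - b| <= atol + rtol * |b|, reduced with all() --
    the same criterion torch.allclose / np.allclose use."""
    return bool(np.all(np.abs(a - b) <= atol + rtol * np.abs(b)))

# Synthetic "reference" vs "engine" activations with a 5e-3 absolute error.
ref = np.ones(4)
out = ref + 5e-3

print(is_close(out, ref, rtol=1e-2))  # passes at the 1e-2 tolerance
print(is_close(out, ref, rtol=1e-3))  # fails once the tolerance is tightened
```

So a layer flipping from True to False under a tighter tolerance only tells us the error magnitude sits between the two thresholds, not where it comes from.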
- When I try to use a fine-tuned `roberta-base` checkpoint, both the intermediate and the final checks fail.
Expected behavior
Getting the result of

```shell
CUDA_VISIBLE_DEVICES=0 python /code/tensorrt_llm/examples/bert/run.py --engine_dir engines/roberta-base/ --hf_model_dir=FacebookAI/roberta-base --run_hf_test --debug
```

as:
```
Embedding are all close
BertEncoderLayer_0_output is close: True
BertEncoderLayer_1_output is close: True
BertEncoderLayer_2_output is close: True
BertEncoderLayer_3_output is close: True
BertEncoderLayer_4_output is close: True
BertEncoderLayer_5_output is close: True
BertEncoderLayer_6_output is close: True
BertEncoderLayer_7_output is close: True
BertEncoderLayer_8_output is close: True
BertEncoderLayer_9_output is close: True
BertEncoderLayer_10_output is close: True
```

as well as the final check:

```
RobertaModel result is all close to HF reference!
```
Actual behavior
```
Embedding are all close
BertEncoderLayer_0_output is close: True
BertEncoderLayer_1_output is close: True
BertEncoderLayer_2_output is close: True
BertEncoderLayer_3_output is close: True
BertEncoderLayer_4_output is close: False
BertEncoderLayer_5_output is close: True
BertEncoderLayer_6_output is close: False
BertEncoderLayer_7_output is close: False
BertEncoderLayer_8_output is close: False
BertEncoderLayer_9_output is close: False
BertEncoderLayer_10_output is close: False
```
Additional notes
Since the embedding outputs match in all cases, I would expect the difference to come from the encoder layers. However, based on the Hugging Face implementations, the encoder layers should be exactly the same for BERT and RoBERTa (`modeling_bert.py`, `modeling_roberta.py`).

I would appreciate any help or pointers on how to debug this and make `RobertaModel` work with TensorRT-LLM.
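One way I could imagine localizing the divergence, assuming the per-layer outputs from both sides can be dumped to arrays (the helper name and the synthetic data below are hypothetical, not part of `run.py`): compute the maximum absolute error per layer and look at its shape. Gradual growth across layers suggests compounding numeric differences, while a sharp jump at one layer points at that layer's weights or attention path. A sketch:

```python
import numpy as np

def max_abs_err_per_layer(trt_outputs, hf_outputs):
    """Max absolute elementwise error for each encoder layer's output.

    trt_outputs / hf_outputs: lists of np.ndarray, one per layer --
    e.g. dumped from the engine's debug outputs and from the HF model
    run with output_hidden_states=True.
    """
    return [float(np.max(np.abs(t - h))) for t, h in zip(trt_outputs, hf_outputs)]

# Synthetic example: small uniform drift everywhere, plus an injected
# sharp jump at layer 4, mimicking a weight/layout mismatch there.
rng = np.random.default_rng(0)
hf = [rng.standard_normal((2, 8)) for _ in range(6)]
trt = [h + 1e-4 for h in hf]
trt[4] = hf[4] + 5e-2

errs = max_abs_err_per_layer(trt, hf)
worst = int(np.argmax(errs))
print(worst)  # 4
```

Plotting `errs` for the real RoBERTa run (vs. the BERT run that passes) should make it clear whether layer 4 is genuinely special or just the first layer to cross the tolerance threshold.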