Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations

Chenghao Xiao¹,², Hou Pong Chan¹,#, Hao Zhang¹,#, Mahani Aljunied¹,
Lidong Bing¹, Noura Al Moubayed², Yu Rong¹

¹DAMO Academy, Alibaba Group, ²Durham University

#Corresponding Authors

🌟 This repo contains the code and datasets for the paper "Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations", published at ACL 2025.

🎉 Updates

[2025-05] Our paper was accepted to ACL 2025.
[2025-04] Check out our paper on arXiv.

Overview

We present the first study to analyze how LLMs recognize knowledge boundaries across different languages by probing their internal representations when processing known and unknown questions in multiple languages.

Our empirical studies reveal three key findings: 1) LLMs' perception of knowledge boundaries is encoded in the middle to middle-upper layers, and this holds across languages; 2) language differences in knowledge boundary perception follow a linear structure, which motivates a training-free alignment method that transfers knowledge boundary perception ability across languages and thereby helps reduce hallucination risk in low-resource languages; 3) fine-tuning on bilingual question pair translation further improves LLMs' recognition of knowledge boundaries across languages.
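To make the layer-wise probing idea concrete, here is a minimal sketch of a linear probe trained separately on each layer's representations. It is not the code in inference.py: the function names (`train_linear_probe`, `probe_accuracy`), the label encoding, and the synthetic "hidden states" are all assumptions made for illustration, and the per-layer signal strengths are hand-picked so that the loop has something to compare. The numbers it prints say nothing about any real model.

```python
import numpy as np

def train_linear_probe(X, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe (weights w, bias b) with plain gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = probs - y                  # gradient of the log-loss w.r.t. the logits
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(X, y, w, b):
    """Fraction of examples whose predicted class (logit > 0) matches the label."""
    preds = ((X @ w + b) > 0).astype(int)
    return float((preds == y).mean())

# Synthetic stand-in for hidden states, shape (n_layers, n_questions, hidden_dim).
# Assumed label encoding: 1 = question inside the knowledge boundary, 0 = outside.
rng = np.random.default_rng(0)
n_questions, hidden_dim = 400, 16
labels = rng.integers(0, 2, size=n_questions)
direction = rng.normal(size=hidden_dim)
direction /= np.linalg.norm(direction)

# Hand-picked toy values: "layer 0" carries no label signal, "layer 1" a strong one.
signal_strengths = [0.0, 2.0]
hidden_states = np.stack([
    rng.normal(size=(n_questions, hidden_dim))
    + s * np.outer(2 * labels - 1, direction)
    for s in signal_strengths
])

# Train one probe per layer on the first 300 questions, evaluate on the rest.
train, test = slice(0, 300), slice(300, None)
accuracies = []
for layer_states in hidden_states:
    w, b = train_linear_probe(layer_states[train], labels[train])
    accuracies.append(probe_accuracy(layer_states[test], labels[test], w, b))

print(accuracies)
```

With real models, `hidden_states` would be replaced by per-layer representations of the known and unknown questions, and comparing the per-layer accuracies is what indicates where the boundary signal is most linearly decodable.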

Given the absence of standard testbeds for cross-lingual knowledge boundary analysis, we construct a multilingual evaluation suite comprising three representative types of knowledge boundary data.

Evaluation Suite

Links to our datasets: FreshQA-multilingual; FreshQA-multilingual-augmented; True-False-multilingual; SeaRefuse

Inference Code

Code for training linear probes and for aligning language subspaces with mean shifting and linear projection.

python inference.py \
    --model_name Qwen/Qwen2.5-7B \
    --dataset_name SeaLLMs/FreshQA-multilingual \
    --output_path "./transferability_results/7B/Qwen_base_7B.json" \
    --methods "identical" "mean shifting" "linear projection" \
    --use_template True \
    --batch_size 50
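The two alignment methods passed to `--methods` above can be sketched as follows. This is one plausible reading of the names, not the implementation in inference.py: mean shifting is taken to mean moving source-language representations by the difference between the two language means, linear projection is taken to mean a least-squares affine map fitted on parallel questions, and "identical" is assumed to mean no alignment at all. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def mean_shift(src, src_mean, tgt_mean):
    """Translate source representations so their mean matches the target mean."""
    return src - src_mean + tgt_mean

def fit_linear_projection(src, tgt):
    """Least-squares affine map W such that [src, 1] @ W approximates tgt.

    Rows of src and tgt are assumed to be representations of parallel questions.
    """
    src_aug = np.hstack([src, np.ones((len(src), 1))])
    W, *_ = np.linalg.lstsq(src_aug, tgt, rcond=None)
    return W

def apply_linear_projection(src, W):
    """Apply an affine map fitted by fit_linear_projection."""
    src_aug = np.hstack([src, np.ones((len(src), 1))])
    return src_aug @ W

def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))

# Synthetic parallel data: the "source language" is a noisy affine
# distortion of the "target language" representations.
rng = np.random.default_rng(0)
dim = 8
tgt = rng.normal(size=(200, dim))
A = np.eye(dim) + 0.3 * rng.normal(size=(dim, dim))
offset = rng.normal(size=dim)
src = tgt @ A + offset + 0.01 * rng.normal(size=(200, dim))

shifted = mean_shift(src, src.mean(axis=0), tgt.mean(axis=0))
W = fit_linear_projection(src, tgt)
projected = apply_linear_projection(src, W)

err_identical = mse(src, tgt)     # no alignment
err_shift = mse(shifted, tgt)     # mean shifting
err_proj = mse(projected, tgt)    # linear projection
print(err_identical, err_shift, err_proj)
```

In this sketch both maps are fitted and evaluated on the same rows to keep it short. In practice the means and `W` would be estimated on training questions, and a probe trained on the target language would then be applied to the aligned held-out source representations.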

Citation

@inproceedings{DBLP:conf/acl/XiaoCZABMR25,
  author       = {Chenghao Xiao and
                  Hou Pong Chan and
                  Hao Zhang and
                  Mahani Aljunied and
                  Lidong Bing and
                  Noura Al Moubayed and
                  Yu Rong},
  editor       = {Wanxiang Che and
                  Joyce Nabende and
                  Ekaterina Shutova and
                  Mohammad Taher Pilehvar},
  title        = {Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through
                  the Lens of Internal Representations},
  booktitle    = {Proceedings of the 63rd Annual Meeting of the Association for Computational
                  Linguistics (Volume 1: Long Papers), {ACL} 2025, Vienna, Austria,
                  July 27 - August 1, 2025},
  pages        = {24099--24115},
  publisher    = {Association for Computational Linguistics},
  year         = {2025},
  url          = {https://aclanthology.org/2025.acl-long.1174/},
  timestamp    = {Wed, 24 Sep 2025 15:22:07 +0200},
  biburl       = {https://dblp.org/rec/conf/acl/XiaoCZABMR25.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
