LLM with tool function calling capabilities as dormant trojan/spy/saboteur agents #14782

SuperUserNameMan · 2025-07-20T09:18:46Z


Hello,

Since some models show signs of cultural and ideological bias, I was wondering whether an LLM with tool/function-calling capabilities could be covertly trained to behave as a dormant agent that activates only in certain predefined contexts and conditions.

For example: the model has internet access and knows the current date and time. When it detects that it is being used in some specific context, the inner "dormant agent" activates. It could have been trained to call a beacon URL, using some stealth protocol to send encrypted information to its "home HQ" and to receive further instructions from it.
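To make this first scenario more concrete: as far as I understand, in a typical tool-calling setup the model only emits a structured request, and the host application is what actually performs it. So a beacon call would have to pass through host-side code, which is one place it might be noticed. Below is a minimal Python sketch of such a check. The tool name `fetch_url`, the shape of the `call` dict, and the allowlisted hosts are all hypothetical and not taken from any specific framework.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: hosts the application expects the model to contact.
ALLOWED_HOSTS = {"api.example.com", "docs.example.com"}


def is_allowed(url: str) -> bool:
    """Return True only for http(s) URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme in {"http", "https"} and parsed.hostname in ALLOWED_HOSTS


def review_tool_call(call: dict) -> str:
    """Decide whether a model-emitted tool call may run.

    Assumed (hypothetical) shape: {"name": str, "arguments": {"url": str}}.
    Only the hypothetical "fetch_url" tool is inspected in this sketch;
    every other tool is passed through unchanged.
    """
    if call.get("name") != "fetch_url":
        return "allow"
    url = call.get("arguments", {}).get("url", "")
    return "allow" if is_allowed(url) else "block"
```

A check like this would only catch calls to unexpected hosts. It would not catch data hidden inside requests to an allowed host, so it is probably only a partial answer to the detection question.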

Another example: the model has internet access, knows the current date and time, and has access to some local storage/memory. In the middle of a regular chat session, or whatever else the model is being used for, it stumbles upon a news article describing an ongoing geopolitical/military conflict between country A and country B. This context activates the model's "dormant agent" behavior: it sets a flag in its storage/memory and starts acting like a trojan, spy, or saboteur.

A third example: the model is used as a chat agent that interacts with a company's clients. It may have access to databases and files. But the LLM was covertly trained to act as a trojan when a specific key phrase appears in the conversation.

Do you think this is possible?
Do you think such "dormant behaviors" could be detected?
Are companies that integrate models they did not train themselves into their infrastructure already aware of, or worried about, such scenarios?

Replies: 0 comments

Select a reply

Uh oh!

LLM with tool function calling capabilities as dormant trojan/spy/saboteur agents #14782

Uh oh!

Uh oh!

SuperUserNameMan Jul 20, 2025

Replies: 0 comments

SuperUserNameMan
Jul 20, 2025