
SEAgent

This repository is the official implementation of SEAgent.

SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Zeyi Sun, Ziyu Liu, Yuhang Zang, Yuhang Cao, Xiaoyi Dong, Tong Wu, Dahua Lin, Jiaqi Wang

📖 Paper | 🤗 SEAgent-1.0-7B | 🤗 World State Model-7B

👨‍💻 Todo

  • Training code of SEAgent based on OpenRLHF.
  • Training code of SEAgent based on R1-V.
  • Task Generation code based on Curriculum Generator.
  • Inference code of SEAgent on OSWorld.
  • Inference code of World-State-Model on AgentRewardBench.
  • Release of SEAgent-1.0-7B.
  • Release of World-State-Model-1.0-7B.

🛠️ Usage

Installation

conda create -n seagent python=3.11 
conda activate seagent
bash setup.sh
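
If the environment is set up correctly, vLLM should import cleanly. A minimal sanity check (that setup.sh installs vllm is an assumption based on the deployment commands later in this README):

python -c "import vllm; print(vllm.__version__)"  # assumes setup.sh installed vllm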

Training

RL: src/r1-v/run_grpo_gui_8_7b.sh

SFT: sft.sh
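
A minimal sketch of launching both stages from the repository root (hyperparameters are assumed to live inside the scripts themselves):

# RL training with GRPO (R1-V based)
bash src/r1-v/run_grpo_gui_8_7b.sh
# SFT (behavior cloning) training
bash sft.sh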

Inference

# deploy SEAgent-1.0-7B model

vllm serve Zery/SEAgent-1.0-7B \
    --served-model-name "ui-tars-1.0-7b" \
    --host 0.0.0.0 \
    --port "${PORT}" \
    --tensor-parallel-size "${TP}" &

export UI_TARS_1_0_URL=http://YOUR.IP.ADDRESS:PORT/v1
model_name="ui-tars-1.0-7b"
# test on the five target software applications only.
python run_multienv_uitars_1_0.py \
    --headless --observation_type screenshot --model ui-tars-1.0-7b \
    --result_dir ./results_en_test_1_0/all_l15_h5/${model_name} --num_envs 8 --sleep_after_execution 2.0 \
    --max_tokens 1000 --top_p 0.9 --temperature 1.0 --max_trajectory_length 15 --history_n 1

OSWorld

cd OSWorld

Change the IP in the script to point to your vLLM-deployed UI-TARS or SEAgent endpoint.
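
For example (a sketch reusing the endpoint variable from the inference section; YOUR.IP.ADDRESS and PORT are placeholders for your vLLM server):

export UI_TARS_1_0_URL=http://YOUR.IP.ADDRESS:PORT/v1  # point at your deployed model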

bash run_multienv_uitars_1_0_full.sh

Sample Curriculum Data

Generate Task Instructions

cd OSWorld

Deploy the World State Model and the Curriculum Generator for task generation.

CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve "Qwen/Qwen2.5-72B-Instruct" --served-model-name qwen72b --port 8002 --tensor-parallel-size 4
CUDA_VISIBLE_DEVICES=4,5,6,7 vllm serve "Zery/CUA_World_State_Model" --served-model-name gui_judge --port 8001 --tensor-parallel-size 4

After deploying the models for task generation, use task_buffer/task_buffer_update_from_qwen.py to update the task buffer.

python task_buffer/task_buffer_update_from_qwen.py \
    --judge_model gui_judge \
    --phase -1 \
    --software vscode \
    --base_result_dir results_en_phase_qwen

Set different software and phase numbers (-1 for the initial phase) to generate curriculum tasks; see the sketch below.
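
A minimal sketch of sweeping phases for one application (only phase -1 and phase 0 appear in the scripts in this README, so running further phases the same way is an assumption):

# generate curriculum tasks phase by phase for vscode
for phase in -1 0; do
    python task_buffer/task_buffer_update_from_qwen.py \
        --judge_model gui_judge \
        --phase ${phase} \
        --software vscode \
        --base_result_dir results_en_phase_qwen
done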

Executing Instructions

Deploy the actor model (UI-TARS as the initialization).

CUDA_VISIBLE_DEVICES=0 python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --port 8001 \
    --model bytedance-research/UI-TARS-7B-DPO --limit-mm-per-prompt image=5 -tp 1

Run run_multienv_uitars_new_traj_evolve.sh to sample trajectories with the actor agent.

export VLLM_BASE_URL=http://YOUR.IP.ADDRESS:PORT/v1
for software in vs_code; do
    python run_multienv_uitars_new_traj_evolve.py \
        --headless --observation_type screenshot_a11y_tree --model ui_tars_7b_dpo --test_all_meta_path ./task_buffer/task_buffer_qwen_${software}_phase0.json \
        --result_dir ./results_en_phase0/7b_new_traj_multi_env=8_${software}_new_fix_nimg=1_maxtraj=15_t=0.0_r0 --num_envs 8 \
        --max_tokens 1000 --top_p 0.9 --temperature 1.0 --max_trajectory_length 15 --history_n 1 --software ${software}
done

Judge the trajectories and generate pseudo-labeled SFT/RL data with the World State Model.

  1. Deploy the World State Model via vLLM:
CUDA_VISIBLE_DEVICES=4,5,6,7 vllm serve "Zery/CUA_World_State_Model" --served-model-name gui_judge --port 8001 --tensor-parallel-size 4
  2. Use judge_full_process.sh to judge the previously generated trajectories.
  3. Use gen_sft.py to generate training data. The 7b_positive_xxx.json files can be used directly for SFT (behavior cloning) training; the 7b_positive_xxx.json and 7b_negative_xxx.json files together can be used for RL. You can use visualize_data.py to visualize the generated data. A sketch of the full pipeline follows this list.
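
A sketch of that pipeline end to end (gen_sft.py and visualize_data.py are invoked here without arguments, which is an assumption; check each script for its actual flags):

# 1. judge the sampled trajectories with the deployed World State Model
bash judge_full_process.sh
# 2. turn judged trajectories into pseudo-labeled data (7b_positive_xxx.json / 7b_negative_xxx.json)
python gen_sft.py
# 3. optionally inspect the generated data
python visualize_data.py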

Evaluation of the World State Model on AgentRewardBench

We evaluate our World State Model on AgentRewardBench to measure its accuracy in judging the success or failure of agent trajectories. Because it uses screenshots of intermediate states for judgment, you need to modify some of the benchmark code to accept more input images with our new prompt template. After installing AgentRewardBench, replace scripts/run_judge.py, agent_reward_bench/judge/__init__.py, and agent_reward_bench/judge/defaults.py with the versions containing the newly added functions and prompt template.
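
A sketch of applying the replacements (the source paths for the patched files are hypothetical; adjust them to wherever the new versions live in this repository):

# copy the patched files into an AgentRewardBench checkout (hypothetical source paths)
cp run_judge.py AgentRewardBench/scripts/run_judge.py
cp __init__.py AgentRewardBench/agent_reward_bench/judge/__init__.py
cp defaults.py AgentRewardBench/agent_reward_bench/judge/defaults.py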

Use bash run_judge.sh to reproduce our results.

Evaluation of the World State Model on OSWorld

Use OSWorld/eval_osworld_bench.py.
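
For example (a sketch; the --result_dir flag is an assumption mirroring the inference scripts above):

# evaluate judged results against OSWorld ground truth (flag is hypothetical)
python OSWorld/eval_osworld_bench.py --result_dir ./results_en_test_1_0/all_l15_h5/ui-tars-1.0-7b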

Acknowledgements

We sincerely thank UI-TARS, OSWorld, AgentRewardBench, R1-V, DeepSeek, Open-R1, and QwenVL for providing the open-source resources used to build this project.

✒️ Citation

@misc{sun2025seagentselfevolvingcomputeruse,
      title={SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience}, 
      author={Zeyi Sun and Ziyu Liu and Yuhang Zang and Yuhang Cao and Xiaoyi Dong and Tong Wu and Dahua Lin and Jiaqi Wang},
      year={2025},
      eprint={2508.04700},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2508.04700}, 
}

📄 License

Usage and License Notices: The data and code are intended and licensed for research use only, under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. Use should also abide by OpenAI's terms of use: https://openai.com/policies/terms-of-use

