AppAgent-Pro is a proactive GUI AI agent system that goes beyond text-based answers. Built as an enhancement to the original AppAgent, it can proactively interact with Android apps (e.g., YouTube, Amazon) via ADB, decide whether external information is needed, and combine the retrieved results with LLM-generated responses in a polished GUI web page.
Unlike traditional assistants that passively respond with pretrained knowledge, AppAgent-Pro functions as a proactive multimodal mobile agent with real-world execution capabilities. It follows a unified, end-to-end pipeline of:
- Comprehension — Upon receiving a user query, GPT-4o is used to generate an initial answer and infer potential sub-tasks. This step leverages LLM reasoning to form a high-level plan. The agent proactively assesses whether external information is needed, determines which app(s) to launch (YouTube, Amazon), and formulates actionable sub-tasks for each.
- Execution — It then simulates real human interactions on mobile apps (clicking, swiping, entering text) via ADB, executing the sub-tasks and collecting content such as screenshots, product details, or video metadata.
- Integration — Finally, the agent merges the LLM-generated textual answer with the app-acquired content to produce a structured, enriched response — delivering a truly proactive, context-aware output.
🎯 This enables AppAgent-Pro to act not only as a language model, but as a real-world task executor — bridging LLM cognition with interactive app control.
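The three-stage pipeline above can be sketched in a few lines of Python. This is a minimal illustration with stubbed logic, not the repo's actual implementation: the function and class names (`comprehend`, `execute`, `integrate`, `SubTask`) are hypothetical, and the real system calls GPT-4o and drives apps over ADB where the stubs below return canned strings.

```python
from dataclasses import dataclass

# Hypothetical names -- the real pipeline lives in this repo's scripts.
@dataclass
class SubTask:
    app: str          # which app to drive via ADB, e.g. "YouTube"
    instruction: str  # what to do inside that app

def comprehend(query: str) -> tuple[str, list[SubTask]]:
    """Stage 1: draft an answer and infer sub-tasks.
    (Stubbed; the real system asks GPT-4o to plan.)"""
    if "upload a video" in query.lower():
        return ("Open YouTube and use the upload button.",
                [SubTask("YouTube", "find an upload tutorial video")])
    return ("One day has 24 hours.", [])  # simple query: no sub-tasks needed

def execute(tasks: list[SubTask]) -> list[str]:
    """Stage 2: simulate app interactions via ADB and collect content (stubbed)."""
    return [f"[{t.app}] result for: {t.instruction}" for t in tasks]

def integrate(answer: str, collected: list[str]) -> str:
    """Stage 3: merge the LLM answer with app-acquired content."""
    return answer if not collected else answer + "\n" + "\n".join(collected)

answer, tasks = comprehend("How to upload a video on YouTube?")
print(integrate(answer, execute(tasks)))
```

A simple query yields an empty sub-task list and falls through to the plain LLM answer, matching the first demo scenario below.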
We demonstrate how AppAgent-Pro handles three different scenarios depending on the complexity of the query and the need for external resources.
The query is simple and can be answered entirely using the LLM’s internal knowledge.
🧠 “How many hours are there in one day?” → No sub-tasks needed.
- Video Demonstration
none_v1.mp4
The agent chooses to enhance the answer using one external app (e.g., YouTube).
🎥 “How to upload a video on YouTube?” → Add a YouTube video tutorial.
- Video Demonstration
one_v1.mp4
The agent enhances its response with both YouTube and Amazon.
🐈 “How to keep a cat?” → Add product picture (Amazon) and assembly guide videos (YouTube).
- Video Demonstration
both_v1.mp4
- Install ADB on your PC.
- Enable USB debugging on your Android device (in Developer Options).
- Connect your device via USB.
- (Optional) No real device? Use the Android Studio emulator and install apps via APK drag-and-drop.
- Clone this repo and install dependencies:

  ```bash
  cd AppAgent-Pro
  pip install -r requirements.txt
  ```
Edit config.yaml with:
- 🔑 `openai_api_key`: Your OpenAI key with GPT-4o access.
- 🌐 `openai_base`: (Optional) Base URL if using a proxy.
- ⏱️ `request_interval`: Seconds between API calls.
- 📜 `AUTO_DOC`: If set to `true`, the system will automatically generate a document after each sub-task finishes.
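Putting those settings together, a minimal `config.yaml` might look like the sketch below; all values are placeholders, and the exact defaults may differ from what the repo ships.

```yaml
openai_api_key: "sk-..."                  # your OpenAI key with GPT-4o access
openai_base: "https://api.openai.com/v1"  # optional: override if using a proxy
request_interval: 10                      # seconds between API calls
AUTO_DOC: true                            # auto-generate a document per finished sub-task
```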
AppAgent-Pro needs to learn how to interact with the target apps. There are two modes:

- Human demonstration — you show the agent how to operate the app:

  ```bash
  python learn.py
  ```

- Autonomous exploration — the agent explores the app UI on its own:

  ```bash
  python learn.py
  ```

Once exploration is done, launch the demo web app:

```bash
streamlit run ./scripts/run_demo.py
```

AppAgent-Pro will decide which apps to use (if any), generate sub-tasks, and present a unified response that includes the external resources.
This project is licensed under the MIT License. See LICENSE for full details.
