AppAgent-Pro is a proactive GUI AI agent system that goes beyond text-based answers. Built as an enhancement to the original AppAgent, it can proactively interact with Android apps (e.g., YouTube, Amazon) via ADB, decide whether external information is needed, and combine the retrieved results with LLM-generated responses in a polished GUI web page.
Unlike traditional assistants that passively respond with pretrained knowledge, AppAgent-Pro functions as a proactive multimodal mobile agent with real-world execution capabilities. It follows a unified, end-to-end pipeline of:
- Comprehension — Upon receiving a user query, GPT-4o is used to generate an initial answer and infer potential sub-tasks. This step leverages LLM reasoning to form a high-level plan. The agent proactively assesses whether external information is needed, determines which app(s) to launch (YouTube, Amazon), and formulates actionable sub-tasks for each.
- Execution — It then simulates real human interactions on mobile apps (clicking, swiping, entering text) via ADB, executing the sub-tasks and collecting content such as screenshots, product details, or video metadata.
- Integration — Finally, the agent merges the LLM-generated textual answer with the app-acquired content to produce a structured, enriched response — delivering a truly proactive, context-aware output.
🎯 This enables AppAgent-Pro to act not only as a language model, but as a real-world task executor — bridging LLM cognition with interactive app control.
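The three-stage pipeline above can be sketched in a few lines of Python. This is a minimal illustration with stubbed logic, not the repo's actual implementation: the function and class names (`comprehend`, `execute`, `integrate`, `SubTask`) are hypothetical, and the real system calls GPT-4o and drives apps over ADB where the stubs below return canned strings.

```python
from dataclasses import dataclass

# Hypothetical names -- the real pipeline lives in this repo's scripts.
@dataclass
class SubTask:
    app: str          # which app to drive via ADB, e.g. "YouTube"
    instruction: str  # what to do inside that app

def comprehend(query: str) -> tuple[str, list[SubTask]]:
    """Stage 1: draft an answer and infer sub-tasks.
    (Stubbed; the real system asks GPT-4o to plan.)"""
    if "upload a video" in query.lower():
        return ("Open YouTube and use the upload button.",
                [SubTask("YouTube", "find an upload tutorial video")])
    return ("One day has 24 hours.", [])  # simple query: no sub-tasks needed

def execute(tasks: list[SubTask]) -> list[str]:
    """Stage 2: simulate app interactions via ADB and collect content (stubbed)."""
    return [f"[{t.app}] result for: {t.instruction}" for t in tasks]

def integrate(answer: str, collected: list[str]) -> str:
    """Stage 3: merge the LLM answer with app-acquired content."""
    return answer if not collected else answer + "\n" + "\n".join(collected)

answer, tasks = comprehend("How to upload a video on YouTube?")
print(integrate(answer, execute(tasks)))
```

A simple query yields an empty sub-task list and falls through to the plain LLM answer, matching the first demo scenario below.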
We demonstrate how AppAgent-Pro handles three different scenarios depending on the complexity of the query and the need for external resources.
The query is simple and can be answered entirely using the LLM’s internal knowledge.
🧠 “How many hours are there in one day?” → No sub-tasks needed.
- Video Demonstration
none_v1.mp4
The agent chooses to enhance the answer using one external app (e.g., YouTube).
🎥 “How to upload a video on YouTube?” → Add a YouTube video tutorial.
- Video Demonstration
one_v1.mp4
The agent enhances its response with both YouTube and Amazon.
🐈 “How to keep a cat?” → Add product picture (Amazon) and assembly guide videos (YouTube).
- Video Demonstration
both_v1.mp4
- Install ADB on your PC.
- Enable USB debugging on your Android device (in Developer Options).
- Connect your device via USB.
- (Optional) No real device? Use the Android Studio emulator and install apps via APK drag-and-drop.
- Clone this repo and install dependencies:

  ```bash
  cd AppAgent-Pro
  pip install -r requirements.txt
  ```
Edit config.yaml with:
- 🔑 `openai_api_key`: Your OpenAI key with GPT-4o access.
- 🌐 `openai_base`: (Optional) Base URL if using a proxy.
- ⏱️ `request_interval`: Seconds between API calls.
- 📜 `AUTO_DOC`: If set to `true`, the system will automatically generate a document after each sub-task finishes.
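Putting those settings together, a minimal `config.yaml` might look like the sketch below; all values are placeholders, and the exact defaults may differ from what the repo ships.

```yaml
openai_api_key: "sk-..."                  # your OpenAI key with GPT-4o access
openai_base: "https://api.openai.com/v1"  # optional: override if using a proxy
request_interval: 10                      # seconds between API calls
AUTO_DOC: true                            # auto-generate a document per finished sub-task
```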
AppAgent-Pro needs to learn how to interact with the target apps. There are two modes:

- Human demonstration — you show the agent how to operate the app:

  ```bash
  python learn.py
  ```

- Autonomous exploration — the agent explores the app UI on its own:

  ```bash
  python learn.py
  ```

Once exploration is done, launch the demo web app:

```bash
streamlit run ./scripts/run_demo.py
```

AppAgent-Pro will decide which apps to use (if any), generate sub-tasks, and present a unified response that includes the external resources.
This project is licensed under the MIT License. See LICENSE for full details.
