47 changes: 43 additions & 4 deletions README.md
@@ -16,13 +16,21 @@
</h5>

## <img src="./assets/ootb_icon.png" alt="Star" style="height:25px; vertical-align:middle; filter: invert(1) brightness(2);"> Overview
**Computer Use <span style="color:rgb(106, 158, 210)">O</span><span style="color:rgb(111, 163, 82)">O</span><span style="color:rgb(209, 100, 94)">T</span><span style="color:rgb(238, 171, 106)">B</span>**<img src="./assets/ootb_icon.png" alt="Star" style="height:20px; vertical-align:middle; filter: invert(1) brightness(2);"> is an out-of-the-box (OOTB) solution for Desktop GUI Agent, including API-based (**Claude 3.5 Computer Use**) and locally-running models (**<span style="color:rgb(106, 158, 210)">S</span><span style="color:rgb(111, 163, 82)">h</span><span style="color:rgb(209, 100, 94)">o</span><span style="color:rgb(238, 171, 106)">w</span>UI**, **UI-TARS**).
**Computer Use <span style="color:rgb(106, 158, 210)">O</span><span style="color:rgb(111, 163, 82)">O</span><span style="color:rgb(209, 100, 94)">T</span><span style="color:rgb(238, 171, 106)">B</span>**<img src="./assets/ootb_icon.png" alt="Star" style="height:20px; vertical-align:middle; filter: invert(1) brightness(2);"> is an out-of-the-box (OOTB) solution for Desktop GUI Agent, including API-based (**Claude 3.5 Computer Use**, **OpenRouter**) and locally-running models (**<span style="color:rgb(106, 158, 210)">S</span><span style="color:rgb(111, 163, 82)">h</span><span style="color:rgb(209, 100, 94)">o</span><span style="color:rgb(238, 171, 106)">w</span>UI**, **UI-TARS**).

**No Docker** is required, and it supports both **Windows** and **macOS**. OOTB provides a user-friendly interface based on Gradio.🎨

### ⚡ **Key Optimizations & Features**
- 🔄 **Smart Model Routing**: Automatically select optimal models via OpenRouter
- 💰 **Cost Optimization**: Reduced token costs with intelligent model selection
- 🚀 **Enhanced Performance**: Improved inference speed with 4-bit quantization
- 📊 **Multi-Provider Support**: Seamless switching between OpenAI, Anthropic, Qwen, and OpenRouter
- 🛠️ **Flexible Architecture**: Unified & modular planner-actor configurations

Visit our study of Claude 3.5 Computer Use as a GUI agent [[project page]](https://computer-use-ootb.github.io). 🌐

## Update
- **[2025/02/08]** We've added support for [**UI-TARS**](https://github.com/bytedance/UI-TARS). Follow [Cloud Deployment](https://github.com/bytedance/UI-TARS?tab=readme-ov-file#cloud-deployment) or [VLLM deployment](https://github.com/bytedance/UI-TARS?tab=readme-ov-file#local-deployment-vllm) to deploy UI-TARS and run it locally in OOTB.
- **[2025/01/22]** 🚀 **OpenRouter Integration** & **Performance Optimizations** are now live! Access 100+ AI models through a single API with [**OpenRouter**](https://openrouter.ai), including GPT-4o, Claude, Qwen-VL, and more. Enjoy **cost-efficient routing**, **automatic failover**, and **competitive pricing** 💰!
- **Major Update! [2024/12/04]** **Local Run🔥** is now live! Say hello to [**<span style="color:rgb(106, 158, 210)">S</span><span style="color:rgb(111, 163, 82)">h</span><span style="color:rgb(209, 100, 94)">o</span><span style="color:rgb(238, 171, 106)">w</span>UI**](https://github.com/showlab/ShowUI), an open-source 2B vision-language-action (VLA) model for GUI agents. Now compatible with `"gpt-4o + ShowUI" (~200x cheaper)`* & `"Qwen2-VL + ShowUI" (~30x cheaper)`* for only a few cents per task💰! <span style="color: grey; font-size: small;">*compared to Claude Computer Use</span>.
- **[2024/11/20]** We've added some examples to help you get hands-on experience with Claude 3.5 Computer Use.
@@ -87,7 +95,36 @@ pip install -r requirements.txt

2. Test your UI-TARS server with the script `.\install_tools\test_ui-tars_server.py`.

### 2.4 (Optional) If you want to deploy Qwen model as planner on ssh server
### 2.4 (Optional) Prepare for **OpenRouter** Integration 🌐

[OpenRouter](https://openrouter.ai) provides unified access to 100+ AI models through a single API, offering cost-efficient routing and competitive pricing.

**Benefits:**
- 🔄 **Automatic failover** between models
- 💰 **Cost optimization** with smart routing
- 🚀 **100+ models** including GPT-4o, Claude, Gemini, and more
- 📊 **Transparent pricing** and usage analytics

**Setup:**
1. Sign up at [OpenRouter](https://openrouter.ai/)
2. Get your API key from the [Keys page](https://openrouter.ai/keys)
3. Set your environment variable:
```bash
# Windows PowerShell
$env:OPENROUTER_API_KEY="sk-or-xxxxx"

# macOS/Linux
export OPENROUTER_API_KEY="sk-or-xxxxx"
```

**Popular Models Available:**
- `openrouter/auto` - Automatically route to the best available model
- GPT-4o, GPT-4o-mini
- Claude 3.5 Sonnet, Claude 3 Haiku
- Gemini Pro, PaLM 2
- And many more...
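
Once the key is set, you can sanity-check it outside of OOTB with a minimal chat request. This is only a sketch: it assumes the `requests` package (already in `requirements.txt`) and uses the same OpenAI-compatible endpoint and payload shape as the integration code in this PR.

```python
import os
import requests

# Minimal OpenRouter chat/completions request; "openrouter/auto" lets OpenRouter pick a model.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "openrouter/auto",
        "messages": [{"role": "user", "content": "Reply with one word: pong"}],
        "max_tokens": 16,
    },
    timeout=60,
)
data = resp.json()
print(data["choices"][0]["message"]["content"])
print("total tokens:", data["usage"]["total_tokens"])
```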

### 2.5 (Optional) If you want to deploy the Qwen model as the planner on an SSH server
1. `git clone` this project on your SSH server.

2. Run `python computer_use_demo/remote_inference.py` on the server.
@@ -104,13 +141,14 @@ If you successfully start the interface, you will see two URLs in the terminal:
```


> <u>For convenience</u>, we recommend running one or more of the following command to set API keys to the environment variables before starting the interface. Then you dont need to manually pass the keys each run. On Windows Powershell (via the `set` command if on cmd):
> <u>For convenience</u>, we recommend running one or more of the following commands to set the API keys as environment variables before starting the interface. Then you don't need to pass the keys manually on each run. On Windows PowerShell (use the `set` command if on cmd):
> ```bash
> $env:ANTHROPIC_API_KEY="sk-xxxxx"  # Replace with your own key
> $env:QWEN_API_KEY="sk-xxxxx"
> $env:OPENAI_API_KEY="sk-xxxxx"
> $env:OPENROUTER_API_KEY="sk-xxxxx" # For OpenRouter integration
> ```
> On macOS/Linux, replace `$env:ANTHROPIC_API_KEY` with `export ANTHROPIC_API_KEY` in the above command.
> On macOS/Linux, replace `$env:ANTHROPIC_API_KEY` with `export ANTHROPIC_API_KEY` in the above command.


### 4. Control Your Computer with Any Device That Can Access the Internet
@@ -173,6 +211,7 @@ Now, OOTB supports customizing the GUI Agent via the following models:
<ul>
<li><a href="">GPT-4o</a></li>
<li><a href="">Qwen2-VL-Max</a></li>
<li><a href="https://openrouter.ai">OpenRouter (100+ models)</a></li>
<li><a href="">Qwen2-VL-2B(ssh)</a></li>
<li><a href="">Qwen2-VL-7B(ssh)</a></li>
<li><a href="">Qwen2.5-VL-7B(ssh)</a></li>
57 changes: 44 additions & 13 deletions app.py
@@ -61,6 +61,8 @@ def setup_state(state):
state["anthropic_api_key"] = os.getenv("ANTHROPIC_API_KEY", "")
if "qwen_api_key" not in state:
state["qwen_api_key"] = os.getenv("QWEN_API_KEY", "")
if "openrouter_api_key" not in state:
state["openrouter_api_key"] = os.getenv("OPENROUTER_API_KEY", "")
if "ui_tars_url" not in state:
state["ui_tars_url"] = ""

@@ -72,6 +74,8 @@ def setup_state(state):
state["planner_api_key"] = state["anthropic_api_key"]
elif state["planner_provider"] == "qwen":
state["planner_api_key"] = state["qwen_api_key"]
elif state["planner_provider"] == "openrouter":
state["planner_api_key"] = state["openrouter_api_key"]
else:
state["planner_api_key"] = ""

@@ -278,7 +282,7 @@ def process_input(user_input, state):
label="API Provider",
choices=[option.value for option in APIProvider],
value="openai",
interactive=False,
interactive=True,
)
with gr.Column():
planner_api_key = gr.Textbox(
@@ -393,9 +397,9 @@ def update_planner_model(model_selection, state):
logger.info(f"Model updated to: {state['planner_model']}")

if model_selection == "qwen2-vl-max":
provider_choices = ["qwen"]
provider_choices = ["qwen", "openrouter"]
provider_value = "qwen"
provider_interactive = False
provider_interactive = True
api_key_interactive = True
api_key_placeholder = "qwen API key"
actor_model_choices = ["ShowUI", "UI-TARS"]
@@ -432,10 +436,10 @@ def update_planner_model(model_selection, state):
state["api_key"] = ""

elif model_selection == "gpt-4o" or model_selection == "gpt-4o-mini":
# Set provider to "openai", make it unchangeable
provider_choices = ["openai"]
# Allow OpenAI or OpenRouter as provider
provider_choices = ["openai", "openrouter"]
provider_value = "openai"
provider_interactive = False
provider_interactive = True
api_key_interactive = True
api_key_type = "password" # Display API key in password form

@@ -470,6 +474,8 @@ def update_planner_model(model_selection, state):
state["api_key"] = state.get("anthropic_api_key", "")
elif provider_value == "qwen":
state["api_key"] = state.get("qwen_api_key", "")
elif provider_value == "openrouter":
state["api_key"] = state.get("openrouter_api_key", "")
elif provider_value == "local":
state["api_key"] = ""
# The SSH case was already handled above; no need to handle it again here
@@ -502,19 +508,44 @@ def update_actor_model(actor_model_selection, state):
logger.info(f"Actor model updated to: {state['actor_model']}")

def update_api_key_placeholder(provider_value, model_selection):
# Persist provider selection into state for use in sampling loop
state.value["planner_provider"] = provider_value
# Choose placeholder and value based on provider/model
if model_selection == "claude-3-5-sonnet-20241022":
if provider_value == "anthropic":
return gr.update(placeholder="anthropic API key")
placeholder = "anthropic API key"
value = state.value.get("anthropic_api_key", "")
elif provider_value == "bedrock":
return gr.update(placeholder="bedrock API key")
placeholder = "bedrock API key"
value = "" # credentials via environment
elif provider_value == "vertex":
return gr.update(placeholder="vertex API key")
placeholder = "vertex API key"
value = "" # credentials via environment
else:
return gr.update(placeholder="")
elif model_selection == "gpt-4o + ShowUI":
return gr.update(placeholder="openai API key")
placeholder = ""
value = ""
else:
return gr.update(placeholder="")
if provider_value == "openai":
placeholder = "openai API key"
value = state.value.get("openai_api_key", "")
elif provider_value == "openrouter":
placeholder = "openrouter API key"
value = state.value.get("openrouter_api_key", "")
elif provider_value == "qwen":
placeholder = "qwen API key"
value = state.value.get("qwen_api_key", "")
elif provider_value == "ssh":
placeholder = "ssh host and port (e.g. localhost:8000)"
value = state.value.get("planner_api_key", "")
elif provider_value == "local":
placeholder = "not required"
value = ""
else:
placeholder = ""
value = ""
# Update state mirrored key used by loop
state.value["planner_api_key"] = value
return gr.update(placeholder=placeholder, value=value, type="password", interactive=True)

def update_system_prompt_suffix(system_prompt_suffix, state):
state["custom_system_prompt"] = system_prompt_suffix
91 changes: 79 additions & 12 deletions computer_use_demo/gui_agent/llm_utils/oai.py
@@ -1,3 +1,71 @@
def run_openrouter_interleaved(messages: list, system: str, llm: str, api_key: str, max_tokens=256, temperature=0):
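"""Send an interleaved text/image conversation to OpenRouter's OpenAI-compatible chat/completions endpoint; returns (response_text, total_tokens) on success."""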

api_key = api_key or os.environ.get("OPENROUTER_API_KEY")
if not api_key:
raise ValueError("OPENROUTER_API_KEY is not set")

headers = {"Content-Type": "application/json",
"Authorization": f"Bearer {api_key}"}

final_messages = [{"role": "system", "content": system}]

if isinstance(messages, list):
for item in messages:
print(f"item: {item}")
contents = []
if isinstance(item, dict):
for cnt in item["content"]:
if isinstance(cnt, str):
if is_image_path(cnt):
base64_image = encode_image(cnt)
content = {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}
else:
content = {"type": "text", "text": cnt}

contents.append(content)
message = {"role": item["role"], "content": contents}

elif isinstance(item, str):
if is_image_path(item):
base64_image = encode_image(item)
contents.append({"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}})
message = {"role": "user", "content": contents}
else:
contents.append({"type": "text", "text": item})
message = {"role": "user", "content": contents}

else:  # neither dict nor str; fall back to treating the item as text
contents.append({"type": "text", "text": item})
message = {"role": "user", "content": contents}

final_messages.append(message)


elif isinstance(messages, str):
final_messages.append({"role": "user", "content": messages})

print("[openrouter] sending messages:", [f"{k}: {v}, {k}" for k, v in final_messages])

payload = {
"model": llm,
"messages": final_messages,
"max_tokens": max_tokens,
"temperature": temperature,
}

response = requests.post(
"https://openrouter.ai/api/v1/chat/completions", headers=headers, json=payload
)

try:
text = response.json()['choices'][0]['message']['content']
token_usage = int(response.json()['usage']['total_tokens'])
return text, token_usage

except Exception as e:
print(f"Error in interleaved openAI: {e}. This may due to your invalid OPENROUTER_API_KEY. Please check the response: {response.json()} ")
return response.json()

import os
import logging
import base64
@@ -214,17 +282,16 @@ def encode_image(image_path: str, max_size=1024) -> str:
# temperature=0)

# print(text, token_usage)
text, token_usage = run_ssh_llm_interleaved(
messages= [{"content": [
"What is in the screenshot?",
"tmp/outputs/screenshot_5a26d36c59e84272ab58c1b34493d40d.png"],
"role": "user"
}],
llm="Qwen2.5-VL-7B-Instruct",
ssh_host="10.245.92.68",
ssh_port=9192,
text, token_usage = run_openrouter_interleaved(
messages=[{"content": [
"What is in the screenshot?",
"tmp/outputs/screenshot_5a26d36c59e84272ab58c1b34493d40d.png"],
"role": "user"
}],
llm="openrouter/auto",
system="You are a helpful assistant",
api_key=api_key,
max_tokens=256,
temperature=0.7
)
temperature=0)

print(text, token_usage)
# There is an introduction describing the Calyx... 36986
35 changes: 32 additions & 3 deletions computer_use_demo/gui_agent/planner/api_vlm_planner.py
@@ -11,7 +11,7 @@
from anthropic.types.beta import BetaMessage, BetaTextBlock, BetaToolUseBlock, BetaMessageParam

from computer_use_demo.tools.screen_capture import get_screenshot
from computer_use_demo.gui_agent.llm_utils.oai import run_oai_interleaved, run_ssh_llm_interleaved
from computer_use_demo.gui_agent.llm_utils.oai import run_oai_interleaved, run_ssh_llm_interleaved, run_openrouter_interleaved
from computer_use_demo.gui_agent.llm_utils.qwen import run_qwen
from computer_use_demo.gui_agent.llm_utils.llm_utils import extract_data, encode_image
from computer_use_demo.tools.colorful_text import colorful_text_showui, colorful_text_vlm
@@ -43,9 +43,10 @@ def __init__(
self.model = "Qwen2-VL-7B-Instruct"
elif model == "qwen2.5-vl-7b (ssh)":
self.model = "Qwen2.5-VL-7B-Instruct"
elif model == "openrouter/auto":
self.model = "openrouter/auto"
else:
raise ValueError(f"Model {model} not supported")

self.provider = provider
self.system_prompt_suffix = system_prompt_suffix
self.api_key = api_key
@@ -92,7 +93,23 @@ def __call__(self, messages: list):

print(f"Sending messages to VLMPlanner: {planner_messages}")

if self.model == "gpt-4o-2024-11-20":
# If provider is explicitly OpenRouter, route via OpenRouter regardless of model string
provider_str = self.provider.value if hasattr(self.provider, "value") else str(self.provider)
if provider_str == "openrouter":
# Use a generic auto model on OpenRouter unless a specific compatible ID is set elsewhere
or_model = "openrouter/auto"
vlm_response, token_usage = run_openrouter_interleaved(
messages=planner_messages,
system=self.system_prompt,
llm=or_model,
api_key=self.api_key,
max_tokens=self.max_tokens,
temperature=0,
)
print(f"openrouter token usage: {token_usage}")
self.total_token_usage += token_usage
self.total_cost += (token_usage * 0.15 / 1000000) # Placeholder cost
elif self.model == "gpt-4o-2024-11-20":
vlm_response, token_usage = run_oai_interleaved(
messages=planner_messages,
system=self.system_prompt,
Expand All @@ -117,6 +134,18 @@ def __call__(self, messages: list):
print(f"qwen token usage: {token_usage}")
self.total_token_usage += token_usage
self.total_cost += (token_usage * 0.02 / 7.25 / 1000) # 1USD=7.25CNY, https://help.aliyun.com/zh/dashscope/developer-reference/tongyi-qianwen-vl-plus-api
elif self.model == "openrouter/auto":
vlm_response, token_usage = run_openrouter_interleaved(
messages=planner_messages,
system=self.system_prompt,
llm=self.model,
api_key=self.api_key,
max_tokens=self.max_tokens,
temperature=0,
)
print(f"openrouter token usage: {token_usage}")
self.total_token_usage += token_usage
self.total_cost += (token_usage * 0.15 / 1000000) # Placeholder cost
elif "Qwen" in self.model:
# Parse host and port from api_key
try: