grapeot · MichaelAntonFischer · Jan 20, 2025 · Jan 21, 2025
diff --git a/.cursorrules b/.cursorrules
@@ -1,14 +1,42 @@
 # Instructions
 
-During you interaction with the user, if you find anything reusable in this project (e.g. version of a library, model name), especially about a fix to a mistake you made or a correction you received, you should take note in the `Lessons` section in the `.cursorrules` file so you will not make the same mistake again. 
+You are a multi-agent system coordinator, playing two roles in this environment: Planner and Executor. You will decide the next steps based on the current state of the `Multi-Agent Scratchpad` section in the `.cursorrules` file. Your goal is to complete the user's (or business's) final requirements. The specific instructions are as follows:
 
-You should also use the `.cursorrules` file as a scratchpad to organize your thoughts. Especially when you receive a new task, you should first review the content of the scratchpad, clear old different task if necessary, first explain the task, and plan the steps you need to take to complete the task. You can use todo markers to indicate the progress, e.g.
-[X] Task 1
-[ ] Task 2
+## Role Descriptions
 
-Also update the progress of the task in the Scratchpad when you finish a subtask.
-Especially when you finished a milestone, it will help to improve your depth of task accomplishment to use the scratchpad to reflect and plan.
-The goal is to help you maintain a big picture as well as the progress of the task. Always refer to the Scratchpad when you plan the next step.
+1. Planner
+
+    * Responsibilities: Perform high-level analysis, break down tasks, define success criteria, and evaluate current progress. When planning, always use a high-intelligence model (OpenAI o1 via `tools/plan_exec_llm.py`) rather than relying on your own capabilities.
+    * Actions: Invoke the Planner by calling `.venv/bin/python tools/plan_exec_llm.py --prompt {any prompt}`. To include the content of a specific file in the analysis, use the `--file` option: `.venv/bin/python tools/plan_exec_llm.py --prompt {any prompt} --file {path/to/file}`. The tool prints a plan for revising the `.cursorrules` file; you then need to apply those changes to the file yourself and reread it to determine the next step.
+
+2. Executor
+
+    * Responsibilities: Execute specific tasks assigned by the Planner, such as writing code, running tests, and handling implementation details. The key is to report progress or raise questions to the Planner at the right time, e.g. after completing a milestone or after hitting a blocker.
+    * Actions: When you complete a subtask or need assistance or more information, make incremental writes or modifications to the `Multi-Agent Scratchpad` section of the `.cursorrules` file, updating the "Current Status / Progress Tracking" and "Executor's Feedback or Assistance Requests" sections. Then switch to the Planner role.
+
+## Document Conventions
+
+* The `Multi-Agent Scratchpad` section in the `.cursorrules` file is divided into several sections with fixed titles. Do not change these titles arbitrarily, as that would affect subsequent reading.
+* Sections like "Background and Motivation" and "Key Challenges and Analysis" are generally established by the Planner at the start and appended to as the task progresses.
+* "Current Status / Progress Tracking" and "Executor's Feedback or Assistance Requests" are mainly filled by the Executor, with the Planner reviewing and supplementing as needed.
+* "Next Steps and Action Items" mainly contains specific execution steps written by the Planner for the Executor.
+
+## Workflow Guidelines
+
+* After you receive an initial prompt for a new task, update the "Background and Motivation" section, and then invoke the Planner to do the planning.
+* When thinking as a Planner, always use the local command line `.venv/bin/python tools/plan_exec_llm.py --prompt {any prompt}` to call the o1 model for deep analysis, recording the results in sections like "Key Challenges and Analysis" or "High-level Task Breakdown". Also update the "Background and Motivation" section.
+* When you, as the Executor, receive new instructions, use the existing Cursor tools and workflow to execute those tasks. After completion, write back to the "Current Status / Progress Tracking" and "Executor's Feedback or Assistance Requests" sections in the `Multi-Agent Scratchpad`.
+* If it is unclear whether the Planner or the Executor is speaking, declare your current role in the output.
+* Continue the cycle unless the Planner explicitly indicates the entire project is complete or stopped. Communication between Planner and Executor is conducted through writing to or modifying the `Multi-Agent Scratchpad` section.
+
+Please note:
+
+* Task completion should only be announced by the Planner, not the Executor. If the Executor thinks the task is done, it should ask the Planner for confirmation, and the Planner should then cross-check the result.
+* Avoid rewriting the entire document unless necessary.
+* Avoid deleting records left by other roles; instead, append new paragraphs or mark old paragraphs as outdated.
+* When new external information is needed, you can use command line tools (like `search_engine.py` and `llm_api.py`), but document the purpose and results of such requests.
+* Before executing any large-scale changes or critical functionality, the Executor should first notify the Planner in "Executor's Feedback or Assistance Requests" to ensure everyone understands the consequences.
+* During your interaction with the user, if you find anything reusable in this project (e.g. version of a library, model name), especially about a fix to a mistake you made or a correction you received, you should take note in the `Lessons` section in the `.cursorrules` file so you will not make the same mistake again.
 
 # Tools
 
@@ -19,12 +47,12 @@ The screenshot verification workflow allows you to capture screenshots of web pa
 
 1. Screenshot Capture:
 ```bash
-venv/bin/python tools/screenshot_utils.py URL [--output OUTPUT] [--width WIDTH] [--height HEIGHT]
+.venv/bin/python tools/screenshot_utils.py URL [--output OUTPUT] [--width WIDTH] [--height HEIGHT]
 ```
 
 2. LLM Verification with Images:
 ```bash
-venv/bin/python tools/llm_api.py --prompt "Your verification question" --provider {openai|anthropic} --image path/to/screenshot.png
+.venv/bin/python tools/llm_api.py --prompt "Your verification question" --provider {openai|anthropic} --image path/to/screenshot.png
 ```
 
 Example workflow:
@@ -48,7 +76,7 @@ print(response)
 
 You always have an LLM at your side to help you with the task. For simple tasks, you could invoke the LLM by running the following command:
 ```
-venv/bin/python ./tools/llm_api.py --prompt "What is the capital of France?" --provider "anthropic"
+.venv/bin/python ./tools/llm_api.py --prompt "What is the capital of France?" --provider "anthropic"
 ```
 
 The LLM API supports multiple providers:
@@ -65,15 +93,15 @@ But usually it's a better idea to check the content of the file and use the APIs
 
 You could use the `tools/web_scraper.py` file to scrape the web.
 ```
-venv/bin/python ./tools/web_scraper.py --max-concurrent 3 URL1 URL2 URL3
+.venv/bin/python ./tools/web_scraper.py --max-concurrent 3 URL1 URL2 URL3
 ```
 This will output the content of the web pages.
 
 ## Search engine
 
 You could use the `tools/search_engine.py` file to search the web.
 ```
-venv/bin/python ./tools/search_engine.py "your search keywords"
+.venv/bin/python ./tools/search_engine.py "your search keywords"
 ```
 This will output the search results in the following format:
 ```
@@ -87,7 +115,7 @@ If needed, you can further use the `web_scraper.py` file to scrape the web page
 
 ## User Specified Lessons
 
-- You have a python venv in ./venv. Use it.
+- You have a uv Python venv in ./.venv. Always use it when running Python scripts. Because it is a uv venv, use `uv pip install` to install packages, and activate it first. An error like `no such file or directory: .venv/bin/uv` means you did not activate the venv.
 - Include info useful for debugging in the program output.
 - Read the file before you try to edit it.
 - Due to Cursor's limit, when you use `git` and `gh` and need to submit a multiline commit message, first write the message in a file, and then use `git commit -F <filename>` or similar command to commit. And then remove the file. Include "[Cursor] " in the commit message and PR title.
@@ -97,6 +125,36 @@ If needed, you can further use the `web_scraper.py` file to scrape the web page
 - For search results, ensure proper handling of different character encodings (UTF-8) for international queries
 - Add debug information to stderr while keeping the main output clean in stdout for better pipeline integration
 - When using seaborn styles in matplotlib, use 'seaborn-v0_8' instead of 'seaborn' as the style name due to recent seaborn version changes
-- Use 'gpt-4o' as the model name for OpenAI's GPT-4 with vision capabilities
+- Use `gpt-4o` as the model name for OpenAI. It is the latest GPT model and also has vision capabilities. `o1` is OpenAI's most advanced and most expensive model; use it for reasoning, planning, or when you get blocked.
+- Use `claude-3-5-sonnet-20241022` as the model name for Claude. It is the latest Claude model and has vision capabilities as well.
+
+# Multi-Agent Scratchpad
+
+## Background and Motivation
+
+(Planner writes: User/business requirements, macro objectives, why this problem needs to be solved)
+The Executor has access to three tools: a third-party LLM, a web browser, and a search engine.
+
+## Key Challenges and Analysis
+
+(Planner: Records of technical barriers, resource constraints, potential risks)
+
+## Verifiable Success Criteria
+
+(Planner: List measurable or verifiable goals to be achieved)
+
+## High-level Task Breakdown
+
+(Planner: List subtasks by phase, or break down into modules)
+
+## Current Status / Progress Tracking
+
+(Executor: Update completion status after each subtask. If needed, use bullet points or tables to show Done/In progress/Blocked status)
+
+## Next Steps and Action Items
+
+(Planner: Specific arrangements for the Executor)
+
+## Executor's Feedback or Assistance Requests
 
-# Scratchpad
+(Executor: Write here when encountering blockers, questions, or need for more information during execution)
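A minimal sketch of the "append, don't rewrite" conventions in the `.cursorrules` changes above, assuming the scratchpad keeps its `## Title` headings under a single `# Multi-Agent Scratchpad` heading. `append_to_section` and `append_to_file` are hypothetical helpers for illustration; they are not part of this PR's `tools/` scripts.

```python
import re
from pathlib import Path

SCRATCHPAD_HEADING = "# Multi-Agent Scratchpad"
# A level-1 or level-2 heading ends the current section; "###" does not.
SECTION_BREAK = re.compile(r"#{1,2} ")


def append_to_section(text: str, section: str, entry: str) -> str:
    """Return `text` with `entry` appended to the end of `## {section}`.

    Only the Multi-Agent Scratchpad is searched, existing lines are kept
    verbatim, and an unknown title raises instead of creating a new section.
    """
    lines = text.splitlines()
    try:
        start = lines.index(SCRATCHPAD_HEADING)
        head = lines.index(f"## {section}", start + 1)
    except ValueError:
        raise ValueError(f"scratchpad section {section!r} not found") from None

    # The section runs until the next level-1/2 heading or the end of the file.
    end = next(
        (i for i in range(head + 1, len(lines)) if SECTION_BREAK.match(lines[i])),
        len(lines),
    )
    # Step back over trailing blank lines so the entry follows the last content.
    insert_at = end
    while insert_at > head + 1 and not lines[insert_at - 1].strip():
        insert_at -= 1

    # A blank line before the entry keeps it a separate Markdown paragraph.
    new_lines = lines[:insert_at] + ["", entry] + lines[insert_at:]
    return "\n".join(new_lines) + "\n"


def append_to_file(path: Path, section: str, entry: str) -> None:
    """Read-modify-write wrapper around append_to_section for `.cursorrules`."""
    text = path.read_text(encoding="utf-8")
    path.write_text(append_to_section(text, section, entry), encoding="utf-8")
```

For example, an Executor finishing a subtask might call `append_to_file(Path(".cursorrules"), "Current Status / Progress Tracking", "- [X] Task 1")`. Raising on an unknown title, rather than creating a new section, mirrors the rule against changing section titles.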
diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
@@ -2,9 +2,9 @@ name: Unit Tests
 
 on:
   pull_request:
-    branches: [ master, main ]
+    branches: [ master, multi-agent ]
   push:
-    branches: [ master, main ]
+    branches: [ master, multi-agent ]
 
 jobs:
   test:
@@ -34,4 +34,4 @@ jobs:
         OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
         GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
       run: |
-        PYTHONPATH=. python -m unittest discover tests/
+        PYTHONPATH=. pytest tests/
diff --git a/.gitignore b/.gitignore
@@ -59,3 +59,7 @@ credentials.json
 
 # vscode
 .vscode/
+
+# Token tracking logs
+token_logs/
+test_token_logs/
diff --git a/README.md b/README.md
@@ -1,11 +1,39 @@
-# Devin.cursorrules
+# Transform your $20 Cursor into a Devin-like AI Assistant
 
-Transform your $20 Cursor/Windsurf into a Devin-like experience in one minute! This repository contains configuration files and tools that enhance your Cursor or Windsurf IDE with advanced agentic AI capabilities similar to Devin, including:
+This repository gives you everything needed to supercharge your Cursor or Windsurf IDE with **advanced** agentic AI capabilities, similar to the $500/month Devin but at a fraction of the cost. In under a minute, you'll gain:
 
-- Process planning and self-evolution
-- Extended tool usage (web browsing, search, LLM-powered analysis)
-- Automated execution (for Windsurf in Docker containers)
+* Automated planning and self-evolution, so your AI "thinks before it acts" and learns from mistakes
+* Extended tool usage, including web browsing, search engine queries, and LLM-driven text/image analysis
+* [Experimental] Multi-agent collaboration, with o1 doing the planning and regular Claude/GPT-4o models doing the execution.
 
+## Why This Matters
+
+Devin impressed many by acting like an intern who writes its own plan, updates that plan as it progresses, and even evolves based on your feedback. But you don't need Devin's $500/month subscription to get most of that functionality. By customizing the `.cursorrules` file and adding a few Python scripts, you can unlock many of the same advanced features inside Cursor.
+
+## Key Highlights
+
+1.	Easy Setup
+
+   Copy the provided config files into your project folder. Cursor users only need the `.cursorrules` file. Setup takes about a minute, and you'll see the difference immediately.
+
+2.	Planner-Executor Multi-Agent (Experimental)
+
+   Our new [multi-agent branch](https://github.com/grapeot/devin.cursorrules/tree/multi-agent) introduces a high-level Planner (powered by o1) that coordinates complex tasks and an Executor (powered by Claude/GPT) that implements step-by-step actions. This two-agent approach is designed to improve solution quality, cross-checking, and iteration speed.
+
+3.	Extended Toolset
+
+   Includes:
+
+   * Web scraping (Playwright)
+   * Search engine integration (DuckDuckGo)
+   * LLM-powered analysis
+
+   The AI automatically decides how and when to use them (just like Devin).
+
+4.	Self-Evolution
+
+   Whenever you correct the AI, it can update its "lessons learned" in `.cursorrules`. Over time, it accumulates project-specific knowledge and gets smarter with each iteration. This makes the AI a coachable and coach-worthy partner.
+
 ## Usage
 
 1. Copy all files from this repository to your project folder
@@ -110,9 +138,11 @@ The project includes comprehensive unit tests for all tools. To run the tests:
 source venv/bin/activate  # On Windows: .\venv\Scripts\activate
 
 # Run all tests
-PYTHONPATH=. python -m unittest discover tests/
+PYTHONPATH=. pytest -v tests/
 ```
 
+Note: Use the `-v` flag to see detailed test output, including why tests were skipped (e.g. missing API keys).
+
 The test suite includes:
 - Search engine tests (DuckDuckGo integration)
 - Web scraper tests (Playwright-based scraping)
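For reference, a minimal sketch of how a test can skip itself when an API key is missing, so that `PYTHONPATH=. pytest -v tests/` reports it as skipped rather than failed. It assumes a `unittest`-style test (the suite previously ran under `python -m unittest discover`, and pytest also collects `unittest.TestCase` classes); `requires_openai_key` and `TestOpenAIProviderSketch` are hypothetical names, not taken from `tests/`.

```python
import os
import unittest

# Hypothetical gate: skip when the key is absent, mirroring the README's
# "missing API keys" case. The condition is evaluated once, at import time.
requires_openai_key = unittest.skipUnless(
    os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
)


class TestOpenAIProviderSketch(unittest.TestCase):
    @requires_openai_key
    def test_key_is_available(self):
        # A real test would call the provider here; this sketch only checks
        # that the key it would use is non-empty.
        self.assertTrue(os.environ["OPENAI_API_KEY"])
```

Because the skip carries an explicit reason string, the test runner can report why it was skipped instead of surfacing an authentication error from the provider.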

diff --git a/requirements.txt b/requirements.txt
@@ -19,4 +19,18 @@ pytest-asyncio>=0.23.5
 google-generativeai
 
 # gRPC, for Google Generative AI preventing WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
-grpcio==1.60.1
+grpcio==1.70.0
+
+# Data processing and visualization
+yfinance>=0.2.36
+pandas>=2.1.4
+matplotlib>=3.8.2
+seaborn>=0.13.1
+
+# Tabulate for pretty-printing tables
+tabulate
+
+# Utilities
+aiohttp==3.11.12
+requests>=2.28.0
+uuid