
Feature: Team Agent, Adv. Terminal, MCP Tools, GPU/CPU Instruments & CUDA Docker #384



Open: wants to merge 39 commits into main


deciduus

Summary
This PR introduces several major enhancements and fixes across the codebase, focusing on improved agent tooling, Docker support, and advanced media generation capabilities.


🚀 Added

  • Standalone CUDA Dockerfile:
    • Enables GPU-accelerated workflows for supported tasks.
  • Team Agent Tool:
    • New tool for collaborative agent operations.
  • Image Generation & Music Generation:
    • Both tools now support CPU-based (Debian slim) and GPU (CUDA) images.
    • Each uses a dedicated .sh script to create and manage an instruments_venv for dependencies.
    • Include heartbeat output so long-running generations are not passed back to the agent as idle, avoiding wasted API calls.
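The instrument scripts are not included in this description, so the following is only a hypothetical Python sketch of the heartbeat idea. While a long job runs, a background thread prints a line every few seconds. This keeps an idle-based timeout, such as the code execution tool's 10-second window, from ever seeing silence. `run_with_heartbeat`, the default interval, and the message text are all assumptions.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_heartbeat(job: Callable[[], T], interval: float = 5.0,
                       message: str = "[heartbeat] still working...") -> T:
    """Run `job`, printing `message` every `interval` seconds until it returns.

    Hypothetical helper: `interval` only needs to be shorter than the caller's
    idle timeout (the PR describes a 10-second window for code execution).
    """
    done = threading.Event()

    def beat() -> None:
        # Event.wait(interval) returns False on timeout (time to print a beat)
        # and True as soon as `done` is set (time to stop).
        while not done.wait(interval):
            print(message, flush=True)

    beater = threading.Thread(target=beat, daemon=True)
    beater.start()
    try:
        return job()
    finally:
        # Stop the heartbeat whether the job succeeded or raised.
        done.set()
        beater.join()
```

A generation step would be wrapped as `run_with_heartbeat(lambda: generate(...))`, where `generate` stands in for whatever model call the instrument makes.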

🛠️ Edited

  • Initialize.sh:
    • Added -f (force) flag to resolve permissions issues.
  • Team Agent Prompts:
    • Enhanced prompts for improved Team Agent support.
  • Code Execution & Input Tools:
    • Improved terminal management for complex coding tasks, including support for individual terminal reset.
    • Code execution tool now uses a rolling timeout window:
      • If idle for 10 seconds, execution is terminated and passed back.
      • No maximum execution time if the process remains responsive.
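The rolling window can be sketched as follows. This assumes a one-off subprocess rather than the tool's persistent terminal sessions, which are not shown here. Output is read line by line on a helper thread, and every line restarts the idle timer. A process that stays silent for `idle_timeout` seconds is killed and its output is handed back. `run_with_idle_timeout` is a hypothetical name.

```python
from __future__ import annotations

import queue
import subprocess
import threading


def run_with_idle_timeout(cmd: list[str], idle_timeout: float = 10.0) -> tuple[str, bool]:
    """Run `cmd` and return (output, timed_out).

    Each line of output restarts the idle timer, so there is no overall limit
    while the process stays responsive.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    lines: queue.Queue[str | None] = queue.Queue()

    def reader() -> None:
        # Blocking reads happen on a helper thread so the main loop can time out.
        assert proc.stdout is not None
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)  # sentinel: the process closed its output

    threading.Thread(target=reader, daemon=True).start()
    output: list[str] = []
    while True:
        try:
            # Waits at most `idle_timeout` seconds since the previous line.
            line = lines.get(timeout=idle_timeout)
        except queue.Empty:
            proc.kill()  # idle too long: terminate and hand back what we have
            proc.wait()
            return "".join(output), True
        if line is None:
            proc.wait()
            return "".join(output), False
        output.append(line)
```

Because the timer resets on every line, a job that keeps printing is never cut off. The heartbeats in the instrument scripts rely on exactly this behavior.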

📝 Notes

  • These changes improve reliability, resource management, and developer experience for both agent and media generation workflows.
  • Please review the new Dockerfiles and scripts for compatibility with your environment.

deciduus added 16 commits May 8, 2025 15:44
Edit: Initialize.sh to fix a permissions issue by adding -f (force).
Edit:
Prompts for Team Agent support
Code Exe tool, input tool, and instructions to support individual terminal reset and better terminal management for complex coding tasks.
- Both support CPU-based (Debian slim) or GPU (CUDA-based) images.
- Both use a .sh script that creates and maintains a dedicated instruments_venv for dependencies, and both include heartbeats that prevent terminal passback (API waste).
- code_exe tool timeouts have been changed to a rolling window: if idle for 10 seconds it passes back; otherwise there is no maximum execution time while the process is responsive.
…nhanced to detect common shell prompt patterns (like (venv) ...$, root@...:~#, etc.) using regex. When such a prompt is detected in the output, the function will immediately break and return the output, rather than waiting for the full timeout.
…ng agent loop at the agent.py level. The call to self.call_extensions("message_loop_prompts", ...) in prepare_prompt is now wrapped in a try/except block.
…ases where the parsed string is just '+', '-', or empty, returning it as a string instead of raising a ValueError.
…ggle, team leader planning phase, task document handling, and team leader integration of results with task document handling

*allows for persistent development between teams by leveraging team leader as an investigative task manager before and after team agent task execution cycles*
…tiple separate f.write calls and complex string manipulation; encouraged reading, implementing the fix, and checking syntax. Context is a little bulky but working for now.
…t if the last 128 characters repeat 5 times consecutively and will stop execution if this pattern occurs, indicating a likely infinite loop.

Now, when a repeating output pattern is detected:
An error message will be printed to the console, indicating the loop and that the session is being reset.
The log will be updated.
self.reset_terminal(session=session) will be called to reset the specific session where the loop occurred.
A detailed message will be returned to the agent, including the captured output, the loop detection confirmation, a notification that the session was reset, and advice to review the problematic code/command.
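Here is a minimal sketch of the detection rule (last 128 characters repeated 5 times back to back), plus the truncation of looped output mentioned later in the thread. The function names, the marker text, and the choice to keep exactly one copy of the loop are assumptions. The logging and `reset_terminal` steps are left out.

```python
def detect_output_loop(output: str, pattern_len: int = 128, repeats: int = 5) -> bool:
    """True if the last `pattern_len` characters occur `repeats` times
    consecutively at the end of `output`."""
    window = pattern_len * repeats
    if len(output) < window:
        return False  # not enough output to contain the repeated pattern
    tail = output[-pattern_len:]
    return output[-window:] == tail * repeats


def collapse_loop(output: str, pattern_len: int = 128, repeats: int = 5,
                  marker: str = "[... repeated output truncated ...]") -> str:
    """Keep one copy of a detected loop and replace the other copies with `marker`."""
    if not detect_output_loop(output, pattern_len, repeats):
        return output
    keep = len(output) - pattern_len * (repeats - 1)
    return output[:keep] + "\n" + marker
```

For a tail made of one repeated unit, this literal check fires only when the unit's length divides 128 (for example, a 16- or 64-character line). Loops with other line lengths would need a check over more than one pattern length.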
@deciduus
Author

deciduus commented May 13, 2025

Added:

  • Task edit/update/delete methods, auto dependency toggling, team leader planning phase, and enhanced task document handling for persistent development between teams.

Edit:

  • Enhanced get_terminal_output in code_execution_tool.py to detect common shell prompt patterns via regex and break early.
  • Updated Code Execution Markdown to discourage multi-line f.write() calls to prevent UI errors.
  • Wrapped self.call_extensions("message_loop_prompts", ...) in agent.py in a try/except block to prevent extension errors (e.g., FAISS/KeyError) from crashing the agent loop.
  • Revised problem-solving/bug-fixing guidance to discourage common indentation-error-prone methods.
  • Updated Code Execution Markdown prompt to discourage line-by-line edits, multiple separate f.write calls, and complex string manipulation, while encouraging reading, implementing fixes, and checking syntax.
  • Implemented detection of repeating output patterns (last 128 chars repeated 5 times) in get_terminal_output of code_execution_tool.py to identify likely infinite loops, reset the specific session, and return a detailed message to the agent.
  • Updated Team Agent System Prompts.
  • Truncated looped information in loop detection to reduce redundant context for the LLM.
  • Encouraged heredoc multiline edits (cat > file << EOF) for code and documentation.
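The early-exit check in get_terminal_output can be sketched as follows. The regex is modeled on the two prompt examples quoted in the commit message ("(venv) ...$", "root@...:~#") and is not the PR's actual pattern. `PROMPT_RE` and `ends_with_prompt` are hypothetical names. An output reader would call the check after each read and return as soon as it is true.

```python
import re

# Hypothetical pattern modeled on the PR's examples ("(venv) ...$", "root@...:~#").
PROMPT_RE = re.compile(
    r"^(?:\([\w.-]+\)\s+)?"  # optional virtualenv marker, e.g. "(venv) "
    r"[\w.-]+@[\w.-]+:"      # user@host:
    r"[^\n$#]*"              # working directory, e.g. "~" or "/workspace"
    r"[$#]$"                 # prompt character as the last thing on the line
)


def ends_with_prompt(output: str) -> bool:
    """True if the last non-blank line of `output` looks like an idle shell prompt."""
    lines = output.rstrip().splitlines()
    return bool(lines) and PROMPT_RE.match(lines[-1]) is not None
```

A line that shows the prompt followed by a command (user@host:~$ python train.py) does not match, because the prompt character must end the line.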

Fix:

  • _parse_number method in dirty_json.py now gracefully handles lone '+', '-', or empty strings by returning them as strings, avoiding ValueError.
  • Addressed input tool session management issues.
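The _parse_number fix can be illustrated with a standalone function. This is a hypothetical stand-in, `parse_number`, that applies the same rule to an already-collected token. The parser's real character-by-character scanning is not reproduced.

```python
from typing import Union

Number = Union[int, float, str]


def parse_number(text: str) -> Number:
    """Lenient number parsing: lone signs and empty strings come back unchanged."""
    if text in ("", "+", "-"):
        # int() and float() raise ValueError on these inputs; return the raw text instead.
        return text
    try:
        return int(text)
    except ValueError:
        return float(text)  # still raises for genuinely non-numeric text
```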

Rafael-U and others added 12 commits May 13, 2025 13:38
… Servers

feat: (draft) support MCP Servers

feat: install npx for local MCP Servers execution

feat: add nest-asyncio as direct dependency

feat: add pdf2image to requirements.txt

feat: add local nginx for playwright file access

feat: MCP Server Support (Part 1: local stdio servers)
…nfusion between the 'mcp' package and 'mcp.py'. Edited the Dockerfile to use 'latest' so it doesn't spin up old builds after image creation. Also edited preinstall to ensure the MCP-related dependencies can load non-interactively.
…uto mcp install if present in settings config (on compose)
…nt not to get distracted and to continue the process.
…ly cast to a string before any length checks or slicing operations are performed on it. I also renamed the variable from user_message_context to user_message_text for better clarity, as it primarily holds the textual part of the user's message.

This should provide the agent with a more robust and helpful contextual block after each MCP tool execution.
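The string cast described above can be sketched like this. The reminder layout, `build_mcp_reminder`, and the 300-character limit are hypothetical. Only the idea of including the tool call, the user's request, and a next step comes from the PR. The `user_message_text` variable name is taken from the commit message.

```python
from __future__ import annotations

import json
from typing import Any


def build_mcp_reminder(tool_name: str, tool_args: dict[str, Any], user_message: Any,
                       max_chars: int = 300) -> str:
    """Build a short context block to append after an MCP tool result."""
    # Cast before measuring or slicing: for a dict or list, len() counts items rather
    # than characters, slicing a dict raises TypeError, and None has no len() at all.
    user_message_text = "" if user_message is None else str(user_message)
    if len(user_message_text) > max_chars:
        user_message_text = user_message_text[:max_chars] + "..."
    return (
        "[MCP tool reminder]\n"
        f"Tool called: {tool_name} {json.dumps(tool_args, sort_keys=True)}\n"
        f"Related user request: {user_message_text}\n"
        "Next step: check the tool output against the request, then continue the task."
    )
```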
@deciduus
Author

Feature: MCP Tool Integration, Auto-Setup, and Enhancements (Post-#384)

This PR builds upon the work in #384, primarily integrating Rafa's MCP tool implementation (from @Rafael-U #332, with commit history preserved via cherry-picking) and introducing several enhancements to make MCP server usage more robust and user-friendly.

Key Updates in this PR:

  • MCP Integration & Enhancements (based on #332, "feat: Implement support for MCP Servers (Claude Tools) - Stdio and SSE via json config"):

    • Successfully integrated the core MCP tool functionality.
    • Added comprehensive setup documentation (docs/mcp_setup.md) to guide users through configuring MCP servers via the UI.
    • Implemented automatic global installation of MCP server packages (for npx based servers) on system restart, streamlining the setup process.
    • Enhanced MCP tool interactions by providing contextual reminders in the agent's history. This includes details of the original tool call, the related user prompt, and suggested next steps to help agents maintain context, especially with potentially long or complex MCP tool outputs.
    • Renamed mcp.py to python/helpers/mcp_handler.py to avoid potential namespace conflicts with the mcp library.
  • System & Configuration Improvements:

    • Included configuration optimizations specifically for Debian Slim and CUDA Docker images to ensure successful MCP dependency handling and installation.
    • Upgraded the system to Node.js 20.x to support modern npm/npx requirements, crucial for MCP server package management (this impacts pre-installation steps for Debian and handling within CUDA environments).
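Here is one way the auto-install step could collect packages. It assumes the common `mcpServers` JSON shape, with a `command` plus `args` for each server. The PR's actual settings format and install invocation are not shown, so `npx_packages` and `install_command` are hypothetical helpers. At startup, the resulting argument list would be handed to something like `subprocess.run`.

```python
from __future__ import annotations

import json


def npx_packages(config_json: str) -> list[str]:
    """Return the packages launched via npx in an mcpServers-style config (assumed shape)."""
    servers = json.loads(config_json).get("mcpServers", {})
    packages: list[str] = []
    for server in servers.values():
        if server.get("command") != "npx":
            continue  # only npx-based servers need a global npm install
        # Skip npx flags such as "-y"; the first positional argument is the package.
        positional = [a for a in server.get("args", []) if not a.startswith("-")]
        if positional and positional[0] not in packages:
            packages.append(positional[0])
    return packages


def install_command(packages: list[str]) -> list[str]:
    # Global npm install of the collected packages, as an argv list.
    return ["npm", "install", "-g", *packages]
```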

Current Status:

Initial reviews indicate that various MCPs are functioning as expected. There might still be some minor quirks to address. The code_execution.md prompt has not yet been slimmed down but will be attended to in a subsequent update.

This PR aims to significantly improve the MCP tooling experience within Agent Zero.

@deciduus deciduus changed the title Team Agent, Advanced Terminal Handling, GPU/CPU-Compatible Instruments, and Standalone CUDA Dockerfile Feature: Team Agent, Adv. Terminal, MCP Tools, GPU/CPU Instruments & CUDA Docker May 14, 2025
@deciduus
Author

deciduus commented May 14, 2025

Edits to clean up the PR based on Jan's feedback:
- Slimmed Docker builds by removing unnecessary packages from preinstall.
- Cleaned up redundant MCP config in agent.py.
- Cleaned up the MCP auto-install and check logs used for debugging so they're now concise and informative.

@deciduus
Author

Added a new instrument for retrieving YouTube video transcripts by providing a video link.
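The instrument's internals are not described here. As a hedged illustration of the "providing a link" step only, the stdlib sketch below pulls the video ID out of common YouTube URL shapes. The function name and the set of URL shapes handled are assumptions. Fetching the transcript itself is out of scope.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlparse


def youtube_video_id(link: str) -> str | None:
    """Extract the video ID from common YouTube link shapes, or return None."""
    parsed = urlparse(link)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0] or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        for prefix in ("/shorts/", "/embed/", "/live/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None
    return None
```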

Resolved merge conflicts.
