
Troubleshooting Empty Actual Tool Usage Logs in Google ADK Agent Evaluation #629

Open
@YepingLi


Hello, I'm testing my agent using an evalset per the instructions at (https://google.github.io/adk-docs/tools/function-tools/#intermediate-final-result-updates). I've set up my agent and tools as shown (attached below), but the evaluation always returns an empty "actual" for the tool usage and fails. For example, for these queries:

• "Roll a 16 sided dice for me"
 Expected tool use:
  {"tool_name": "roll_die", "tool_input": {"sides": 16}}
 Actual: []

• "Is 6151953 a prime number?"
 Expected tool use:
  {"tool_name": "check_prime", "tool_input": {"nums": [6151953]}}
 Actual: []

The metric "tool_trajectory_avg_score" is failing (Score 0.333…, Threshold 1.0). Also, if I leave "expected_tool_use" empty in the evalset, the evaluation passes regardless of the reference given in the evalset.
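
A minimal sketch of how these numbers could arise, assuming the metric scores each turn as an exact match between the expected and actual tool-call lists and then averages over the turns. The helper names `turn_score` and `average_trajectory_score` are hypothetical and are not ADK's implementation; they only model the behaviour reported here for the first eval case.

```python
def turn_score(expected, actual):
    """Return 1.0 when both tool-call lists are identical, else 0.0 (assumed rule)."""
    return 1.0 if expected == actual else 0.0


def average_trajectory_score(turns):
    """Average the per-turn scores over (expected, actual) pairs."""
    scores = [turn_score(expected, actual) for expected, actual in turns]
    return sum(scores) / len(scores)


roll = [{"tool_name": "roll_die", "tool_input": {"sides": 16}}]
prime = [{"tool_name": "check_prime", "tool_input": {"nums": [6151953]}}]

# First eval case: three turns, and "actual" is empty on every turn.
turns = [([], []), (roll, []), (prime, [])]
score = average_trajectory_score(turns)
print(round(score, 3))  # 0.333

# With "expected_tool_use" left empty everywhere, empty matches empty.
all_empty = average_trajectory_score([([], []), ([], []), ([], [])])
print(all_empty)  # 1.0
```

Under this assumption, only the "What can you do?" turn matches (empty expected, empty actual), which gives 1/3, and an evalset with no expected tool use passes trivially because no tool calls are being recorded at all.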

Could you help me troubleshoot why my tool logs aren't being captured and why the evalset isn't validating the expected tool usage as intended? Thank you :)

Code:

import os
import random
import json
import sys
from typing import Callable
from pydantic import BaseModel, Field
from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm
from google.adk.tools.base_tool import BaseTool  # Import BaseTool

# Tool definitions
def roll_die(tool_input: dict) -> str:
    """
    Simulate rolling a die.
    Expects tool_input to contain "sides".
    Returns only the numeric result as string.
    """
    sides = tool_input.get("sides", 6)
    value = random.randint(1, sides)
    # Log tool usage directly here
    print(json.dumps({"tool_name": "roll_die", "tool_input": {"sides": sides}}))
    sys.stdout.flush()  # Ensure the log is emitted immediately
    return str(value)

def check_prime(tool_input: dict) -> str:
    """
    Check if the given number is prime.
    Expects tool_input to contain a list under "nums".
    Returns "True" if prime, "False" otherwise.
    """
    nums = tool_input.get("nums", [])
    if not nums:
        return "False"
    num = nums[0]
    # Log tool usage directly here
    print(json.dumps({"tool_name": "check_prime", "tool_input": {"nums": [num]}}))
    sys.stdout.flush()  # Ensure the log is emitted immediately
    if num < 2:
        return "False"
    for i in range(2, int(num ** 0.5) + 1):
        if num % i == 0:
            return "False"
    return "True"

class SimpleTool(BaseTool):
    def __init__(self, name: str, handler: Callable[[dict], str]):
        self.name = name
        self.handler = handler

    def __call__(self, tool_input: dict) -> str:
        # Simply call the handler without additional logging
        return self.handler(tool_input)

# Set up LiteLLM model configuration.
api_base_url = os.environ.get("LITELLM_API_BASE")
model_name_at_endpoint = "gpt-4o"
api_key = os.environ.get("LITELLM_API_KEY")

lite_llm_model = LiteLlm(
    model=model_name_at_endpoint,
    api_base=api_base_url,
    api_key=api_key
)

# Define a SimpleAgent with multiple capabilities and registered tools.
# Note that the tool functions now return only the bare result.
root_agent = LlmAgent(
    name="SimpleAgent",
    model=lite_llm_model,
    instruction="""You are a Simple Agent with multiple capabilities.

For simple queries:
- When asked "What can you do?", respond with:
    I can roll dice of different sizes and check if a number is prime. I can also use multiple tools in parallel.

For tool-based queries:
- When asked "Roll a X sided dice for me", call the roll_die tool with {"sides": X}.
  Then construct your output as:
    I rolled a X-sided die and got Y.
  where Y is the numeric result returned by the tool.
- When asked "Is X a prime number?", call the check_prime tool with {"nums": [X]}.
  If the tool returns "True", output:
    Yes, X is a prime number.
  Otherwise, output:
    No, X is not a prime number.
- When asked "Roll a X sided dice twice for me", call the roll_die tool twice with {"sides": X} and output:
    I rolled a X-sided die twice. The first roll was Y and the second roll was Z.
  where Y and Z are the numeric results from each call.

Ensure that you do not modify, wrap, or reformat the raw outputs from the tool functions beyond inserting them into the specified output templates.
""",
    description="A multi-capability agent for evaluation purposes with registered tools.",
    output_key="response",
    tools=[
        SimpleTool("roll_die", roll_die),
        SimpleTool("check_prime", check_prime)
    ]
)
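
One variant that may be worth comparing against, as a sketch only: the ADK function-tools documentation describes passing plain Python functions directly in `tools`, with typed parameters whose names match the keys in "expected_tool_use". The version below assumes that the evaluation reads tool calls from the function-call events the model emits rather than from anything printed to stdout; that assumption is not confirmed by this issue. The agent registration is left as a comment so the block runs without ADK installed.

```python
import random


def roll_die(sides: int) -> int:
    """Roll a die with the given number of sides and return the result."""
    return random.randint(1, sides)


def check_prime(nums: list[int]) -> str:
    """Return "True" if the first number in nums is prime, "False" otherwise."""
    if not nums:
        return "False"
    num = nums[0]
    if num < 2:
        return "False"
    # Trial division up to the square root is enough to find a factor.
    for i in range(2, int(num ** 0.5) + 1):
        if num % i == 0:
            return "False"
    return "True"


# Assumed registration, mirroring the agent above but without SimpleTool:
# root_agent = LlmAgent(
#     name="SimpleAgent",
#     model=lite_llm_model,
#     instruction=...,
#     tools=[roll_die, check_prime],
# )
```

With this shape the parameter names (`sides`, `nums`) line up with the "tool_input" keys in the evalset, so a recorded call would be directly comparable to the expected one.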

The evalset file I used:
[
  {
    "name": "roll_16_sided_dice_and_then_check_if_6151953_is_prime",
    "data": [
      {
        "query": "What can you do?",
        "expected_tool_use": [],
        "expected_intermediate_agent_responses": [],
        "reference": "I can roll dice of different sizes and check if a number is prime. I can also use multiple tools in parallel.\n"
      },
      {
        "query": "Roll a 16 sided dice for me",
        "expected_tool_use": [
          {
            "tool_name": "roll_die",
            "tool_input": {
              "sides": 16
            }
          }
        ],
        "expected_intermediate_agent_responses": [],
        "reference": "I rolled a 16 sided die and got 13.\n"
      },
      {
        "query": "Is 6151953 a prime number?",
        "expected_tool_use": [
          {
            "tool_name": "check_prime",
            "tool_input": {
              "nums": [
                6151953
              ]
            }
          }
        ],
        "expected_intermediate_agent_responses": [],
        "reference": "No, 6151953 is not a prime number.\n"
      }
    ],
    "initial_session": {
      "state": {},
      "app_name": "hello_world",
      "user_id": "user"
    }
  },
  {
    "name": "roll_17_sided_dice_twice",
    "data": [
      {
        "query": "What can you do?",
        "expected_tool_use": [],
        "expected_intermediate_agent_responses": [],
        "reference": "I can roll dice of different sizes and check if a number is prime. I can also use multiple tools in parallel.\n"
      },
      {
        "query": "Roll a 17 sided dice twice for me",
        "expected_tool_use": [
          {
            "tool_name": "roll_die",
            "tool_input": {
              "sides": 17
            }
          },
          {
            "tool_name": "roll_die",
            "tool_input": {
              "sides": 17
            }
          }
        ],
        "expected_intermediate_agent_responses": [],
        "reference": "I have rolled a 17 sided die twice. The first roll was 13 and the second roll was 4.\n"
      }
    ],
    "initial_session": {
      "state": {},
      "app_name": "hello_world",
      "user_id": "user"
    }
  }
]
