How to Trace and Evaluate Multi-Turn Chatbot Conversations Correctly in LangSmith? #818

farouk09 · 2025-05-30T11:21:45Z

farouk09
May 30, 2025

Hi everyone,

I'm working on evaluating a multi-turn chatbot using LangSmith, but I'm running into an issue with how traces and datasets are generated from conversations.

Here’s what I expect for the second interaction of a chat like this:

1: User → Hello  
2: AI   → Hi there!  
3: User → What's the weather today?  
4: AI   → It's sunny in Paris today.

Expected Tracing for Turn 4:

Input: turns 1 → 2 → 3
Output: turn 4

Actual Tracing in LangSmith:

Input: only turn 3
Output: turns 1 → 2 → 3 → 4

This makes it hard to evaluate responses turn-by-turn or generate a proper dataset from the trace, since it uses only the latest user input as input, and puts the whole chat history in the output.

What I Want:

I’d like to be able to trace and export datasets where:

Input = previous conversation history (up to the latest user message)
Output = only the latest AI response

Has anyone encountered this or found a clean way to:

Adjust the trace construction to reflect this?
Generate datasets using the “Input = full context, Output = final AI reply” format?

Any tips, code snippets, or config examples would be really appreciated!

Thanks 🙏

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

How to Trace and Evaluate Multi-Turn Chatbot Conversations Correctly in LangSmith? #818

Uh oh!

{{title}}

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

How to Trace and Evaluate Multi-Turn Chatbot Conversations Correctly in LangSmith? #818

Uh oh!

farouk09 May 30, 2025

Replies: 0 comments

farouk09
May 30, 2025