test: Mock Claude models to improve LLM test reliability #1968

Draft
wants to merge 1 commit into base: main

Conversation

shuoweil
Contributor

@shuoweil shuoweil commented Aug 6, 2025

Mock Claude models to improve LLM test reliability
b/436340035

@shuoweil shuoweil requested review from tswast and GarrettWu August 6, 2025 00:01
@shuoweil shuoweil self-assigned this Aug 6, 2025
@shuoweil shuoweil requested review from a team as code owners August 6, 2025 00:01
@product-auto-label product-auto-label bot added the size: m and api: bigquery labels Aug 6, 2025
@GarrettWu
Contributor

what was the root cause of the failure?

@shuoweil
Contributor Author

shuoweil commented Aug 6, 2025

what was the root cause of the failure?

Hi @GarrettWu, the root cause is that the Claude model endpoint is only available in the us-east5 region, not in us-central1.
https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/claude/sonnet-3-5
https://screenshot.googleplex.com/34kotytZWcQhgmd
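To make the region constraint concrete, here is a minimal sketch of a guard a test could use before calling the model. It encodes only what is stated in this thread (available in us-east5, not in us-central1); the names `CLAUDE_REGIONS` and `claude_available` are hypothetical and not part of bigframes.

```python
# Assumption: this set reflects only the availability described in this thread.
# It is not an authoritative list of regions where Claude models are served.
CLAUDE_REGIONS = frozenset({"us-east5"})


def claude_available(location: str) -> bool:
    """Return True if the (hypothetical) region table lists this location."""
    return location.strip().lower() in CLAUDE_REGIONS


if __name__ == "__main__":
    for loc in ("us-east5", "us-central1"):
        print(loc, claude_available(loc))
```

A test could call a helper like this and skip itself when the session location is not supported, instead of failing on the missing endpoint.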

@GarrettWu
Contributor

Does that mean the BQML US region can't use the model? Could you check with jasperxu@? If so, we'd deprecate the model and remove the tests. Mocking it doesn't really test anything.
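A small sketch of the concern about mocking, using only `unittest.mock` from the standard library. `FakeClaudeModel` and `summarize` are hypothetical stand-ins, not the code in this PR. The mocked check passes even though the underlying call would fail, so it only exercises the wrapper logic around the model.

```python
from unittest import mock


class FakeClaudeModel:
    """Hypothetical stand-in for a wrapper around a remote model endpoint."""

    def predict(self, prompt: str) -> str:
        # Simulates the failure seen when the endpoint is not in the region.
        raise RuntimeError("endpoint not available in this region")


def summarize(model, text: str) -> str:
    """Wrapper logic under test: build a prompt and tidy up the response."""
    return model.predict(f"Summarize: {text}").strip()


def run_mocked_check() -> str:
    model = FakeClaudeModel()
    # Replace predict with a canned response for the duration of the block.
    with mock.patch.object(model, "predict", return_value=" canned answer ") as fake:
        result = summarize(model, "hello")
        # Only the prompt construction is verified, never the real endpoint.
        fake.assert_called_once_with("Summarize: hello")
    return result
```

The mocked run returns the canned text regardless of region, while calling `summarize(FakeClaudeModel(), "hello")` without the patch still raises.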

@shuoweil
Contributor Author

shuoweil commented Aug 7, 2025

us-east5 region,

You are right, a mock does not test anything. I can use a session in the us-east5 region, but that also limits our testing. I talked to Jasper before I started working on this PR, and we can use Claude in us-east5. Let's chat offline tomorrow.

@shuoweil shuoweil marked this pull request as draft August 7, 2025 22:55
@shuoweil shuoweil added the do not merge label Aug 7, 2025
@shuoweil shuoweil removed request for tswast and GarrettWu August 7, 2025 22:55
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. do not merge Indicates a pull request not ready for merge, due to either quality or timing. size: m Pull request size is medium.
2 participants