Bounding Box Detection

Hello!

After seeing that Sonnet is trained for computer use (with exact pixel coordinates), I tried using it for bounding box detection, both open-vocabulary with text input and few-shot with image input. However, my results have been worse than I expected given Claude's performance with computer use. I tried following the best practices outlined in this repo.

My questions are:
1) Can you share which coordinate normalization and origin location Claude was trained with for computer use, so I can use the same setup?
2) Are there any bounding box grounding suggestions I should try beyond what is given in the cookbooks?
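
To make question 1 concrete, here is a minimal sketch of the conventions I am trying to choose between. The function names are hypothetical and the conventions shown (top-left origin pixels, [0, 1] normalization, bottom-left origin) are only examples; nothing here is a claim about how Claude was actually trained.

```python
from typing import Tuple

# A box is (x_min, y_min, x_max, y_max). Helper names are hypothetical.
Box = Tuple[float, float, float, float]


def normalize_box(box: Box, width: int, height: int) -> Box:
    """Convert pixel coordinates to [0, 1] coordinates (same origin)."""
    x0, y0, x1, y1 = box
    return (x0 / width, y0 / height, x1 / width, y1 / height)


def denormalize_box(box: Box, width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert [0, 1] coordinates back to integer pixel coordinates."""
    x0, y0, x1, y1 = box
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))


def flip_origin_vertical(box: Box, height: int) -> Box:
    """Switch a pixel box between top-left and bottom-left origin."""
    x0, y0, x1, y1 = box
    # The y axis is mirrored, so the old y_max becomes the new y_min.
    return (x0, height - y1, x1, height - y0)
```

Knowing which of these conventions matches the computer use setup would let me format prompts and parse outputs consistently.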

Thank you very much!

Bounding Box Detection #123
