Bounding Box Detection #123

Open
@batu

Hello!

After seeing that Sonnet is trained for computer use (with exact pixel coordinates), I tried using it for bounding box detection, both open-vocabulary with text input and few-shot with image input. However, my results have been worse than I expected given Claude's performance with computer use. I tried following the best practices outlined in this repo.

My questions are:

  1. Can you share which coordinate normalization and origin location Claude is trained on for computer use, so that I can use the same setup?
  2. Are there any bounding box grounding suggestions I should try beyond what is given in the cookbooks?
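
To make the first question concrete, here is a minimal sketch of the conversions involved. It assumes a top-left origin, boxes given as (x_min, y_min, x_max, y_max), and normalization to the [0, 1] range. These conventions and the function names are illustrative assumptions only, not a confirmed description of what Claude is trained on.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max), origin at the top-left corner.
Box = Tuple[float, float, float, float]
Size = Tuple[int, int]  # (width, height) in pixels


def normalized_to_pixels(box: Box, size: Size) -> Tuple[int, int, int, int]:
    """Map a box in assumed [0, 1] coordinates to integer pixel coordinates."""
    width, height = size
    x0, y0, x1, y1 = box
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))


def pixels_to_normalized(box: Box, size: Size) -> Box:
    """Map a pixel box to assumed [0, 1] coordinates."""
    width, height = size
    x0, y0, x1, y1 = box
    return (x0 / width, y0 / height, x1 / width, y1 / height)


def rescale_pixels(box: Box, src: Size, dst: Size) -> Tuple[int, int, int, int]:
    """Map a pixel box from an image of size `src` to a resized copy of size `dst`."""
    sx = dst[0] / src[0]
    sy = dst[1] / src[1]
    x0, y0, x1, y1 = box
    return (round(x0 * sx), round(y0 * sy), round(x1 * sx), round(y1 * sy))


if __name__ == "__main__":
    original = (800, 600)
    box_px = normalized_to_pixels((0.25, 0.5, 0.75, 1.0), original)
    print(box_px)
    # If the image is downscaled before being sent, boxes returned in the
    # resized image's pixel space must be mapped back to the original size.
    print(rescale_pixels((100, 150, 300, 300), (400, 300), original))
```

If the image is resized before it reaches the model, the last helper matters: a box reported in the resized image's pixel space has to be mapped back before comparing it to ground truth.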

Thank you very much!
