chore: Updating how default embedding model is set in stack #3818
Conversation
Force-pushed from 0249074 to 6a80f7a
This looks good to me. I haven't looked at the implementation details closely, but the shape looks correct.
llama_stack/core/resolver.py (outdated):

```python
args.append(policy)

# vector_io providers need access to run_config.vector_stores
if provider_spec.api == Api.vector_io and "run_config" in inspect.signature(getattr(module, method)).parameters:
```
hmm the router should get access to it but not the providers?
I think we can completely avoid passing this downstream and leave it limited to the router and then it could be much simpler.
yeah, I removed this and am passing only the vector stores config now. LMK if that covers it.
@franciscojavierarceo no, what I mean is -- if we just have the router inspect this value and none of the providers need even a single line of code change -- is that possible?
so vector_io doesn't always go through router.py, which means we have to add models_api to the providers and have them do the check there.
if we are okay with vector_io going through the router, then we can add the default check there and pass that in the model_extra.
lmk which you'd prefer.
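For reference, a rough sketch of the router-only fallback being discussed; the class, method, and attribute names below are assumptions rather than the actual llama-stack code:

```python
# Sketch only: the router fills in the configured default when the request does
# not name an embedding model, so providers need no code changes of their own.
class VectorIORouter:
    def __init__(self, routing_table, vector_stores_config=None):
        self.routing_table = routing_table
        self.vector_stores_config = vector_stores_config  # optional run-config section

    async def openai_create_vector_store(self, params):
        # Assumes params.model_extra is a mutable dict on the request object.
        if self.vector_stores_config and "embedding_model" not in params.model_extra:
            params.model_extra["embedding_model"] = (
                self.vector_stores_config.default_embedding_model_id
            )
        provider = self.routing_table.get_provider_impl(params.name)
        return await provider.openai_create_vector_store(params)
```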
This might need some more consideration, but at first glance, the main downside is that changing the value requires a server restart, while the main advantage is that it's GitOps-friendly (compared to `default_configured` being in models).
@leseb I think that's a feature in my mind. This is a very consequential operation. If you are doing it willy-nilly without even a rollover of your servers, you are likely not considering downstream effects. Also, note that you can later build APIs on top of this (just like a proposed ...).
I personally agree with @ashwinb here, which is why I didn't want to leave the previous approach for long.
That's the key piece. We make the UX much better for OpenAI compatibility, but changing the default feels worthy of a new deploy (making sure our actual users test and verify through their CI).
```yaml
server:
  port: 8321
vector_stores:
  default_embedding_model_id: sentence-transformers/nomic-ai/nomic-embed-text-v1.5
```
Given that the `models` section is empty in this particular run.yaml, does it make sense to set `default_embedding_model_id` here? I think it would be better to set it in llama-stack/llama_stack/distributions/meta-reference-gpu/run.yaml.
Also, given this model definition:

```yaml
models:
- metadata:
    embedding_dimension: 768
  model_id: nomic-embed-text-v1.5
  provider_id: sentence-transformers
  model_type: embedding
```

Shouldn't this be the correct value to set?

```yaml
vector_stores:
  default_embedding_model_id: nomic-embed-text-v1.5
```
no, we updated to require the provider name explicitly in #3822, and `nomic` is a prefix for HF.
So, the value that should be added is provider_id/provider_model_id? If yes, as `model_id` doesn't contain this info, consider calling the new configuration parameter differently (e.g. `default_embedding_model`). Thanks!
yeah good point. this is basically the `identifier` we use as the key in the registry. it is all kinds of confusing. @franciscojavierarceo maybe this key should be a (provider_id, model_id) tuple instead to avoid all confusion.
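For illustration, the structured form could look something like this in run.yaml (the field names are guesses, not a settled schema):

```yaml
# Illustrative only: split the flat identifier into an explicit provider/model pair.
vector_stores:
  default_embedding_model:
    provider_id: sentence-transformers
    model_id: nomic-ai/nomic-embed-text-v1.5
```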
sgtm i'll update.
```python
self.files_api = files_api
self.kvstore = kvstore
self.models_api = models_api
# These will be set by implementing classes
```
why not add to init?
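For illustration, a minimal sketch of that suggestion, assuming the snippet above comes from the OpenAI vector store mixin (the class name and the trailing attribute are guesses, not the actual code):

```python
# Sketch only: declare everything in __init__ (even if just as None) instead of
# leaving some attributes for implementing classes to set later.
class OpenAIVectorStoreMixin:
    def __init__(self, files_api, kvstore, models_api=None):
        self.files_api = files_api
        self.kvstore = kvstore
        self.models_api = models_api
        # Previously left for subclasses to assign; declaring it here means the
        # attribute always exists and is visible to type checkers.
        self.vector_db_store = None
```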
Force-pushed from faedfae to 6eac571
It's all good! Maybe I wasn't clear, but I prefer the GitOps approach, as I have already commented on the ...
Force-pushed from 85b9464 to 39a7ea9
Force-pushed from 39a7ea9 to 48ab1a2
What does this PR do?
Refactor setting the default embedding model to use an optional `vector_stores` config in the `StackRunConfig`, and clean up the code to do so. Also update the starter template and the vector-io action. New config is simply:
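Based on the starter run.yaml excerpt quoted earlier in the review, the new section looks roughly like this:

```yaml
vector_stores:
  default_embedding_model_id: sentence-transformers/nomic-ai/nomic-embed-text-v1.5
```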
Test Plan