Conversation

franciscojavierarceo (Collaborator) commented Oct 15, 2025

What does this PR do?

Refactors default embedding model selection to use an optional vector_stores config in the StackRunConfig and cleans up the related code. Also updates the starter template and the vector-io CI action.

New config is simply:

vector_stores:
  default_embedding_model_id: sentence-transformers/nomic-ai/nomic-embed-text-v1.5
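
For illustration, the new section might be modeled roughly like this in the StackRunConfig, assuming Pydantic models as used elsewhere in llama-stack; the class and field names below are a sketch, not the PR's exact code:

from pydantic import BaseModel

class VectorStoresConfig(BaseModel):
    # embedding model to use when a vector store is created
    # without an explicit embedding model
    default_embedding_model_id: str | None = None

class StackRunConfig(BaseModel):
    # ...existing fields elided...
    vector_stores: VectorStoresConfig | None = None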

Test Plan

franciscojavierarceo marked this pull request as ready for review October 16, 2025 14:05
ashwinb (Contributor) commented Oct 16, 2025

This looks good to me. I haven't looked at the implementation details closely, but the shape looks correct.

args.append(policy)

# vector_io providers need access to run_config.vector_stores
if provider_spec.api == Api.vector_io and "run_config" in inspect.signature(getattr(module, method)).parameters:
Contributor:

hmm the router should get access to it but not the providers?

Contributor:

I think we can completely avoid passing this downstream and leave it limited to the router and then it could be much simpler.

Collaborator Author:

yeah i removed this and am passing only the vector stores config now. LMK if that covers it.

Contributor:

@franciscojavierarceo no, what I mean is -- if we just have the router inspect this value and none of the providers need even a single line of code change -- is that possible?

Collaborator Author:

so vector_io doesn't always go through router.py, which means we have to add models_api to the providers and have them do the check there.

if we are okay with vector_io going through the router, then we can add the default check there and pass that in the model_extra.

lmk which you'd prefer
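
For context, the router-only option being discussed could look roughly like the sketch below; a VectorIORouter exists in llama-stack, but the method and parameter names here are illustrative assumptions, not the final implementation:

class VectorIORouter:
    def __init__(self, routing_table, vector_stores_config=None):
        self.routing_table = routing_table
        self.vector_stores_config = vector_stores_config

    async def openai_create_vector_store(self, name, embedding_model=None, **kwargs):
        # resolve the default at the routing layer so individual
        # providers need no code changes
        if embedding_model is None and self.vector_stores_config is not None:
            embedding_model = self.vector_stores_config.default_embedding_model_id
        provider = await self.routing_table.get_provider_impl(name)
        return await provider.openai_create_vector_store(
            name, embedding_model=embedding_model, **kwargs
        )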

leseb (Collaborator) left a comment:

This might need some more consideration, but at first glance, the main downside is that changing the value requires a server restart, while the main advantage is that it’s GitOps-friendly. (compared to default_configured being in models)

ashwinb (Contributor) commented Oct 16, 2025

This might need some more consideration, but at first glance, the main downside is that changing the value requires a server restart, while the main advantage is that it’s GitOps-friendly. (compared to default_configured being in models)

@leseb I think that's a feature in my mind. This is a very consequential operation. If you are doing it willy nilly without even a rollover of your servers, you are likely not considering downstream effects.

Also, note that you can later build APIs on top of this (just like a proposed POST /providers) which are admin-scoped and then don't require a server restart, if you so wished. But doing that as part of the model configuration is a no-no in my opinion.

franciscojavierarceo (Collaborator Author) commented Oct 16, 2025

I personally agree with @ashwinb here, which is why I didn't want to leave the previous approach for long.

This is a very consequential operation. If you are doing it willy nilly without even a rollover of your servers, you are likely not considering downstream effects.

That's the key piece. We make the UX much better for OpenAI compatibility, but changing the default feels worthy of a new deploy (making sure our actual users test and verify through their CI).

server:
  port: 8321
vector_stores:
  default_embedding_model_id: sentence-transformers/nomic-ai/nomic-embed-text-v1.5
Contributor:

Given that the models section is empty in this particular run.yaml, does it make sense to set default_embedding_model_id here? I think it would be better to set it in llama-stack/llama_stack/distributions/meta-reference-gpu/run.yaml

Also, given this model definition:

models:
- metadata:
    embedding_dimension: 768
  model_id: nomic-embed-text-v1.5
  provider_id: sentence-transformers
  model_type: embedding

Shouldn't this be the correct value to set?

vector_stores:
  default_embedding_model_id: nomic-embed-text-v1.5

Collaborator Author:

no, we updated to require the provider name explicitly in #3822, and nomic-ai is the HF org prefix.

Contributor:

So the value that should be added is provider_id/provider_model_id?

If so, since model_id doesn't contain this info, consider naming the new configuration parameter differently (e.g. default_embedding_model). Thanks!

Contributor:

yeah good point. this is basically the identifier we use as the key in the registry, and it is all kinds of confusing. @franciscojavierarceo maybe this key should be a (provider_id, model_id) tuple instead to avoid all confusion.

Collaborator Author:

sgtm i'll update.
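
Purely as an illustration of that suggestion (the final field names were still being settled here), the config could become:

vector_stores:
  default_embedding_model:
    provider_id: sentence-transformers
    model_id: nomic-ai/nomic-embed-text-v1.5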

self.files_api = files_api
self.kvstore = kvstore
self.models_api = models_api
# These will be set by implementing classes
Contributor:

why not add this to __init__?
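
For illustration, folding those assignments into the constructor might look like this; the class name follows openai_vector_store_mixin.py, but the exact signature is an assumption:

class OpenAIVectorStoreMixin:
    def __init__(self, files_api=None, kvstore=None, models_api=None):
        # set the shared dependencies here instead of leaving the
        # attributes for implementing classes to assign later
        self.files_api = files_api
        self.kvstore = kvstore
        self.models_api = models_api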

leseb (Collaborator) commented Oct 17, 2025

This might need some more consideration, but at first glance, the main downside is that changing the value requires a server restart, while the main advantage is that it’s GitOps-friendly. (compared to default_configured being in models)

@leseb I think that's a feature in my mind. This is a very consequential operation. If you are doing it willy nilly without even a rollover of your servers, you are likely not considering downstream effects.

Also, note that you can later build APIs on top of this (just like a proposed POST /providers) which are admin-scoped and then don't require a server restart, if you so wished. But doing that as part of the model configuration is a no-no in my opinion.

It's all good! Maybe I wasn't clear, but I prefer the GitOps approach, as I have already commented on the POST /providers proposal. No blockers from me. Thanks!
