chore: Updating how default embedding model is set in stack #3818
Conversation
Force-pushed from 0249074 to 6a80f7a
This looks good to me. I haven't looked at the implementation details closely, but the shape looks correct.
llama_stack/core/resolver.py (outdated):

```python
args.append(policy)

# vector_io providers need access to run_config.vector_stores
if provider_spec.api == Api.vector_io and "run_config" in inspect.signature(getattr(module, method)).parameters:
```
hmm the router should get access to it but not the providers?
I think we can completely avoid passing this downstream and leave it limited to the router and then it could be much simpler.
yeah, I removed this and am passing only the vector stores config now. LMK if that covers it.
@franciscojavierarceo no, what I mean is -- if we just have the router inspect this value and none of the providers need even a single line of code change -- is that possible?
so vector_io doesn't always go through router.py, which means we have to add models_api to the providers and have them do the check there.
if we are okay with vector_io going through the router, then we can add the default check there and pass that in the model_extra.
lmk which you'd prefer.
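For reference, a rough sketch of the router-only fallback being discussed; the class, method, and attribute names below are assumptions rather than the actual llama-stack code:

```python
# Sketch only: the router fills in the configured default when the request does
# not name an embedding model, so providers need no code changes of their own.
class VectorIORouter:
    def __init__(self, routing_table, vector_stores_config=None):
        self.routing_table = routing_table
        self.vector_stores_config = vector_stores_config  # optional run-config section

    async def openai_create_vector_store(self, params):
        # Assumes params.model_extra is a mutable dict on the request object.
        if self.vector_stores_config and "embedding_model" not in params.model_extra:
            params.model_extra["embedding_model"] = (
                self.vector_stores_config.default_embedding_model_id
            )
        provider = self.routing_table.get_provider_impl(params.name)
        return await provider.openai_create_vector_store(params)
```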
This might need some more consideration, but at first glance, the main downside is that changing the value requires a server restart, while the main advantage is that it's GitOps-friendly (compared to `default_configured` being in models).
@leseb I think that's a feature in my mind. This is a very consequential operation. If you are doing it willy-nilly without even a rollover of your servers, you are likely not considering downstream effects. Also, note that you can later build APIs on top of this (just like a proposed ...).
I personally agree with @ashwinb here, which is why I didn't want to leave the previous approach for long.
That's the key piece. We make the UX much better for OpenAI compatibility, but changing the default feels worthy of a new deploy (making sure our actual users test and verify through their CI).
```yaml
server:
  port: 8321
vector_stores:
  default_embedding_model_id: sentence-transformers/nomic-ai/nomic-embed-text-v1.5
```
Given that the `models` section is empty in this particular run.yaml, does it make sense to set `default_embedding_model_id` here? I think it would be better to set it in llama-stack/llama_stack/distributions/meta-reference-gpu/run.yaml.
Also, given this model definition:

```yaml
models:
- metadata:
    embedding_dimension: 768
  model_id: nomic-embed-text-v1.5
  provider_id: sentence-transformers
  model_type: embedding
```

Shouldn't this be the correct value to set?

```yaml
vector_stores:
  default_embedding_model_id: nomic-embed-text-v1.5
```
no, we updated to require the provider name explicitly in #3822, and `nomic` is a prefix for HF.
So, the value that should be added is provider_id/provider_model_id? If yes, as `model_id` doesn't contain this info, consider calling the new configuration parameter differently (e.g. `default_embedding_model`). Thanks!
yeah good point. this is basically the `identifier` we use as the key in the registry. it is all kinds of confusing. @franciscojavierarceo maybe this key should be a (provider_id, model_id) tuple instead to avoid all confusion.
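For illustration, the structured form could look something like this in run.yaml (the field names are guesses, not a settled schema):

```yaml
# Illustrative only: split the flat identifier into an explicit provider/model pair.
vector_stores:
  default_embedding_model:
    provider_id: sentence-transformers
    model_id: nomic-ai/nomic-embed-text-v1.5
```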
sgtm i'll update.
```python
self.files_api = files_api
self.kvstore = kvstore
self.models_api = models_api
# These will be set by implementing classes
```
why not add to init?
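For illustration, a minimal sketch of that suggestion, assuming the snippet above comes from the OpenAI vector store mixin (the class name and the trailing attribute are guesses, not the actual code):

```python
# Sketch only: declare everything in __init__ (even if just as None) instead of
# leaving some attributes for implementing classes to set later.
class OpenAIVectorStoreMixin:
    def __init__(self, files_api, kvstore, models_api=None):
        self.files_api = files_api
        self.kvstore = kvstore
        self.models_api = models_api
        # Previously left for subclasses to assign; declaring it here means the
        # attribute always exists and is visible to type checkers.
        self.vector_db_store = None
```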
Force-pushed from faedfae to 6eac571
It's all good! Maybe I wasn't clear, but I prefer the GitOps approach, as I have already commented on the ...
Force-pushed from 85b9464 to 39a7ea9
Force-pushed from 39a7ea9 to 48ab1a2
What does this PR do?
Refactor setting the default embedding model to use an optional `vector_stores` config in the `StackRunConfig`, and clean up the code to do so. Also update the starter template and the vector-io action. New config is simply:
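Based on the starter run.yaml excerpt quoted earlier in the review, the new section looks roughly like this:

```yaml
vector_stores:
  default_embedding_model_id: sentence-transformers/nomic-ai/nomic-embed-text-v1.5
```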
Test Plan