dmartinol
Collaborator

Related to: #1114

A proposal to define a Kubernetes deployment of the ToolHive Registry.

Goals

  • Native Kubernetes Registry: Implement registry functionality using Custom Resource Definitions
  • Upstream Format Support: Leverage existing upstream conversion capabilities for ecosystem compatibility
  • Multi-Registry Support: Support both local registry entries and external registry synchronization
  • Registry Hierarchy: Support the multi-registry hierarchy defined in the upstream model
  • Application Integration: Provide REST API for programmatic access to registry data
  • GitOps Compatibility: Enable declarative registry management through CRD-based operations

@blkt
Contributor

blkt commented Sep 1, 2025

Hey @dmartinol thank you so much for your contribution!
This is massive and it will take a while for us to review, we'll keep you posted.

Member

@rdimitrov rdimitrov left a comment

Thanks for the detailed proposals! Let me try to give a quick summary of what I got so far:

Changes proposed to ToolHive (thv):

  • Registry API continues to be part of ToolHive’s API server
  • Can register multiple registries (file-based JSON, API, git-backed (still JSON probably), etc.)
  • Search works across all registries
  • ToolHive keeps a cached copy of registries and refreshes them periodically
  • Includes a trusted catalogue, with APIs for submitting and reviewing (approve/deny) promotions of MCP servers taken from external registries
  • Templated MCP servers (can you share more about the differences between it and a regular entry in the registry?)
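
To make the caching and cross-registry search points above concrete, here is a minimal Python sketch. It is not ToolHive code: `CachedRegistry`, `search_all`, the `fetch` callables, and the entry fields (`name`, `description`) are all hypothetical, and it only illustrates one way a cached copy could be refreshed after a TTL while a search spans every registered source.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Entry = Dict[str, str]


@dataclass
class CachedRegistry:
    """One registered source (file, API, git, ...) with a TTL-based cache."""

    name: str
    fetch: Callable[[], List[Entry]]  # hypothetical loader for this source
    ttl_seconds: float = 300.0
    clock: Callable[[], float] = time.monotonic
    _entries: List[Entry] = field(default_factory=list)
    _fetched_at: float = float("-inf")  # forces a fetch on first access

    def entries(self) -> List[Entry]:
        # Refresh only when the cached copy is older than the TTL.
        if self.clock() - self._fetched_at >= self.ttl_seconds:
            self._entries = self.fetch()
            self._fetched_at = self.clock()
        return self._entries


def search_all(registries: List[CachedRegistry], term: str) -> List[Entry]:
    """Case-insensitive substring search across every registry."""
    needle = term.lower()
    hits: List[Entry] = []
    for reg in registries:
        for entry in reg.entries():
            text = f"{entry.get('name', '')} {entry.get('description', '')}"
            if needle in text.lower():
                # Record which registry the hit came from.
                hits.append({**entry, "registry": reg.name})
    return hits
```

A background refresh loop would be an alternative to this lazy refresh on read; the sketch picks the lazy variant only because it is shorter.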

Changes related to Kubernetes:

  • A separate Registry API server is available for others to use
  • New Registry Controller handles MCPRegistry CRDs for different registry types (remote, file-based, etc.) and feeds them to the Registry API (presuming a separate server/service?)
  • It also manages cache refreshes for all configured registries

I'm happy to chat more, but first I wanted to make sure I understand the main changes that are being proposed 👍

- Reference another registry's REST API endpoint as a data source
- Enables registry hierarchies and aggregation patterns across clusters
- Supports filtering and transformation of upstream registry data
- Works with any registry implementation that exposes the standard API
Member

I suppose the standard API is the OpenAPI spec of the official MCP registry or is it something else?

Collaborator Author

Works with any registry implementation that exposes the standard API
We can discuss the details, but it should work according to the specified format field, being able to consume both upstream and ToolHive APIs.

```bash
# Add the official MCP community registry
thv registry add community \
--url https://registry.modelcontextprotocol.io/servers.json \
Member

I think this is supposed to be an API (not sure they are planning to have an exported json file with all entries), but I think this covers the use case of having a remotely-hosted json file 👍

Collaborator Author

Yes, probably the example is misleading, it should be more generic like --url <REGISTRY_DATA_URL>/servers.json

onUpdate: update
```

### Creating an MCPServer from Registry Data
Member

Just to confirm if I understand the flow correctly - a client application queries the registry API, discovers and gets the metadata for a given server, generates this CRD and then if it wants to spawn it, it creates the CRD in the k8s cluster so it can get picked up by the existing toolhive operator?

Collaborator Author

Servers can be deployed manually (with or w/o matching labels) according to the workflow you described, or by using thv run --registry ....
The latter case, in a k8s environment, would deploy an MCPServer instance instead of today's raw Deployment and StatefulSet (IIRC).

In short, yes, registries are there to simplify the discovery of available servers and the management of their lifecycle (deploy and list, for now).
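
The discover-then-deploy flow discussed here could look roughly like the Python sketch below, which turns one registry entry into an MCPServer manifest. The helper name `mcpserver_manifest` and the entry fields (`name`, `image`) are hypothetical, and the `apiVersion` value is an assumption that should be checked against the installed CRD; the annotation key and `spec.image` come from the proposal excerpts in this thread.

```python
from typing import Any, Dict


def mcpserver_manifest(entry: Dict[str, str], registry: str,
                       namespace: str = "default") -> Dict[str, Any]:
    """Build an MCPServer manifest (as a dict) from one registry entry."""
    return {
        # Assumed group/version; verify against the CRD in the cluster.
        "apiVersion": "toolhive.stacklok.dev/v1alpha1",
        "kind": "MCPServer",
        "metadata": {
            "name": entry["name"],
            "namespace": namespace,
            # Annotation key taken from the proposal excerpt under review.
            "annotations": {"registry.toolhive.io/source": registry},
        },
        "spec": {"image": entry["image"]},
    }
```

The resulting dict can be serialized to YAML or JSON and then applied to the cluster or committed to a GitOps repository, at which point the existing operator would pick it up.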

MCPServerTemplates would go even further and define some shared settings once for all subsequently deployed instances, but I'm not sure we need to go to that level of design. Realistically, how many deployments of the same server should we expect within the same environment? I'm afraid this would just create unnecessary overhead, ending up with two resources for each deployed server (1 template, 1 server) instead of just 1 (the server).

@dmartinol
Collaborator Author

Changes proposed to ToolHive (thv):

Thanks for pointing that out. You’re aiming for consistency across Docker and Kubernetes, while my focus was on the cluster environment. Let’s discuss whether it makes sense to have some of these functions in both environments.

  • Registry API continues to be part of ToolHive’s API server
  • Can register multiple registries (file-based JSON, API, git-backed (still JSON probably), etc.)

Not sure it is really needed for the docker environment. Is it?

  • Search works across all registries
  • ToolHive keeps a cached copy of registries and refreshes them periodically

Same as before. But if the previous answer is yes, then yes.

  • Includes a trusted catalogue, with APIs for submitting and reviewing (approve/deny) promotions of MCP servers taken from external registries

Again, does it make any sense in a local Docker environment? In a production Kubernetes environment, we can create layers of registries with different trust levels and define access scopes to prevent untrusted servers from reaching production environments. Not sure this is the case for the local one.
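
As a rough illustration of the layering idea, the Python sketch below only exposes servers from registries at or above a minimum trust level. Everything in it is hypothetical: the level names in `TRUST_ORDER`, the `trust` and `servers` fields, and the `visible_servers` helper are not part of the proposal.

```python
from typing import Any, Dict, List

# Hypothetical trust levels, ordered from least to most trusted.
TRUST_ORDER = ["untrusted", "reviewed", "trusted"]


def visible_servers(registries: List[Dict[str, Any]], min_trust: str) -> List[str]:
    """Return server names from registries at or above `min_trust`."""
    threshold = TRUST_ORDER.index(min_trust)
    names: List[str] = []
    for reg in registries:
        if TRUST_ORDER.index(reg["trust"]) >= threshold:
            names.extend(reg["servers"])
    return names
```

A production namespace could then be scoped with `min_trust="trusted"`, while a sandbox accepts everything.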

  • Templated MCP servers (can you share more about the differences between it and a regular entry in the registry?)

I think I lost some examples during the latest reviews, but the idea was to have a prefilled MCPServer template whose parameters are given actual values by a dedicated command, as we have for Templates in the openshift.io group. Reference

Then a dedicated thv command or registry REST API could create the actual MCPServer instance by specifying the value for each parameter in the template:
thv process <TEMPLATE> -p PARAM1=VALUE1 ......

The advantage is to specify the repeated configuration sections just once (e.g. the resources section), and use parameters to specify only the server-specific details. This guarantees consistent deployments across the cluster, but as I wrote in another comment, I'm not sure it's even needed. Very low priority for now.
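
To show what such parameter processing might involve, here is a small Python sketch that substitutes `${NAME}` placeholders (the syntax OpenShift Templates use) throughout a manifest. The `process_template` function is hypothetical, and `thv process` itself is only a proposed command, so this is an illustration of the idea rather than of any existing behaviour.

```python
import re
from typing import Any, Dict

# Matches ${PARAM_NAME} placeholders.
_PARAM = re.compile(r"\$\{([A-Z0-9_]+)\}")


def process_template(template: Any, params: Dict[str, str]) -> Any:
    """Recursively replace ${NAME} placeholders in strings, dicts and lists."""
    if isinstance(template, str):
        # A missing parameter raises KeyError instead of being left in place.
        return _PARAM.sub(lambda m: params[m.group(1)], template)
    if isinstance(template, dict):
        return {k: process_template(v, params) for k, v in template.items()}
    if isinstance(template, list):
        return [process_template(v, params) for v in template]
    # Numbers, booleans and None are returned unchanged.
    return template
```

Shared sections such as `resources` would be written once in the template without placeholders, and only the server-specific fields would carry parameters.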

Changes related to Kubernetes:

  • A separate Registry API server is available for others to use
  • New Registry Controller handles MCPRegistry CRDs for different registry types (remote, file-based, etc.) and feeds them to the Registry API (presuming a separate server/service?)
  • It also manages cache refreshes for all configured registries

👍

- **REST API**: HTTP endpoints for programmatic registry and server discovery
- **Authentication**: Integration with Kubernetes RBAC and service account tokens
- **Filtering**: Query servers by registry, category, transport type, and custom labels
- **Format Support**: Return data in both ToolHive and upstream registry formats
Contributor

This might be a silly question, but how much difference is there now between the ToolHive and upstream registry formats? I thought our goal was to use the upstream registry format with vendor extensions. Does ToolHive here mean supporting the extensions? (Maybe this is a question for @rdimitrov.)

Collaborator Author

This was not the main goal of the PR, but I tried to match the previous proposal tracked as upstream-mcp-registry-format-support.md.
My understanding was that thv would keep its proprietary format and use the extension mechanism to export to the upstream format (and vice versa), but if this is not the case, removing the format conversions would even simplify the PR.

Member

When we talk about the official MCP registry, it consists of:

  • an OpenAPI schema for the registry API server (once it goes live this would be publicly available for everyone to use)
  • and a json schema that covers how you describe an MCP server (the so-called server.json)

On the ToolHive side:

  • We are in the process of moving our registry from our format (i.e., ImageMetadata) to the upstream format (aka server.json).
  • Note that the structure of the actual registry.json file will change slightly too, but this is expected as this part is really specific to ToolHive (there's no community effort around adopting a file representation of a registry catalogue, at least not yet). Here's a preview of the new format - link.
  • The above will set the foundation that would allow us to then add support for the registry API (as well as any other compliant registries). In your proposal I think this maps to a remote registry source.
  • Note that ToolHive's API will probably be a superset of this too, so other registry clients besides ToolHive can use it.


Declarative operation CRDs for GitOps compatibility:
- `MCPRegistryImportJob`: Declarative import operations
- `MCPRegistryExportJob`: Declarative export operations
Contributor

What would be the use-case for export?

Collaborator Author

Data backup? Anyway, I agree it can be dropped for now: if the original data source is not mutable from the registry itself (we only import and sync), then it's probably useless.

  annotations:
    registry.toolhive.io/source: upstream-community
spec:
  image: "mcpproject/filesystem-server:latest"
Contributor

@dmartinol

Shouldn't this say something like mcpproject/upstream-community/filesystem-server:latest ?

in other words, would this point to filesystem-server from upstream-community in the mcpproject namespace?

Collaborator Author

@dmartinol Shouldn't this say something like mcpproject/upstream-community/filesystem-server:latest ?
in other words, would this point to filesystem-server from upstream-community in the mcpproject namespace?

I think it means something like the mcpproject/filesystem-server:latest image on Docker Hub (or whatever the default container image registry is). I don't think we want to host an image registry inside the upstream-community MCPRegistry, right?
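
For reference, the conventional Docker-style reading of such an image string can be sketched in Python as below. This is a simplified illustration, not ToolHive code: `split_image_ref` is a hypothetical helper, it ignores digests (`@sha256:...`), and it does not add the `library/` prefix Docker applies to single-name images.

```python
from typing import Tuple


def split_image_ref(ref: str) -> Tuple[str, str, str]:
    """Split an image reference into (registry host, repository, tag)."""
    first, sep, rest = ref.partition("/")
    # The first component is a registry host only if it looks like one.
    if sep and ("." in first or ":" in first or first == "localhost"):
        host, remainder = first, rest
    else:
        host, remainder = "docker.io", ref
    # A colon after the last slash separates the tag from the repository.
    last = remainder.rsplit("/", 1)[-1]
    if ":" in last:
        repository, tag = remainder.rsplit(":", 1)
    else:
        repository, tag = remainder, "latest"
    return host, repository, tag
```

Under this reading, every path component in `mcpproject/filesystem-server` is part of the repository name on the default registry; none of them names an MCPRegistry resource.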

Contributor

Sorry, ignore me; I think I was conflating two things. I thought one of the goals was also to be able to point to a "reference" or "record" from the registry instead of an image.

@dmartinol
Collaborator Author

As agreed in previous conversations, here is a link to a design document for an initial MVP#1. Feel free to comment!
MCP Registry MVP-1: Design Proposal

@dmartinol dmartinol marked this pull request as ready for review September 4, 2025 14:54
codecov bot commented Sep 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.06%. Comparing base (a384737) to head (3b3a44e).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1641      +/-   ##
==========================================
+ Coverage   39.98%   40.06%   +0.08%     
==========================================
  Files         180      180              
  Lines       20911    20911              
==========================================
+ Hits         8361     8378      +17     
+ Misses      11938    11918      -20     
- Partials      612      615       +3     

@coveralls
Collaborator

Coverage Status

coverage: 37.326% (+0.07%) from 37.255%
when pulling 3b3a44e on dmartinol:k8s_registry
into a384737 on stacklok:main.
