feat(upstream): filter nodes in upstream with metadata #12448
base: master
Conversation
Hi @jizhuozhi, thanks for your contribution. I think it is useful for node filtering with Consul discovery, but I don't understand static upstream filtering. It seems that I need to mark metadata on each node in the upstream object and then use metadata_match to configure filtering? Since each node is defined manually, if a node is not needed, can't I just add or delete it?
This is a unified approach: it only checks whether a filtering rule exists, without distinguishing between service discovery and a static list. That said, from my previous experience as a gateway administrator, business developers will sometimes add temporary rules for debugging purposes without wanting to change the original instance list (so it can be adjusted quickly).
Thanks for your reply. Could you please describe the scenario in detail? Why can't the existing methods solve this problem?
Pull Request Overview
This PR adds metadata-based node filtering capabilities to APISIX upstreams, allowing users to filter backend nodes based on custom metadata values before load balancing. This enables advanced routing scenarios like canary releases and swimlane routing.
- Implements metadata_match filtering logic in the upstream core module
- Enhances Consul discovery to include service metadata and support weight configuration
- Adds comprehensive test coverage for both static and service discovery-based upstreams
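For orientation, here is a minimal sketch of what such metadata-based filtering amounts to. This is an illustration of the idea in plain Lua, not the PR's actual code; the array-of-rules shape (key / allowed_vals) is taken from the review comments below.

    -- keep only the nodes whose metadata satisfies every metadata_match rule
    local function filter_nodes_by_metadata(nodes, metadata_match)
        local new_nodes = {}
        for _, node in ipairs(nodes) do
            local keep = true
            for _, rule in ipairs(metadata_match) do
                local val = node.metadata and node.metadata[rule.key]
                local matched = false
                for _, allowed in ipairs(rule.allowed_vals) do
                    if val == allowed then
                        matched = true
                        break
                    end
                end
                if not matched then
                    keep = false
                    break
                end
            end
            if keep then
                new_nodes[#new_nodes + 1] = node
            end
        end
        return new_nodes
    end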
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
apisix/upstream.lua | Adds metadata filtering functions and integration into upstream node selection |
apisix/discovery/consul/init.lua | Updates Consul discovery to include service metadata and weight from Meta field |
t/node/upstream-discovery.t | Adds test case for metadata filtering with mock discovery |
t/discovery/consul.t | Adds integration test for metadata filtering with actual Consul discovery |
Comments suppressed due to low confidence (3)
apisix/upstream.lua:535
- The weight from metadata should be converted to a number. If metadata.weight is a string (which is common in Consul Meta fields), this could cause issues with load balancing calculations that expect numeric weights.
return nil, "failed to fetch ssl info by "
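A sketch of the kind of conversion this comment asks for, assuming the node is built from Consul's Service.Meta as described in the PR; the variable names (service, default_weight) are illustrative rather than the PR's actual code:

    -- build one node from a Consul service entry, coercing the string
    -- weight in Service.Meta to a number with a fallback default
    local function build_node(service, default_weight)
        local meta = service.Meta or {}
        local weight = tonumber(meta.weight) or default_weight
        return {
            host = service.Address,
            port = service.Port,
            weight = weight,
            metadata = meta,
        }
    end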
t/discovery/consul.t:810
- The test endpoint /test_metadata_match returns the server address but doesn't actually proxy to the filtered upstream nodes. This test validates the configuration but doesn't verify that the metadata filtering is working correctly, since it's not actually routing through the filtered nodes.
content_by_lua_block {
t/node/upstream-discovery.t:530
- The test configuration shows metadata_match as a simple key-value pair (version: v2), but the implementation expects an array format. This test will not properly validate the metadata filtering functionality due to this structural mismatch.
version: v2
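For reference, the two shapes described in this comment, written as the Lua tables the configuration would deserialize into (a sketch; the exact schema is defined by the PR, not here):

    -- simple key-value form, as used by the test
    metadata_match = { version = "v2" }

    -- array-of-rules form, as the implementation reportedly expects
    metadata_match = {
        { key = "version", allowed_vals = { "v2" } },
    }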
apisix/upstream.lua
Outdated
end
core.log.info("new_nodes: ", core.json.delay_encode(new_nodes, true))
This info-level log statement will be printed on every request and could generate excessive logs in production. Consider using debug level or adding a condition to only log when metadata filtering actually occurs.
Suggested change:
end
core.log.info("new_nodes: ", core.json.delay_encode(new_nodes, true))
becomes
core.log.info("new_nodes: ", core.json.delay_encode(new_nodes, true))
end
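A sketch of the lower-noise alternative this comment describes, logging at debug level and only when a filter rule is configured; it is a drop-in illustration for that spot, assuming core.log.debug behaves like core.log.info at a lower level:

    if up_conf.metadata_match then
        -- debug level keeps per-request output out of production logs by default
        core.log.debug("filtered nodes: ", core.json.delay_encode(new_nodes, true))
    end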
@@ -310,6 +339,14 @@ function _M.set_by_route(route, api_ctx)
return HTTP_CODE_UPSTREAM_UNAVAILABLE, "no valid upstream node: " .. (err or "nil")
end

if up_conf.metadata_match then
The metadata_match field is used directly without validation of its structure. The code expects it to be an array of objects with 'key' and 'allowed_vals' fields, but the test shows it as a simple key-value object. This mismatch will cause the filtering to fail.
Suggested change:
if up_conf.metadata_match then
    -- Validate and transform metadata_match into the expected format
    if type(up_conf.metadata_match) == "table" and not (#up_conf.metadata_match > 0) then
        local transformed_metadata = {}
        for key, allowed_vals in pairs(up_conf.metadata_match) do
            if type(allowed_vals) ~= "table" then
                return HTTP_CODE_UPSTREAM_UNAVAILABLE, "invalid metadata_match format: allowed_vals must be a table"
            end
            table.insert(transformed_metadata, { key = key, allowed_vals = allowed_vals })
        end
        up_conf.metadata_match = transformed_metadata
    elseif type(up_conf.metadata_match) ~= "table" or not (#up_conf.metadata_match > 0) then
        return HTTP_CODE_UPSTREAM_UNAVAILABLE, "invalid metadata_match format: must be an array of objects with 'key' and 'allowed_vals'"
    end
This is not a production environment, but it is common in the testing and verification phase. We frequently need to target specific instances (for example, to capture flame graphs for performance analysis), and we need to add the other instances back afterwards, so selecting instances by filtering is easier than deleting and re-adding them. In fact, we also match on dynamically colored metadata at load-balancing time rather than predefining routes, similar to https://github.com/kitex-contrib/loadbalance-tagging (I am also using Lua to implement the same capability, but that is outside the scope of this discussion).
I still have questions about the Example Usage. Do you mean that if I need to adjust which nodes are used, I don't change the contents of the nodes list, but adjust metadata_match instead?
Yes, just adjust metadata_match (though the discussion of this use case has moved beyond the scope of this PR). At runtime it is a unified filtering rule over the node list that does not need to distinguish the source.
We are currently using Consul on Kubernetes. When I was working at another company a few years ago, we were using cloud virtual machines (such as EC2). The cloud platform did not provide an API, so we used scripts to synchronize the static instance lists at regular intervals. In that setup, the static list was effectively a kind of dynamic discovery. (Why not filter in the script? Because we were lazy :)
Description
This PR introduces metadata-based node filtering for upstreams, supporting both static and service discovery-based upstreams (e.g., via Consul).
Motivation
Currently, APISIX selects upstream nodes based on the service name from discovery, without additional filtering logic. In real-world scenarios like canary release or swimlane routing, users often tag backend instances with custom metadata (e.g., version, env) and expect the gateway to route only to specific subsets.
This change allows users to define a metadata_match field in the upstream configuration, which filters nodes before load balancing based on their metadata values.
Changes
Consul discovery: Include Service.Meta in the node definition and respect its weight if available.
Upstream core logic: Add metadata_match filtering before computing upstream nodes.
Tests: Add test cases to cover both:
- static upstreams with metadata_match
- service discovery-based upstreams with metadata_match
Example Usage
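A rough sketch of what such an upstream configuration could look like, written as the equivalent Lua table and assuming the array-of-rules shape (key / allowed_vals) discussed in the review comments above; the exact field names are defined by the PR itself, not by this sketch:

    upstream = {
        type = "roundrobin",
        discovery_type = "consul",
        service_name = "web",
        metadata_match = {
            { key = "lane",    allowed_vals = { "prod" } },
            { key = "version", allowed_vals = { "v1", "v2" } },
        },
    }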
Only nodes with metadata.lane in [prod] and metadata.version in [v1, v2] will be used for load balancing.
Fixes
Fixes # (please link to the issue or describe use case clearly if not yet filed)
Checklist