
feat(upstream): filter nodes in upstream with metadata #12448


Open · wants to merge 2 commits into master

Conversation

jizhuozhi

@jizhuozhi jizhuozhi commented Jul 20, 2025

Description

This PR introduces metadata-based node filtering for upstreams, supporting both static and service discovery-based upstreams (e.g., via Consul).

Motivation

Currently, APISIX selects upstream nodes based on service name from discovery without additional filtering logic. In real-world scenarios like canary release or swimlane routing, users often tag backend instances with custom metadata (e.g., version, env) and expect the gateway to route only to specific subsets.

This change allows users to define a metadata_match field in upstream configuration, which filters nodes before load balancing based on their metadata values.

Changes

  • Consul discovery: Include Service.Meta in the node definition and respect its weight if available (see the sketch after this list).

  • Upstream core logic: Add metadata_match filtering before computing upstream nodes.

  • Tests: Add test cases to cover both:

    • static upstream with metadata_match
    • discovery-based upstream (Consul) with metadata_match
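
As an illustration of the Consul change referenced above, mapping one Consul service entry to an upstream node could look roughly like the sketch below. The helper name consul_entry_to_node is hypothetical and not the PR's actual code; only the field names (Service.Address, Service.Port, Service.Meta) come from Consul's catalog/health API.

    -- Sketch only: map one Consul service entry to an APISIX upstream node,
    -- keeping Meta around so metadata_match can filter on it later.
    local function consul_entry_to_node(entry)
        local meta = entry.Service.Meta or {}
        -- Consul Meta values are strings, so coerce an explicit weight to a number
        local weight = tonumber(meta.weight) or 1
        return {
            host     = entry.Service.Address,
            port     = entry.Service.Port,
            weight   = weight,
            metadata = meta,
        }
    end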

Example Usage

upstream:
  type: roundrobin
  nodes:
    "127.0.0.1:1980":
      weight: 1
      metadata:
        version: v1
    "127.0.0.2:1980":
      weight: 1
      metadata:
        version: v2
  metadata_match:
    lane:
      - prod
    version:
      - v1
      - v2

Only nodes with metadata.lane in [prod] and metadata.version in [v1, v2] will be used for load balancing.
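
A minimal sketch of that filtering step, assuming metadata_match maps a metadata key to a list of allowed values as in the YAML above (the function name filter_nodes_by_metadata is illustrative and may differ from the PR's actual implementation):

    -- Sketch only: keep nodes whose metadata satisfies every metadata_match rule.
    local function filter_nodes_by_metadata(nodes, metadata_match)
        local filtered = {}
        for _, node in ipairs(nodes) do
            local meta = node.metadata or {}
            local keep = true
            for key, allowed_vals in pairs(metadata_match) do
                local matched = false
                for _, val in ipairs(allowed_vals) do
                    if meta[key] == val then
                        matched = true
                        break
                    end
                end
                if not matched then
                    keep = false
                    break
                end
            end
            if keep then
                filtered[#filtered + 1] = node
            end
        end
        return filtered
    end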


Fixes

Fixes # (please link to the issue or describe use case clearly if not yet filed)


Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (no breaking changes)

@dosubot dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and enhancement (New feature or request) labels on Jul 20, 2025
@Baoyuantop
Contributor

Hi @jizhuozhi, thanks for your contribution.

I think this is useful for node filtering with Consul discovery, but I don't understand the static upstream case. It seems I would need to mark each node in the upstream object with metadata and then configure filtering with metadata_match. Since each node is defined manually, if a node is not needed, couldn't I just add or delete it directly?

@jizhuozhi
Author

Hi @jizhuozhi, thanks for your contribution.

I think this is useful for node filtering with Consul discovery, but I don't understand the static upstream case. It seems I would need to mark each node in the upstream object with metadata and then configure filtering with metadata_match. Since each node is defined manually, if a node is not needed, couldn't I just add or delete it directly?

This is a unified approach: the code only checks whether a filtering rule exists, without distinguishing between service discovery and a static node list.

However, from my previous experience as a gateway administrator, business developers sometimes add rules temporarily for debugging purposes but do not want to change the original instance list (so they can adjust it back quickly).

@Baoyuantop
Contributor

business developers sometimes add rules temporarily for debugging purposes but do not want to change the original instance list

Thanks for your reply. Could you please describe the scenario in detail? Why can't the existing methods solve this problem?

@moonming moonming requested a review from Copilot July 21, 2025 08:31

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds metadata-based node filtering capabilities to APISIX upstreams, allowing users to filter backend nodes based on custom metadata values before load balancing. This enables advanced routing scenarios like canary releases and swimlane routing.

  • Implements metadata_match filtering logic in upstream core module
  • Enhances Consul discovery to include service metadata and support weight configuration
  • Adds comprehensive test coverage for both static and service discovery-based upstreams

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

  • apisix/upstream.lua: Adds metadata filtering functions and integrates them into upstream node selection
  • apisix/discovery/consul/init.lua: Updates Consul discovery to include service metadata and to take weight from the Meta field
  • t/node/upstream-discovery.t: Adds a test case for metadata filtering with mock discovery
  • t/discovery/consul.t: Adds an integration test for metadata filtering with actual Consul discovery
Comments suppressed due to low confidence (3)

apisix/upstream.lua:535

  • The weight from metadata should be converted to a number. If metadata.weight is a string (which is common in Consul Meta fields), this could cause issues with load balancing calculations that expect numeric weights.
                return nil, "failed to fetch ssl info by "

t/discovery/consul.t:810

  • The test endpoint /test_metadata_match returns the server address but doesn't actually proxy to the filtered upstream nodes. This test validates the configuration but doesn't verify that the metadata filtering is working correctly since it's not actually routing through the filtered nodes.
    content_by_lua_block {

t/node/upstream-discovery.t:530

  • The test configuration shows metadata_match as a simple key-value pair (version: v2), but the implementation expects an array format. This test will not properly validate the metadata filtering functionality due to this structural mismatch.
        version: v2

Comment on lines 347 to 348
end
core.log.info("new_nodes: ", core.json.delay_encode(new_nodes, true))

Copilot AI Jul 21, 2025


This info-level log statement will be printed on every request and could generate excessive logs in production. Consider using debug level or adding a condition to only log when metadata filtering actually occurs.

Suggested change
-end
-core.log.info("new_nodes: ", core.json.delay_encode(new_nodes, true))
+core.log.info("new_nodes: ", core.json.delay_encode(new_nodes, true))
+end
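
For reference, the debug-level, conditional variant mentioned in the comment might look like this small sketch (it assumes nodes holds the pre-filter list and that both lists are plain arrays):

    -- Sketch only: log at debug level, and only when filtering actually removed nodes
    if #new_nodes ~= #nodes then
        core.log.debug("metadata_match filtered nodes: ",
                       core.json.delay_encode(new_nodes, true))
    end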


@@ -310,6 +339,14 @@ function _M.set_by_route(route, api_ctx)
return HTTP_CODE_UPSTREAM_UNAVAILABLE, "no valid upstream node: " .. (err or "nil")
end

if up_conf.metadata_match then

Copilot AI Jul 21, 2025


The metadata_match field is used directly without validation of its structure. The code expects it to be an array of objects with 'key' and 'allowed_vals' fields, but the test shows it as a simple key-value object. This mismatch will cause the filtering to fail.

Suggested change
-if up_conf.metadata_match then
+if up_conf.metadata_match then
+    -- Validate and transform metadata_match into the expected format
+    if type(up_conf.metadata_match) == "table" and not (#up_conf.metadata_match > 0) then
+        local transformed_metadata = {}
+        for key, allowed_vals in pairs(up_conf.metadata_match) do
+            if type(allowed_vals) ~= "table" then
+                return HTTP_CODE_UPSTREAM_UNAVAILABLE, "invalid metadata_match format: allowed_vals must be a table"
+            end
+            table.insert(transformed_metadata, { key = key, allowed_vals = allowed_vals })
+        end
+        up_conf.metadata_match = transformed_metadata
+    elseif type(up_conf.metadata_match) ~= "table" or not (#up_conf.metadata_match > 0) then
+        return HTTP_CODE_UPSTREAM_UNAVAILABLE, "invalid metadata_match format: must be an array of objects with 'key' and 'allowed_vals'"
+    end


@jizhuozhi
Author

jizhuozhi commented Jul 21, 2025

Thanks for your reply. Could you please describe the scenario in detail? Why can't the existing methods solve this problem?

This is not a production environment; it is common during the testing and verification phase. We frequently need to target specific instances (for example, to capture flame graphs for performance analysis); without filtering we would have to delete the other instances and add them back later, so selecting instances by a filter is more convenient.

In fact, we also match on dynamically colored metadata at load-balancing time rather than predefining routes, similar to https://github.com/kitex-contrib/loadbalance-tagging (I am also using Lua to implement the same capability, but that is outside the scope of this discussion).

@Baoyuantop
Contributor

Baoyuantop commented Jul 22, 2025

Thanks for your reply. Could you please describe the scenario in detail? Why can't the existing methods solve this problem?

This is not a production environment; it is common during the testing and verification phase. We frequently need to target specific instances (for example, to capture flame graphs for performance analysis); without filtering we would have to delete the other instances and add them back later, so selecting instances by a filter is more convenient.

In fact, we also match on dynamically colored metadata at load-balancing time rather than predefining routes, similar to https://github.com/kitex-contrib/loadbalance-tagging (I am also using Lua to implement the same capability, but that is outside the scope of this discussion).

I still have doubts about the Example Usage. Do you mean that if I need to adjust which nodes are used, I don't change the content of the nodes list, but adjust metadata_match instead?
By the way, are you using static nodes or service discovery?

@jizhuozhi
Author

jizhuozhi commented Jul 22, 2025

I still have doubts about the Example Usage. Do you mean that if I need to adjust which nodes are used, I don't change the content of the nodes list, but adjust metadata_match instead?

Yes, just adjust metadata_match (though the discussion of this use case has drifted beyond this PR). At runtime it is a unified filtering rule over the node list that does not need to distinguish the source.

By the way, are you using static nodes or service discovery?

We are currently using Consul on Kubernetes. A few years ago, at another company, we were using cloud virtual machines (similar to EC2); the cloud platform did not provide a discovery API, so we used scripts to synchronize static instance lists at regular intervals. In that setup the static list was effectively a kind of dynamic discovery. (Why not filter in the script? Because we were lazy :)
