Skip to content

Conversation

zeeshanlakhani
Copy link
Collaborator

@zeeshanlakhani zeeshanlakhani commented Sep 25, 2025

This work introduces multicast IP pool capabilities to support external multicast traffic routing through the rack's switching infrastructure.

Includes:

  • Add IpPoolType enum (unicast/multicast) with unicast as default
  • Add database migration (multicast-support/up01.sql) with new columns and indexes
  • Add ASM/SSM range validation for multicast pools to prevent mixing
  • Add pool type-aware resolution for IP allocation
  • Add custom deserializer for switch port uplinks with deduplication
  • Update external API params/views for multicast pool configuration
  • Add SSM constants (IPV4_SSM_SUBNET, IPV6_SSM_FLAG_FIELD) for validation

Database schema updates:

  • ip_pool table: pool_type
  • Index on pool_type for efficient filtering
  • Migration preserves existing pools as unicast type by default

This provides the foundation for multicast group functionality while maintaining full backward compatibility with existing unicast pools.

References (for review):

TODOs:

This work introduces multicast IP pool capabilities to support external
multicast traffic routing through the rack's switching infrastructure.

Includes:
  - Add IpPoolType enum (unicast/multicast) with unicast as default
  - Add multicast pool fields: switch_port_uplinks (UUID[]), mvlan (VLAN ID)
  - Add database migration (multicast-support/up01.sql) with new columns and indexes
  - Add ASM/SSM range validation for multicast pools to prevent mixing
  - Add pool type-aware resolution for IP allocation
  - Add custom deserializer for switch port uplinks with deduplication
  - Update external API params/views for multicast pool configuration
  - Add SSM constants (IPV4_SSM_SUBNET, IPV6_SSM_FLAG_FIELD) for validation

Database schema updates:
  - ip_pool table: pool_type, switch_port_uplinks, mvlan columns
  - Index on pool_type for efficient filtering
  - Migration preserves existing pools as unicast type by default

This provides the foundation for multicast group functionality while
maintaining full backward compatibility with existing unicast pools.

References (for review):
  - RFD 488: https://rfd.shared.oxide.computer/rfd/488
  - Dendrite PRs (based on recency):
    * oxidecomputer/dendrite#132
    * oxidecomputer/dendrite#109
    * oxidecomputer/dendrite#14
Cargo.toml Outdated
dns-server = { path = "dns-server" }
dns-server-api = { path = "dns-server-api" }
dns-service-client = { path = "clients/dns-service-client" }
dpd-client = { git = "https://github.com/oxidecomputer/dendrite", rev = "738c80d18d5e94eda367440ade7743e9d9f124de" }
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

picks up current dpd-client HEAD commit

zeeshanlakhani added a commit that referenced this pull request Sep 25, 2025
Introduces end-to-end multicast group support across control plane and sled-agent, integrated with IP pool extensions required
for supporting multicast workflows. This work enables project-scoped multicast groups with lifecycle-driven dataplane programming
and exposes an API for operating multicast groups over instances.

Highlights:
  - DB: new multicast_group tables; member lifecycle management
  - API: multicast group/member CRUD; source IP validation; VPC/project hierarchy integration with default VNI fallback
  - Control plane: RPW reconcilers for groups/members; sagas for dataplane updates atomically at the group level; instance lifecycle hooks and piggybacking
  - Dataplane: Dendrite DPD switch programming via trait abstraction; DPD client used in tests
  - Sled agent: multicast-aware instance management; network interface configuration for multicast traffic; cross-version testing; OPTE stubs present
  - Tests: comprehensive integration suites under nexus/tests/integration_tests/multicast/

Components:
  - Database schema: external and underlay multicast groups; member/instance association tables
  - Control plane modules: multicast group management, member lifecycle, dataplane abstraction; RPW reconcilers to ensure convergence
  - API layer: endpoints and validation; default-VNI semantics when VPC not provided
  - Sled agent: OPTE stubs and compatibility shims for older agents

Workflows Implemented:
  1. Instance lifecycle integration:

     - "Create" -> resolve VPC/VNI (or default), validate source IPs, create memberships, enqueue group ensure RPW
     - "Start" -> program dataplane via ensure/update sagas; activate member flows after switch ack
     - "Stop" -> deactivate dataplane membership; retain DB membership for fast restart
     - "Delete" -> remove instance memberships; group deletion is explicit
     - "Migrate" -> deactivate on source sled; activate on target; idempotent with ordering guarantees
     - Restart/recovery -> RPWs reconcile desired state; compensations clean up partial programming

  2. RPW reconciliation:

     - ensure dataplane switches match database state
     - handle sled migrations and state transitions
     - Eventual consistency with retry logic

Migrations:
  - Apply schema changes in schema/crdb/multicast-support/up01.sql (and update dbinit.sql)
  - Bump schema versions accordingly

API/Compatibility:
  - OpenAPI updated: openapi/nexus.json, openapi/sled-agent/sled-agent-5.0.0-89f1f7.json
  - Regenerate clients where applicable

References:
  - RFD 488: https://rfd.shared.oxide.computer/rfd/488
  - IP Pool extensions: #9084
  - Dendrite PRs (based on recency):
    * oxidecomputer/dendrite#132
    * oxidecomputer/dendrite#109
    * oxidecomputer/dendrite#14

Follow-ups include:
  - OPTE integration
  - commtest extension
  - omdb commands are tracked in issues
  - pool and group stats
@zeeshanlakhani zeeshanlakhani self-assigned this Sep 25, 2025
zeeshanlakhani added a commit that referenced this pull request Sep 25, 2025
Introduces end-to-end multicast group support across control plane and sled-agent, integrated with IP pool extensions required
for supporting multicast workflows. This work enables project-scoped multicast groups with lifecycle-driven dataplane programming
and exposes an API for operating multicast groups over instances.

Highlights:
  - DB: new multicast_group tables; member lifecycle management
  - API: multicast group/member CRUD; source IP validation; VPC/project hierarchy integration with default VNI fallback
  - Control plane: RPW reconcilers for groups/members; sagas for dataplane updates atomically at the group level; instance lifecycle hooks and piggybacking
  - Dataplane: Dendrite DPD switch programming via trait abstraction; DPD client used in tests
  - Sled agent: multicast-aware instance management; network interface configuration for multicast traffic; cross-version testing; OPTE stubs present
  - Tests: comprehensive integration suites under nexus/tests/integration_tests/multicast/

Components:
  - Database schema: external and underlay multicast groups; member/instance association tables
  - Control plane modules: multicast group management, member lifecycle, dataplane abstraction; RPW reconcilers to ensure convergence
  - API layer: endpoints and validation; default-VNI semantics when VPC not provided
  - Sled agent: OPTE stubs and compatibility shims for older agents

Workflows Implemented:
  1. Instance lifecycle integration:

     - "Create" -> resolve VPC/VNI (or default), validate source IPs, create memberships, enqueue group ensure RPW
     - "Start" -> program dataplane via ensure/update sagas; activate member flows after switch ack
     - "Stop" -> deactivate dataplane membership; retain DB membership for fast restart
     - "Delete" -> remove instance memberships; group deletion is explicit
     - "Migrate" -> deactivate on source sled; activate on target; idempotent with ordering guarantees
     - Restart/recovery -> RPWs reconcile desired state; compensations clean up partial programming

  2. RPW reconciliation:

     - ensure dataplane switches match database state
     - handle sled migrations and state transitions
     - Eventual consistency with retry logic

Migrations:
  - Apply schema changes in schema/crdb/multicast-group-support/up01.sql (and update dbinit.sql)
  - Bump schema versions accordingly

API/Compatibility:
  - OpenAPI updated: openapi/nexus.json, openapi/sled-agent/sled-agent-5.0.0-89f1f7.json
  - Contains a version change (to v5) as InstanceEnsureBody has been modified to
    include multicast_groups associated with an instance in the
    underlying sled config
  - Regenerate clients where applicable

References:
  - RFD 488: https://rfd.shared.oxide.computer/rfd/488
  - IP Pool extensions: #9084
  - Dendrite PRs (based on recency):
    * oxidecomputer/dendrite#132
    * oxidecomputer/dendrite#109
    * oxidecomputer/dendrite#14

Follow-ups include:
  - OPTE integration
  - commtest extension
  - omdb commands are tracked in issues
  - pool and group stats
@zeeshanlakhani zeeshanlakhani changed the title [feat] Add IP pool multicast support [feat, multicast] Add IP pool multicast support Sep 25, 2025
zeeshanlakhani added a commit that referenced this pull request Sep 26, 2025
Introduces end-to-end multicast group support across control plane and sled-agent, integrated with IP pool extensions required
for supporting multicast workflows. This work enables project-scoped multicast groups with lifecycle-driven dataplane programming
and exposes an API for operating multicast groups over instances.

Highlights:
  - DB: new multicast_group tables; member lifecycle management
  - API: multicast group/member CRUD; source IP validation; VPC/project hierarchy integration with default VNI fallback
  - Control plane: RPW reconcilers for groups/members; sagas for dataplane updates atomically at the group level; instance lifecycle hooks and piggybacking
  - Dataplane: Dendrite DPD switch programming via trait abstraction; DPD client used in tests
  - Sled agent: multicast-aware instance management; network interface configuration for multicast traffic; cross-version testing; OPTE stubs present
  - Tests: comprehensive integration suites under nexus/tests/integration_tests/multicast/

Components:
  - Database schema: external and underlay multicast groups; member/instance association tables
  - Control plane modules: multicast group management, member lifecycle, dataplane abstraction; RPW reconcilers to ensure convergence
  - API layer: endpoints and validation; default-VNI semantics when VPC not provided
  - Sled agent: OPTE stubs and compatibility shims for older agents

Workflows Implemented:
  1. Instance lifecycle integration:

     - "Create" -> resolve VPC/VNI (or default), validate source IPs, create memberships, enqueue group ensure RPW
     - "Start" -> program dataplane via ensure/update sagas; activate member flows after switch ack
     - "Stop" -> deactivate dataplane membership; retain DB membership for fast restart
     - "Delete" -> remove instance memberships; group deletion is explicit
     - "Migrate" -> deactivate on source sled; activate on target; idempotent with ordering guarantees
     - Restart/recovery -> RPWs reconcile desired state; compensations clean up partial programming

  2. RPW reconciliation:

     - ensure dataplane switches match database state
     - handle sled migrations and state transitions
     - Eventual consistency with retry logic

Migrations:
  - Apply schema changes in schema/crdb/multicast-group-support/up01.sql (and update dbinit.sql)
  - Bump schema versions accordingly

API/Compatibility:
  - OpenAPI updated: openapi/nexus.json, openapi/sled-agent/sled-agent-5.0.0-89f1f7.json
  - Contains a version change (to v5) as InstanceEnsureBody has been modified to
    include multicast_groups associated with an instance in the
    underlying sled config
  - Regenerate clients where applicable

References:
  - RFD 488: https://rfd.shared.oxide.computer/rfd/488
  - IP Pool extensions: #9084
  - Dendrite PRs (based on recency):
    * oxidecomputer/dendrite#132
    * oxidecomputer/dendrite#109
    * oxidecomputer/dendrite#14

Follow-ups include:
  - OPTE integration
  - commtest extension
  - omdb commands are tracked in issues
  - pool and group stats
/// Type of IP pool (defaults to Unicast for backward compatibility)
#[serde(default)]
pub pool_type: shared::IpPoolType,
/// Rack switch uplinks that carry multicast traffic out of the rack to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this part of an IP pool? IP pools are about managing addresses that can be assigned within the rack. I think managing multicast egress from the rack should be a separate concern.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something I discussed with @FelixMcFelix, in that we didn't really know the best place to put this in general. And, to your point @rcgoodfellow, maybe it's part of another (new) resource. Being that we don't do anything with this at the moment, I'll leave some of the helpers, but remove the uplinks from the API.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed mvlan and switch port uplinks from this work.

@zeeshanlakhani
Copy link
Collaborator Author

zeeshanlakhani commented Oct 9, 2025

New changes pushed onto this branch, but the PR is not picking it up: bcb4fc6#diff-6e5a0e9d23e313fccb92594537052a498855670462b49a8748d4ed3611ba4b37 and a merge commit to bring in the schema updates. (it now works)

@zeeshanlakhani zeeshanlakhani force-pushed the zl/ip-pool-multicast-support branch from 0e7ad02 to bcb4fc6 Compare October 9, 2025 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants