29 Sep 11:40

Monokaix

943b8c2

v1.13.0 Latest

Latest

What's New

Welcome to the v1.13.0 release of Volcano! 🚀 🎉 📣
In this release, we have brought a series of significant enhancements that have been long-awaited by community users:

AI Training and Inference Enhancements
Resource Management and Scheduling Enhancements
- Introduce ResourceStrategyFit Plugin
  - Independent Scoring Strategy by Resource Type
  - Scarce Resource Avoidance (SRA)
- Enhance NodeGroup Functionality
Colocation Enhancements
- Decouple Colocation from OS
- Support Custom OverSubscription Resource Names

Support LeaderWorkerSet for Large Model Inference Scenarios

LeaderWorkerSet (LWS) is an API for deploying a group of Pods on Kubernetes. It is primarily used to address multi-host inference in AI/ML inference workloads, especially scenarios that require sharding large language models (LLMs) and running them across multiple devices on multiple nodes.

Since its open-source release, Volcano has actively integrated with upstream and downstream ecosystems, building a comprehensive community ecosystem for batch computing such as AI and big data. In the v0.7 release of LWS, it natively integrated Volcano's AI scheduling capabilities. When used with the new version of Volcano, LWS automatically creates PodGroups, which are then scheduled and managed by Volcano, thereby implementing advanced capabilities like Gang scheduling for large model inference scenarios.

Looking ahead, Volcano will continue to expand its ecosystem integration capabilities, providing robust scheduling and resource management support for more projects dedicated to enabling distributed inference on Kubernetes.

Usage documentation: LeaderWorkerSet With Gang.

Introduce Cron VolcanoJob

This release introduces support for Cron Volcano Jobs. Users can now periodically create and run Volcano Jobs based on a predefined schedule, similar to native Kubernetes CronJobs, to achieve periodic execution of batch computing tasks like AI and big data. Detailed features are as follows:

Scheduled Execution: Define the execution cycle of jobs using standard Cron expressions (spec.schedule).
Timezone Support: Set the timezone in spec.timeZone to ensure jobs execute at the expected local time.
Concurrency Policy: Control concurrent behavior via spec.concurrencyPolicy:
- AllowConcurrent: Allows concurrent execution of multiple jobs (default).
- ForbidConcurrent: Skips the current scheduled execution if the previous job has not completed.
- ReplaceConcurrent: Terminates the previous job if it is still running and starts a new one.
History Management: Configure the number of successful (successfulJobsHistoryLimit) and failed (failedJobsHistoryLimit) job history records to retain; old jobs are automatically cleaned up.
Missed Schedule Handling: The startingDeadlineSeconds field allows tolerating scheduling delays within a certain timeframe; timeouts are considered missed executions.
Status Tracking: The CronJob status (status) tracks currently active jobs, the last scheduled time, and the last successful completion time for easier monitoring and management.

Related PRs: volcano-sh/apis#192, #4560, @GoingCharlie, @hwdef, @Monokaix

Usage example: Cron Volcano Job Example.

Support Label-based HyperNode Auto Discovery

Volcano officially launched network topology-aware scheduling capability in v1.12 and pioneered the UFM auto-discovery mechanism based on InfiniBand (IB) networks. However, for hardware clusters that do not support IB networks or use other network architectures (such as Ethernet), manually maintaining the network topology remains cumbersome.

To address this issue, the new version introduces a Label-based HyperNode auto-discovery mechanism. This feature provides users with a universal and flexible way to describe network topology, transforming complex topology management tasks into simple node label management.

This mechanism allows users to define the correspondence between topology levels and node labels in the volcano-controller-configmap. The Volcano controller periodically scans all nodes in the cluster and automatically performs the following tasks based on their labels:

Automatic Topology Construction: Automatically builds multi-layer HyperNode topology structures from top to bottom (e.g., rack -> switch -> node) based on a set of labels on the nodes.
Dynamic Maintenance: When node labels change, or nodes are added or removed, the controller automatically updates the members and structure of the HyperNodes, ensuring the topology information remains consistent with the cluster state.
Support for Multiple Topology Types: Allows users to define multiple independent network topologies simultaneously to adapt to different hardware clusters (e.g., GPU clusters, NPU clusters) or different network partitions.

Configuration example:

# volcano-controller-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-controller-configmap
  namespace: volcano-system
data:
  volcano-controller.conf: |
    networkTopologyDiscovery:
      - source: label
        enabled: true
        interval: 10m # Discovery interval
        config:
          networkTopologyTypes:
            # Define a topology type named topology-A
            topology-A:
              # Define topology levels, ordered from top to bottom
              - nodeLabel: "volcano.sh/hypercluster" # Top-level HyperNode
              - nodeLabel: "volcano.sh/hypernode"   # Middle-level HyperNode
              - nodeLabel: "kubernetes.io/hostname" # Bottom-level physical node

This feature is enabled by adding the label source to the Volcano controller's ConfigMap. The above configuration defines a three-layer topology structure named topology-A:

Top Level (Tier 2): Defined by the volcano.sh/hypercluster label.
Middle Level (Tier 1): Defined by the volcano.sh/hypernode label.
Bottom Level: Physical nodes, identified by the Kubernetes built-in kubernetes.io/hostname label.

When a node is labeled as follows, it will be automatically recognized and classified into the topology path cluster-s4 -> node-group-s0:

# Labels for node node-0
labels:
  kubernetes.io/hostname: node-0
  volcano.sh/hypernode: node-group-s0
  volcano.sh/hypercluster: cluster-s4

The label-based network topology auto-discovery feature offers excellent generality and flexibility. It is not dependent on specific network hardware (like IB), making it suitable for various heterogeneous clusters, and allows users to flexibly define hierarchical structures of any depth through labels. It automates complex topology maintenance tasks into simple node label management, significantly reducing operational costs and the risk of errors. Furthermore, this mechanism dynamically adapts to changes in cluster nodes and labels, maintaining the accuracy of topology information in real-time without manual intervention.

Related PR: #4629, @zhaoqi612

Usage documentation: HyperNode Auto Discovery.

Add Native Ray Framework Support

Ray is an open-source unified distributed computing framework whose core goal is to simplify parallel computing from single machines to large-scale clusters, especially suitable for scaling Python and AI applications. To manage and run Ray on Kubernetes, the community provides KubeRay—an operator specifically designed for Kubernetes. It acts as a bridge between Kubernetes and the Ray framework, greatly simplifying the deployment and management of Ray clusters and jobs.

Historically, running Ray workloads on Kubernetes primarily relied on the KubeRay Operator. KubeRay integrated Volcano in its v0.4.0 release (released in 2022) for scheduling and resource management of Ray Clusters, addressing issues like resource deadlocks in distributed training scenarios. With this new version of Volcano, users can now directly create and manage Ray clusters and submit computational tasks through native Volcano Jobs. This provides Ray users with an alternative usage scheme, allowing them to more directly utilize Volcano's capabilities such as Gang Scheduling, queue management and fair scheduling, and job lifecycle management for runni...

Contributors

dongjiang1989, kingeasternsun, and 34 other contributors

Assets 2

14 Aug 01:26

Monokaix

v1.12.2

9b72eac

v1.12.2

What's Changed

Automated cherry pick of #4422: Move kube-scheduler related metrics initilization to server.go to avoid panic by @JesseStutler in #4461
Automated cherry pick of #4473: fix node count reconcile by @Monokaix in #4488
[cherry-pick for 1.12]Fix incorrect definition of ReleaseNameEnvKey by @ouyangshengjia in #4490
[cherry-pick for 1.12]Fix the issue where SelectBestNode returns nil when plugin scores are negative by @guoqinwill in #4472
Automated cherry pick of #4487: Add missing capacity metrics in hierarchical queues by @JesseStutler in #4494
[Cherry-pick] Add bump version script; Make version release more automated by @JesseStutler in #4521
[Cherry-pick] fix: update podGroup when statefulSet update by @Poor12 in #4522
Automated: Bump version to v1.12.2 by @JesseStutler in #4518

Full Changelog: v1.12.1...v1.12.2

Contributors

Monokaix, Poor12, and 3 other contributors

Assets 2

31 May 14:47

Monokaix

v1.12.1

f7acc99

v1.12.1

What's Changed

Fix queue update conflicts when upgrading to new version by @Monokaix in #4336
Bump image to v1.12.1 by @Monokaix in #4337

Full Changelog: v1.12.0...v1.12.1

Contributors

Monokaix

Assets 2

31 May 14:46

Monokaix

v1.12.0

d24c04c

v1.12.0

What's New

Welcome to the v1.12.0 release of Volcano! 🚀 🎉 📣
In this release, we have brought a bunch of significant enhancements that have long-awaited by community users.

Network Topology Aware Scheduling: Alpha Release

Volcano's network topology-aware scheduling, initially introduced as a preview in v1.11, has now reached its Alpha release in v1.12. This feature aims to optimize the deployment of AI tasks in large-scale training and inference scenarios, such as model parallel training and Leader-Worker inference. It achieves this by scheduling tasks within the same network topology performance domain, which reduces cross-switch communication and significantly enhances task efficiency. Volcano leverages the HyperNode CRD to abstract and represent heterogeneous hardware network topologies, supporting a hierarchical structure for simplified management.

Key features integrated in v1.12 include:

HyperNode Auto-Discovery: Volcano now offers automatic discovery of cluster network topologies. Users can configure the discovery type, and the system will automatically create and maintain hierarchical HyperNodes that reflect the actual cluster network topology. Currently, this supports InfiniBand (IB) networks by acquiring topology information via the UFM (Unified Fabric Manager) interface and automatically updating HyperNodes. Future plans include support for more network protocols like RoCE.
Prioritized HyperNode Selection:

This release introduces a scoring strategy based on both node-level and HyperNode-level evaluations, which are accumulated to determine the final HyperNode score.
- Node-level: It is recommended to configure the BinPack plugin to prioritize filling HyperNodes, thereby reducing resource fragmentation.
- HyperNode-level: Lower-level HyperNodes are preferred for better performance due to fewer cross-switch communications. For HyperNodes at the same level, those containing more tasks receive higher scores to reduce HyperNode-level resource fragmentation.
Support for Label Selector Node Matching:

HyperNode leaf nodes are associated with physical nodes in the cluster, supporting three matching strategies:
- Exact Match: Direct matching of node names.
- Regex Match: Matching node names using regular expressions.
- Label Match: Matching nodes via standard Label Selectors.

Related Documentation:

Related PRs: (#3874, #3894, #3969, #3971, #4068, #4213, #3897, #3887, @ecosysbin, @weapons97, @Xu-Wentao,@penggu @JesseStutler, @Monokaix)

Dynamic MIG Slicing for GPU Virtualization

Volcano's GPU virtualization feature now supports requesting partial GPU resources by memory and compute capacity. This, combined with Device Plugin integration, achieves hardware isolation and improves GPU utilization.

Traditional GPU virtualization restricts GPU usage by intercepting CUDA APIs (based on HAMI-Core software solutions). NVIDIA Ampere architecture introduced MIG (Multi-Instance GPU) technology, allowing a single physical GPU to be partitioned into multiple independent instances. However, general MIG solutions often pre-fix instance sizes, leading to resource waste and insufficient flexibility.

Volcano v1.12 provides dynamic MIG slicing and scheduling capabilities. It can select appropriate MIG instance sizes in real-time based on the user's requested GPU usage and employs a Best-Fit algorithm to minimize resource waste. It also supports GPU scoring strategies like BinPack and Spread to reduce resource fragmentation and enhance GPU utilization. Users can request resources using the unified volcano.sh/vgpu-number, volcano.sh/vgpu-cores, and volcano.sh/vgpu-memory APIs without needing to concern themselves with the underlying implementation.

Related Documentation:

Related PRs: (#4290, #3953, @sailorvii, @archlitchi)

Dynamic Resource Allocation (DRA) Support

Kubernetes DRA (Dynamic Resource Allocation) is a built-in Kubernetes feature designed to provide a more flexible and powerful way to manage heterogeneous hardware resources in a cluster, such as GPUs, FPGAs, and high-performance network cards. It addresses the limitations of traditional Device Plugins in certain advanced scenarios, enabling device vendors and platform administrators to better declare, allocate, and share these hardware resources with Pods and containers.

Volcano v1.12 adds support for DRA. This feature allows the cluster to dynamically allocate and manage external resources, enhancing Volcano's integration with the Kubernetes ecosystem and its resource management flexibility.

Related Documentation:
Unified Scheduling with DRA

Related PR: (#3799, @JesseStutler)

Volcano Global Supports Queue Capacity Management

Queues are a fundamental concept in Volcano. To enable tenant quota management in multi-cluster and multi-tenant environments, Volcano v1.12 introduces enhanced global queue capacity management. Users can now centrally limit tenant resource usage across multiple clusters. The configuration remains consistent with single-cluster setups: tenant quotas are defined by setting the capability field within the queue configuration.

Related PR: volcano-sh/volcano-global#16 (@tanberBro)

Security Enhancements

The Volcano community consistently focuses on security. In v1.12, beyond fine-grained control over sensitive permissions like ClusterRole, we've addressed and fixed the following potential security risks:

HTTP Server Timeout Settings: Metric and Healthz endpoints for all Volcano components have been configured with server-side ReadHeader, Read, and Write timeouts. This prevents prolonged resource occupation.
- PR: #4208
Warning Logs for Skipping SSL Certificate Verification: When client requests set insecureSkipVerify to true, a warning log is now added. We strongly advise enabling SSL certificate verification in production environments.
- PR: #4211
Volcano Scheduler pprof Endpoint Disabled by Default: To prevent the disclosure of sensitive program information, the Profiling data port (used for troubleshooting) is now disabled by default.
- PR: #4173
Removal of Unnecessary File Permissions: Unnecessary execution permissions have been removed from Go source files to maintain minimal file permissions.
- PR: #4171
Security Context and Non-Root Execution for Containers: All Volcano components now run with non-root privileges. We've added seccompProfile, SELinuxOptions, and set allowPrivilegeEscalation to false to prevent container privilege escalation. Additionally, only necessary Linux Capabilities are retained, comprehensively limiting container permissions.
- PR: #4207
HTTP Request Response Body Size Limit: For HTTP requests sent by the Extender Plugin and Elastic Search Service, their response body size is now limited. This prevents excessive resource consumption that could lead to OOM (Out Of Memory) issues.
- Disclosure: GHSA-hg79-fw4p-25p8

Performance Improvements in Large-Scale Scenarios

Volcano continuously optimizes performance. The new version, without affecting functionality, has by default removed and disabled some unnecessary Webhooks, improving performance in large-scale batch creation scenarios:

PodGroup Mutating Webhook Disabled by Default: When creating a PodGroup without specifying a queue, the system can now read from the Namespace to populate it. Since this scenario is uncommon, this Webhook is disabled by default. Users can enable it as needed.
Queue Status Validation Moved from Pod to PodGroup: When a queue is closed, task submission is disallowed. The original validation logic was performed during Pod creation. As Volcano's basic scheduling unit is PodGroup, migrating the validation to PodGroup creation is more logical. Since the number of PodGroups is less than Pods, this reduces Webhook calls, improving perfo...

Contributors

xieyanke, co63oc, and 41 other contributors

Assets 2

30 Apr 10:34

Monokaix

v1.11.2

735842a

v1.11.2

Important:
This release addresses multiple critical security vulnerabilities. We strongly advise all users to upgrade to immediately to protect your systems and data.

Security Fixes

[Cherry-pick 1.11] Add http response body size limit (#4252 @kevin-wangzefeng )
[Cherry-pick 1.11] Add security context configuration (#4245 @JesseStutler)
Remove the execute permission for some files, chmod to 644 (#4171 @JesseStutler)
add a switch to control whether enable pprof in scheduler (#4173 @JesseStutler)
Add warning msg when TLS verification disabled(#4211 @Monokaix)
Add http server timeout(#4208 @Monokaix)

Other Improvements

Bump image to v1.11.2 (#4232 @JesseStutler)
Fix: remove controller-manager metrics that should not be introduced (#4202 @dongjiang1989)
Filter useless logs in binpack (#4240 @XbaoWu)

Important Notes Before Upgrading

Change: Volcano Scheduler pprof Endpoint Disabled by Default
For security enhancement, the pprof endpoint for the Volcano Scheduler is now disabled by default in this release. If you require this endpoint for debugging or monitoring, you will need to explicitly enable it post-upgrade. This can be achieved by:

If you are using helm, specifying custom.scheduler_pprof_enable=true during Helm installation or upgrade.
OR, manually setting the command-line argument --enable-pprof=true when starting the Volcano Scheduler.

Please be aware of the security implications before enabling this endpoint in production environments.

Contributors

dongjiang1989, kevin-wangzefeng, and 3 other contributors

Assets 2

30 Apr 10:33

Monokaix

v1.11.0-network-topology-preview.3

6b7b5df

v1.11.0-network-topology-preview.3

Important
This release addresses multiple critical security vulnerabilities. We strongly advise all users to upgrade immediately to protect your systems and data.

Security Fixes

[Cherry-pick network-topology] Add http response body size limit (#4255 @JesseStutler)
[Cherry-pick network-topology] Add security context configuration (#4250 @JesseStutler)
Remove the execute permission for some files, chmod to 644 (#4171 @JesseStutler)
add a switch to control whether enable pprof in scheduler (#4173 @JesseStutler)
Add warning msg when TLS verification disabled(#4211 @Monokaix)
Add http server timeout(#4208 @Monokaix)

Other Improvements

Bump image to v1.11.0-network-topology-preview.3 (#4237 @JesseStutler)
Add NetworkTopology plugin score doc (#4213 @ecosysbin)
HyperNode supports select Nodes By labels (#4068 @ecosysbin)
Update ubuntu base image (#4197 @Monokaix)

Important Notes Before Upgrading

If you are using helm, specifying custom.scheduler_pprof_enable=true during Helm installation or upgrade.
OR, manually setting the command-line argument --enable-pprof=true when starting the Volcano Scheduler.

Please be aware of the security implications before enabling this endpoint in production environments.

Contributors

Monokaix, ecosysbin, and JesseStutler

Assets 2

30 Apr 10:33

Monokaix

v1.10.2

ca2dbb3

v1.10.2

Important:
This release addresses multiple critical security vulnerabilities. We strongly advise all users to upgrade immediately to protect your systems and data.

Security Fixes

[Cherry-pick 1.10] Add http response body size limit (#4253 @kevin-wangzefeng)
[Cherry-pick 1.10] Add security context configuration (#4246 @JesseStutler)
Remove the execute permission for some files, chmod to 644 (#4171 @JesseStutler)
add a switch to control whether enable pprof in scheduler (#4173 @JesseStutler)
Add warning msg when TLS verification disabled(#4211 @Monokaix)
Add http server timeout(#4208 @Monokaix)

Other Improvements

Update ubuntu base image(#4194 @Monokaix)
Bump image to v1.10.2 (#4231 @JesseStutler)

Important Notes Before Upgrading

If you are using helm, specifying custom.scheduler_pprof_enable=true during Helm installation or upgrade.
OR, manually setting the command-line argument --enable-pprof=true when starting the Volcano Scheduler.

Please be aware of the security implications before enabling this endpoint in production environments.

Contributors

kevin-wangzefeng, Monokaix, and JesseStutler

Assets 2

30 Apr 10:32

Monokaix

v1.9.1

e9690aa

v1.9.1

Important:
This release addresses multiple critical security vulnerabilities. We strongly advise all users to upgrade immediately to protect your systems and data.

Security Fixes

[Cherry-pick 1.9] Add http response body size limit (#4254 @JesseStutler)
[Cherry-pick 1.9] Add security context configuration (#4249 @Monokaix)
Remove the execute permission for some files, chmod to 644 (#4171 @JesseStutler)
add a switch to control whether enable pprof in scheduler (#4173 @JesseStutler)
Add warning msg when TLS verification disabled(#4211 @Monokaix)
Add http server timeout(#4208 @Monokaix)

Other Improvements

Bump image to v1.9.1 (#4230 @JesseStutler)
fix panic when get job's elastic resource (#4103 @lowang-bh)
change to action cache v4 (#4075 @Monokaix)
fix flaky test (#4121 @Monokaix)
Supports rollback when allocate callback function fails (#3780 @wangyang0616)
Supports rollback when allocate callback function fails (#3776 @wangyang0616)
fix pg controller create redundancy podGroup when schedulerName isn't matched (#3675 @liuyuanchun11)
Update Kubernetes compatibility (#3570 @Monokaix)
Fix podgroup not created (#3572 @liuyuanchun11)
update pod status when bind error (#3550 @bibibox)
Update NominatedNodeName for pipelined task (#3501 @bibibox)

Important Notes Before Upgrading

If you are using helm, specifying custom.scheduler_pprof_enable=true during Helm installation or upgrade.
OR, manually setting the command-line argument --enable-pprof=true when starting the Volcano Scheduler.

Please be aware of the security implications before enabling this endpoint in production environments.

Contributors

Monokaix, lowang-bh, and 4 other contributors

Assets 2

09 Apr 09:33

Monokaix

v1.11.0-network-topology-preview.2

4324023

v1.11.0-network-topology-preview.2

What's Changed

[cherry-pick]change to action cache v4 by @Monokaix in #4074
[Cherry-pick network-topology] Replace queue status update by using ApplyStatus method & Bump image to v1.11.0-network-topology-preview.2 by @JesseStutler in #4153

Full Changelog: v1.11.0-network-topology-preview.0...v1.11.0-network-topology-preview.2

Contributors

Monokaix and JesseStutler

Assets 2

09 Apr 01:46

Monokaix

v1.11.1

879cdf3

v1.11.1

What's Changed

[cherry-pick]change to action cache v4 by @Monokaix in #4075
[cherry-pick]fix creating a hierarchical sub-queue will be rejected by @zhutong196 in #4080
[cherry-pick] Fix jobflow status confusion problem by @dongjiang1989 in #4094
[cherry-pick] fix: the problem that PVC will be continuously created indefinitely by @ytcisme in #4144
[Cherry-pick v1.11] Replace queue status update by using ApplyStatus method & Bump image to v1.11.1 by @JesseStutler in #4155
[Cherry-pick v1.11] fix: remove lessPartly condition in reclaimable fn from capacity and proportion plugins by @JesseStutler in #4178

Full Changelog: v1.11.0...v1.11.1

Contributors

dongjiang1989, Monokaix, and 3 other contributors

Assets 2

Releases: volcano-sh/volcano

v1.13.0

What's New

Support LeaderWorkerSet for Large Model Inference Scenarios

Introduce Cron VolcanoJob

Support Label-based HyperNode Auto Discovery

Add Native Ray Framework Support

Contributors

Uh oh!

v1.12.2

What's Changed

Contributors

Uh oh!

v1.12.1

What's Changed

Contributors

Uh oh!

v1.12.0

What's New

Network Topology Aware Scheduling: Alpha Release

Dynamic MIG Slicing for GPU Virtualization

Dynamic Resource Allocation (DRA) Support

Volcano Global Supports Queue Capacity Management

Security Enhancements

Performance Improvements in Large-Scale Scenarios

Contributors

Uh oh!

v1.11.2

Security Fixes

Other Improvements

Important Notes Before Upgrading

Contributors

Uh oh!

v1.11.0-network-topology-preview.3

Security Fixes

Other Improvements

Important Notes Before Upgrading

Contributors

Uh oh!

v1.10.2

Security Fixes

Other Improvements

Important Notes Before Upgrading

Contributors

Uh oh!

v1.9.1

Security Fixes

Other Improvements

Important Notes Before Upgrading

Contributors

Uh oh!

v1.11.0-network-topology-preview.2

What's Changed

Contributors

Uh oh!

v1.11.1

What's Changed

Contributors

Uh oh!