Configuring quotas
As an administrator, you can use Alauda Build of Kueue to configure quotas to optimize resource allocation and system throughput for user workloads. You can configure quotas for compute resources such as CPU, memory, pods, and GPU.
You can configure quotas in Alauda Build of Kueue by completing the following steps:
- Configure a cluster queue.
- Configure a resource flavor.
- Configure a local queue.
- Users can then submit their workloads to the local queue.
1. Configuring a cluster queue
A cluster queue is a cluster-scoped resource, represented by a ClusterQueue object, that governs a pool of resources such as GPU, CPU, memory, and pods. Cluster queues can be used to define usage limits, quotas for resource flavors, order of consumption, and fair sharing rules.
Note: The cluster queue is not ready for use until a ResourceFlavor object has also been configured.
1.1. Prerequisites
- The Alauda Container Platform Web CLI can communicate with your cluster.
- You have cluster administrator permissions or the kueue-batch-admin-role role.
1.2. Procedure
- Create a ClusterQueue object as a YAML file:
  - namespaceSelector: Defines which namespaces can use the resources governed by this cluster queue. An empty namespaceSelector, as shown in the example, means that all namespaces can use these resources.
  - coveredResources of default-flavor: Defines the resource types governed by the cluster queue. This example ClusterQueue object governs CPU, memory, pod, and GPU resources.
  - flavors.name of default-flavor: Defines the resource flavor that is applied to the resource types listed. In this example, the default-flavor resource flavor is applied to CPU, memory, pod, and GPU resources.
  - resources of default-flavor: Defines the resource requirements for admitting jobs. This example cluster queue only admits jobs if the following conditions are met:
    - The sum of the CPU requests is less than or equal to 9.
    - The sum of the memory requests is less than or equal to 36Gi.
    - The total number of pods is less than or equal to 5.
    - If you use the Alauda Build of Hami, the sum of the GPU tasks is less than or equal to 20 (see coveredResources of t4-flavor).
    - If you use the Alauda Build of Hami, the sum of the total GPU cores requests is less than or equal to 300.
    - The sum of the total GPU memory requests is less than or equal to 20480.
    - If you use the Alauda Build of NVIDIA GPU Device Plugin, the sum of the GPU requests is less than or equal to 100 (see coveredResources of a30-flavor).
  - coveredResources of t4-flavor: Defines the resource requirements for Alauda Build of Hami. If you do not use the Alauda Build of Hami, delete this entry.
  - flavors.name of t4-flavor: Defines the resource flavor that is applied to the resource types listed. In this example, the t4-flavor resource flavor is applied to Nvidia T4 GPU cards. If you do not want to configure quotas for specific card types, you can specify default-flavor instead.
  - coveredResources of a30-flavor: Defines the resource requirements for Alauda Build of NVIDIA GPU Device Plugin. If you do not use the Alauda Build of NVIDIA GPU Device Plugin, delete this entry.
  - flavors.name of a30-flavor: Defines the resource flavor that is applied to the resource types listed. In this example, the a30-flavor resource flavor is applied to Nvidia A30 GPU cards. If you do not want to configure quotas for specific card types, you can specify default-flavor instead.
- Apply the ClusterQueue object by running the following command:
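For orientation, a minimal sketch of a ClusterQueue manifest consistent with the limits listed in this procedure might look like the following. The object name, the file name, the apiVersion, and the Hami resource names (nvidia.com/gpualloc, nvidia.com/total-gpucores, nvidia.com/total-gpumem) are assumptions for illustration; check them against your installed Kueue version and GPU plugin before use.

```yaml
# Sketch only, not the product's reference manifest.
# Hypothetical file name: cluster-queue.yaml
# One way to apply it, assuming kubectl access to the cluster:
#   kubectl apply -f cluster-queue.yaml
apiVersion: kueue.x-k8s.io/v1beta1   # assumption: depends on the installed Kueue version
kind: ClusterQueue
metadata:
  name: cluster-queue                # hypothetical name
spec:
  namespaceSelector: {}              # empty selector: all namespaces can use this queue
  resourceGroups:
  - coveredResources: ["cpu", "memory", "pods"]
    flavors:
    - name: default-flavor
      resources:
      - name: cpu
        nominalQuota: 9
      - name: memory
        nominalQuota: 36Gi
      - name: pods
        nominalQuota: 5
  # Keep this group only if you use the Alauda Build of Hami.
  # The three resource names below are assumed, not confirmed by this document.
  - coveredResources: ["nvidia.com/gpualloc", "nvidia.com/total-gpucores", "nvidia.com/total-gpumem"]
    flavors:
    - name: t4-flavor
      resources:
      - name: nvidia.com/gpualloc
        nominalQuota: 20
      - name: nvidia.com/total-gpucores
        nominalQuota: 300
      - name: nvidia.com/total-gpumem
        nominalQuota: 20480
  # Keep this group only if you use the Alauda Build of NVIDIA GPU Device Plugin.
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: a30-flavor
      resources:
      - name: nvidia.com/gpu
        nominalQuota: 100
```

Each resource name appears in exactly one resource group, and each flavor named here must exist as a ResourceFlavor object before the cluster queue becomes ready.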
2. Configuring a resource flavor
After you have configured a ClusterQueue object, you can configure a ResourceFlavor object.
Resources in a cluster are typically not homogeneous. If the resources in your cluster are homogeneous, you can use an empty ResourceFlavor instead of adding labels to custom resource flavors.
You can use a custom ResourceFlavor object to represent different resource variations that are associated with cluster nodes through labels, taints, and tolerations. You can then associate workloads with specific node types to enable fine-grained resource management.
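As a rough illustration of how labels and tolerations fit together, the sketch below defines a flavor that matches nodes carrying a given label and adds a toleration to the pods of admitted workloads. The flavor name, label key, and taint key are hypothetical placeholders, and the apiVersion is an assumption that depends on your Kueue version.

```yaml
# Sketch only: all names and keys below are hypothetical.
apiVersion: kueue.x-k8s.io/v1beta1   # assumption: depends on the installed Kueue version
kind: ResourceFlavor
metadata:
  name: dedicated-gpu-flavor         # hypothetical name
spec:
  # Workloads admitted with this flavor are steered to nodes with these labels.
  nodeLabels:
    example.com/node-pool: gpu       # hypothetical node label
  # Tolerations added to pods of workloads admitted with this flavor,
  # so that they can run on nodes tainted for dedicated use.
  tolerations:
  - key: example.com/dedicated       # hypothetical taint key
    operator: Equal
    value: gpu
    effect: NoSchedule
```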
2.1. Prerequisites
- The Alauda Container Platform Web CLI can communicate with your cluster.
- You have cluster administrator permissions or the kueue-batch-admin-role role.
2.2. Procedure
- Create a ResourceFlavor object as a YAML file:
Example of an empty ResourceFlavor object
Example of a custom ResourceFlavor object for Nvidia Tesla T4 GPU
Example of a custom ResourceFlavor object for Nvidia A30 GPU
- Apply the ResourceFlavor object by running the following command:
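The three kinds of ResourceFlavor object named in this procedure could be sketched as follows. The flavor names come from the cluster queue description; the node label key and values (nvidia.com/gpu.product with Tesla-T4 and NVIDIA-A30) are assumptions based on common GPU node labeling and should be checked against the labels actually present on your nodes.

```yaml
# Sketch only. Hypothetical file name: resource-flavors.yaml
# One way to apply it, assuming kubectl access to the cluster:
#   kubectl apply -f resource-flavors.yaml
# Empty flavor for homogeneous resources: no labels, taints, or tolerations.
apiVersion: kueue.x-k8s.io/v1beta1   # assumption: depends on the installed Kueue version
kind: ResourceFlavor
metadata:
  name: default-flavor
---
# Custom flavor for Nvidia Tesla T4 nodes.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: t4-flavor
spec:
  nodeLabels:
    nvidia.com/gpu.product: Tesla-T4     # assumed label; verify on your nodes
---
# Custom flavor for Nvidia A30 nodes.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: a30-flavor
spec:
  nodeLabels:
    nvidia.com/gpu.product: NVIDIA-A30   # assumed label; verify on your nodes
```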
3. Configuring a local queue
A local queue is a namespaced object, represented by a LocalQueue object, that groups closely related workloads that belong to a single namespace.
As an administrator, you can configure a LocalQueue object to point to a cluster queue. This allocates resources from the cluster queue to workloads in the namespace specified in the LocalQueue object.
3.1. Prerequisites
- The Alauda Container Platform Web CLI can communicate with your cluster.
- You have cluster administrator permissions or the kueue-batch-admin-role role.
- You have created a ClusterQueue object.
3.2. Procedure
- Create a LocalQueue object as a YAML file:
Example of a basic LocalQueue object
- Apply the LocalQueue object by running the following command:
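A basic LocalQueue manifest might look like the sketch below. The namespace, queue name, cluster queue name, and file name are hypothetical and should be replaced with your own values.

```yaml
# Sketch only. Hypothetical file name: local-queue.yaml
# One way to apply it, assuming kubectl access to the cluster:
#   kubectl apply -f local-queue.yaml
apiVersion: kueue.x-k8s.io/v1beta1   # assumption: depends on the installed Kueue version
kind: LocalQueue
metadata:
  namespace: team-namespace          # hypothetical namespace whose workloads use this queue
  name: user-queue                   # hypothetical queue name
spec:
  clusterQueue: cluster-queue        # must match the name of an existing ClusterQueue
```

Users in that namespace would then reference the queue by setting the kueue.x-k8s.io/queue-name label on their workloads to the LocalQueue name.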
4. Configuring a default local queue
As a cluster administrator, you can improve quota enforcement in your cluster by managing all jobs in selected namespaces without needing to explicitly label each job. You can do this by creating a default local queue.
A default local queue serves as the local queue for newly created jobs that do not have the kueue.x-k8s.io/queue-name label. After you create a default local queue, any new job created in the namespace without a kueue.x-k8s.io/queue-name label is automatically updated to have the kueue.x-k8s.io/queue-name: default label.
4.1. Prerequisites
- The Alauda Container Platform Web CLI can communicate with your cluster.
- You have cluster administrator permissions or the kueue-batch-admin-role role.
- You have created a ClusterQueue object.
4.2. Procedure
- Create a LocalQueue object named default as a YAML file:
Example of a default LocalQueue object
- Apply the LocalQueue object by running the following command:
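A default local queue differs from a basic one only in its name, which must be default. In the sketch below, the namespace, cluster queue name, and file name are hypothetical.

```yaml
# Sketch only. Hypothetical file name: default-local-queue.yaml
# One way to apply it, assuming kubectl access to the cluster:
#   kubectl apply -f default-local-queue.yaml
apiVersion: kueue.x-k8s.io/v1beta1   # assumption: depends on the installed Kueue version
kind: LocalQueue
metadata:
  namespace: team-namespace          # hypothetical namespace to manage
  name: default                      # the name "default" makes this the default local queue
spec:
  clusterQueue: cluster-queue        # must match the name of an existing ClusterQueue
```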
4.3. Verification
- Create a job in the same namespace as the default local queue.
- Observe that the job updates with the kueue.x-k8s.io/queue-name: default label.
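To try the verification, a small test job without a queue label can be enough. The following is a sketch under stated assumptions: the job name, namespace, container image, and resource requests are hypothetical and chosen only to fit within the example quotas.

```yaml
# Sketch only. Hypothetical file name: test-job.yaml
# One way to create and inspect it, assuming kubectl access to the cluster:
#   kubectl apply -f test-job.yaml
#   kubectl get job queue-label-test -n team-namespace --show-labels
apiVersion: batch/v1
kind: Job
metadata:
  namespace: team-namespace          # same namespace as the default local queue
  name: queue-label-test             # hypothetical name
  # Intentionally no kueue.x-k8s.io/queue-name label here.
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: sleep
        image: busybox               # hypothetical image; use one available in your registry
        command: ["sleep", "30"]
        resources:
          requests:
            cpu: "1"
            memory: 200Mi
```

If the default local queue is in effect, the labels shown for the job should include kueue.x-k8s.io/queue-name=default.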