Key features
Slurm Clusters eliminate the traditional complexity of cluster orchestration by providing:

- Zero configuration setup: Slurm and munge are pre-installed and fully configured.
- Instant provisioning: Clusters deploy rapidly with minimal setup.
- Automatic role assignment: Runpod automatically designates controller and agent nodes.
- Built-in optimizations: Pre-configured for optimal NCCL performance.
- Full Slurm compatibility: All standard Slurm commands work out-of-the-box.
 
If you prefer to manually configure your Slurm deployment, see Deploy an Instant Cluster with Slurm (unmanaged) for a step-by-step guide.
Deploy a Slurm Cluster
- Open the Instant Clusters page on the Runpod console.
- Click Create Cluster.
- Select Slurm Cluster from the cluster type dropdown menu.
- Configure your cluster specifications:
  - Cluster name: Enter a descriptive name for your cluster.
  - Pod count: Choose the number of Pods in your cluster.
  - GPU type: Select your preferred GPU type.
  - Region: Choose your deployment region.
  - Network volume (optional): Add a network volume for persistent/shared storage. If using a network volume, ensure the region matches your cluster region.
  - Pod template: Select a Pod template, or click Edit Template to customize start commands, environment variables, ports, or container/volume disk capacity.
Slurm Clusters currently support only official Runpod PyTorch images. If you deploy using a different image, the Slurm process will not start.
 
- Click Deploy Cluster.
 
Connect to a Slurm Cluster
Once deployment completes, you can access your cluster from the Instant Clusters page. From this page you can select a cluster to view its component nodes, including a label indicating the Slurm controller (primary node) and Slurm agents (secondary nodes). Expand a node to view details like availability, GPU/storage utilization, and options for connection and management. Connect to a node using the Connect button, or using any of the connection methods supported by Pods.
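For example, assuming you have SSH access configured for your Pods, you can reach a node from your local machine with a standard SSH command. The host, port, and key path below are placeholders; copy the exact command shown in the node's Connect menu.

```bash
# Placeholder values; use the connection details shown for your node in the console.
ssh root@<node-ip> -p <ssh-port> -i ~/.ssh/id_ed25519
```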
Submit and manage jobs
All standard Slurm commands are available without configuration. For example, you can check cluster status and available resources, submit batch jobs, and monitor the job queue.
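The sketch below shows a typical workflow run from the controller node; the job script name and resource counts are placeholders.

```bash
# Check cluster status and available resources.
sinfo

# Show detailed node information, including GPUs (GRES).
scontrol show nodes

# Submit a batch job (train.sh is a placeholder job script).
sbatch --nodes=2 --gres=gpu:8 train.sh

# Monitor the job queue.
squeue

# Run a command interactively on every node in the cluster.
srun --nodes=2 hostname
```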
Advanced configuration
While Runpod’s Slurm Clusters work out-of-the-box, you can customize your configuration by connecting to the Slurm controller node using the web terminal or SSH. Access Slurm configuration files in their standard locations:

- /etc/slurm/slurm.conf - Main configuration file.
- /etc/slurm/gres.conf - Generic resource configuration.
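As a minimal sketch, you can inspect these files and apply edits from the controller node; depending on the change, Slurm can re-read its configuration without a full restart:

```bash
# Inspect the main Slurm configuration and the generic resource (GRES) definitions.
cat /etc/slurm/slurm.conf
cat /etc/slurm/gres.conf

# After editing a file, ask the controller to re-read the configuration.
scontrol reconfigure
```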
Troubleshooting
If you encounter issues with your Slurm Cluster, try the following (example commands are shown after this list):

- Jobs stuck in pending state: Check resource availability with sinfo and ensure the requested resources are available. If you need more resources, you can add more nodes to your cluster.
- Authentication errors: Munge is pre-configured, but if issues arise, verify that the munge service is running on all nodes.
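A few diagnostic commands that can help with both cases, as a non-exhaustive sketch run on the affected nodes:

```bash
# See why jobs are pending (check the REASON column).
squeue --states=PENDING --long

# Confirm which nodes and GPUs the controller thinks are available.
sinfo
scontrol show nodes

# Verify that the munge daemon is running and can create and decode credentials.
pgrep munged
munge -n | unmunge
```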