SPEAKER 2025

Abhishek S A

Software Principal Engineer, Dell Technologies

About Talk

Stateless Slurm Cluster Deployment and Telemetry Monitoring Using Omnia

In modern high-performance computing (HPC) environments, agility, scalability, and observability are essential. Deploying a stateless HPC cluster with Slurm as the workload manager using Omnia, an open-source HPC deployment toolkit from Dell Technologies, enables rapid and repeatable cluster provisioning through Ansible-based automation. This approach eliminates persistent state dependencies, making infrastructure setup more flexible and efficient.
Telemetry monitoring built on open-source tools can be integrated to provide real-time insight into system performance and resource utilization. This combination simplifies HPC infrastructure management while improving operational visibility and performance optimization.
Key Takeaway:
Omnia empowers users to deploy scalable, stateless Slurm clusters and implement robust telemetry monitoring, streamlining HPC operations through open-source automation.

TRACK: IT Infra

6th Nov 2025 | HALL C | Time: 05:00-05:45

About Speaker

Abhishek S A is a Software Principal Engineer at Dell Technologies, with deep expertise in DevOps and a strong focus on automating server deployment and configuration. His technical proficiency spans Ansible, AI, High-Performance Computing (HPC), Docker, Kubernetes, Linux, and Python. Known for his remarkable patience and boundless energy, Abhishek is also a passionate music enthusiast, bringing creativity and rhythm into both his professional and personal pursuits.