SPEAKER 2025

Aditya Deshpande
Software Engineer 2, Dell Technologies
About Talk
Stateless Slurm Cluster Deployment and Telemetry Monitoring Using Omnia
In modern high-performance computing (HPC) environments, agility, scalability, and observability are essential. Omnia, an open-source HPC deployment toolkit from Dell Technologies, deploys a stateless HPC cluster with Slurm as the workload manager and enables rapid, repeatable cluster provisioning through Ansible-based automation. This approach eliminates persistent state dependencies, making infrastructure setup more flexible and efficient.
Telemetry monitoring built on open-source tools can be integrated to provide real-time insight into system performance and resource utilization. This combination simplifies HPC infrastructure management while improving operational visibility and supporting performance optimization.
Key Takeaway:
Omnia empowers users to deploy scalable, stateless Slurm clusters and implement robust telemetry monitoring, streamlining HPC operations through open-source automation.
TRACK: IT Infra
6th Nov 2025 | HALL C | Time: 05:00-05:45
About Speaker
Aditya Deshpande is a Software Engineer 2 at Dell Technologies with expertise in building scalable infrastructure and streamlining operations using Linux and open-source technologies. He has worked on automation, telemetry, and security design as part of the Omnia project, contributing to cluster provisioning, workload orchestration, and observability frameworks. Aditya is proficient in Kubernetes deployment and orchestration and has hands-on experience with technologies such as Slurm, Prometheus, Kafka, and OpenLDAP.