SPEAKER 2025

Aakash Bist
Member of Technical Staff, Pure Storage
About Talk
AI Inference That Doesn’t Run Out of Memory
Running large language models (LLMs) can quickly exhaust GPU memory, especially as inputs get longer and more users run models at once. This often degrades latency or causes inference requests to fail outright. In this talk, we’ll show how we use an open-source-first approach to offload memory to fast, shared storage from Pure Storage (FlashBlade), helping AI workloads run more efficiently and cost-effectively without running out of memory.
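The core idea the talk describes can be sketched in a few lines: when GPU memory for the KV cache fills up, least-recently-used blocks are written out to a shared filesystem and reloaded on demand. The sketch below is a hypothetical illustration of that pattern, not Pure’s implementation; the KVCacheOffloader class, the /mnt/flashblade mount path, and the per-block granularity are all assumptions for the example.

```python
# Toy sketch of KV-cache offloading to shared storage. Assumes a shared
# volume (e.g. a FlashBlade export) is mounted at a path like
# /mnt/flashblade -- the path and class are hypothetical, for illustration.
import os
from collections import OrderedDict

import torch


class KVCacheOffloader:
    """LRU-evict KV-cache blocks from device memory to a shared filesystem."""

    def __init__(self, cache_dir="/mnt/flashblade/kv-cache", max_gpu_blocks=4):
        self.cache_dir = cache_dir
        self.max_gpu_blocks = max_gpu_blocks
        self.gpu_blocks = OrderedDict()  # block_id -> (key, value) tensors
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, block_id):
        return os.path.join(self.cache_dir, f"{block_id}.pt")

    def put(self, block_id, kv):
        # Over budget: spill the least-recently-used block to storage.
        if len(self.gpu_blocks) >= self.max_gpu_blocks:
            old_id, old_kv = self.gpu_blocks.popitem(last=False)
            torch.save(tuple(t.cpu() for t in old_kv), self._path(old_id))
        self.gpu_blocks[block_id] = kv

    def get(self, block_id):
        if block_id in self.gpu_blocks:
            self.gpu_blocks.move_to_end(block_id)  # mark as recently used
            return self.gpu_blocks[block_id]
        # Cache miss: reload the block from shared storage onto the device.
        k, v = torch.load(self._path(block_id))
        kv = (k.to(self.device), v.to(self.device))
        self.put(block_id, kv)
        return kv


if __name__ == "__main__":
    # Local directory stands in for the shared mount in this demo.
    offloader = KVCacheOffloader(cache_dir="/tmp/kv-cache", max_gpu_blocks=2)
    for i in range(4):
        offloader.put(i, (torch.randn(2, 8), torch.randn(2, 8)))
    k, v = offloader.get(0)  # block 0 was evicted; reloaded from storage
    print(k.shape, v.shape)
```

Because the storage is shared, any inference node with the same mount can reload a spilled block, which is what makes the approach useful beyond a single GPU’s memory.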
TRACK: AI and ML
5th Nov 2025 | HALL C | Time: 03:30-04:00
About Speaker
Aakash is a software engineer at Pure Storage, where he focuses on building AI systems and agentic workflows. His work centers on scalable LLM inference, leveraging Pure’s advanced storage technologies to deliver performant and cost-effective model serving solutions for enterprise applications.
Driven by a passion for innovation, Aakash applies his expertise in AI and infrastructure to design efficient, future-ready systems, with a focus on intelligent workflows that combine scalability, performance, and reliability.