SPEAKER 2025

Aakash Bist

Member of Technical Staff, Pure Storage

About Talk

AI Inference That Doesn’t Run Out of Memory

Running large language models (LLMs) can quickly exhaust GPU memory, especially as context lengths grow and more users run models concurrently. When memory runs out, inference slows down or fails outright. In this talk, we’ll show how we use an open-source-first approach to offload memory to fast, shared storage from Pure Storage (FlashBlade), helping AI workloads run more efficiently and cost-effectively without hitting out-of-memory failures.
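The abstract doesn’t spell out the mechanism, but the general pattern it describes, spilling inference state (typically the KV cache) from GPU memory to fast shared storage and reloading it on demand, can be sketched in a few lines. Below is a minimal, hypothetical Python sketch of an LRU spill policy; the SpillingKVCache class, the /mnt/flashblade/kv mount point, and the block shape are illustrative assumptions, not Pure’s actual implementation.

```python
# Minimal, hypothetical sketch of the pattern described above: spilling
# least-recently-used KV-cache blocks from an in-memory pool to files on a
# shared filesystem. Not Pure Storage's implementation; names are illustrative.
import collections
from pathlib import Path

import numpy as np

BLOCK_SHAPE = (2, 32, 16, 128)  # (K/V, heads, tokens, head_dim): assumed sizes


class SpillingKVCache:
    """Keeps hot KV blocks in memory; evicts cold ones to a spill directory."""

    def __init__(self, spill_dir: Path, max_resident_blocks: int):
        self.spill_dir = spill_dir
        self.max_resident = max_resident_blocks
        # block_id -> ndarray, ordered from least to most recently used
        self.resident: collections.OrderedDict = collections.OrderedDict()
        self.spill_dir.mkdir(parents=True, exist_ok=True)

    def put(self, block_id: str, block: np.ndarray) -> None:
        self.resident[block_id] = block
        self.resident.move_to_end(block_id)  # mark as most recently used
        self._evict_if_needed()

    def get(self, block_id: str) -> np.ndarray:
        if block_id in self.resident:  # fast path: block is still in memory
            self.resident.move_to_end(block_id)
            return self.resident[block_id]
        # Miss: transparently reload the spilled block from shared storage.
        block = np.load(self.spill_dir / f"{block_id}.npy")
        self.put(block_id, block)
        return block

    def _evict_if_needed(self) -> None:
        while len(self.resident) > self.max_resident:
            victim_id, victim = self.resident.popitem(last=False)  # LRU victim
            np.save(self.spill_dir / f"{victim_id}.npy", victim)   # spill out


# Usage: /mnt/flashblade/kv stands in for a shared-flash mount; any writable
# directory works for the demo. With room for 2 blocks, older blocks spill.
cache = SpillingKVCache(Path("/mnt/flashblade/kv"), max_resident_blocks=2)
for i in range(4):
    cache.put(f"seq0-blk{i}", np.zeros(BLOCK_SHAPE, dtype=np.float16))
print(cache.get("seq0-blk0").shape)  # reloaded from the spill tier
```

A production serving stack would of course add concurrency control, GPU-pinned transfer buffers, and scheduler-aware eviction rather than plain LRU; the throughput of the shared-flash spill tier is what keeps reload latency acceptable.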

TRACK: AI and ML

5th Nov 2025 | HALL C | Time: 03:30-04:00

About Speaker

Aakash is a software engineer at Pure Storage, where he focuses on building AI systems and agentic workflows. His work centers on scalable LLM inference, leveraging Pure’s advanced storage technologies to deliver performant and cost-effective model serving solutions for enterprise applications.

Driven by a passion for innovation, Aakash applies his expertise in AI and infrastructure to design efficient, future-ready systems. His contributions reflect a commitment to advancing intelligent workflows that combine scalability, performance, and reliability.