SPEAKER 2025

Jayapaul Paul
MTS-LS/1, Pure Storage
About Talk
AI Inference That Doesn’t Run Out of Memory
Running large language models (LLMs) can quickly exhaust GPU memory, especially as context lengths grow and more users run models concurrently. This often degrades throughput or causes inference to fail outright. In this talk, we’ll show how we use an open-source-first approach to offload memory to fast, shared storage from Pure Storage (FlashBlade), helping AI workloads run more efficiently and cost-effectively without running out of memory.
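The idea the abstract describes can be illustrated with a toy sketch: keep a fixed number of "hot" cache blocks in fast memory (standing in for GPU HBM) and evict cold blocks to a directory on shared storage (standing in for an NFS mount backed by an array such as FlashBlade), fetching them back transparently on access. This is a minimal, hypothetical illustration of the offload pattern, not the talk's actual implementation; the class name, block IDs, and LRU policy here are assumptions.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class KVCacheOffloader:
    """Toy LRU offloader: hot blocks stay in memory (stand-in for GPU HBM);
    cold blocks spill to a shared-storage path (stand-in for an NFS mount).
    Purely illustrative -- not Pure Storage's implementation."""

    def __init__(self, capacity_blocks, spill_dir):
        self.capacity = capacity_blocks
        self.spill_dir = spill_dir
        self.hot = OrderedDict()  # block_id -> cached bytes, in LRU order

    def _spill_path(self, block_id):
        return os.path.join(self.spill_dir, f"{block_id}.kv")

    def put(self, block_id, data):
        # Insert (or refresh) a block, then evict least-recently-used
        # blocks to shared storage until we fit the memory budget.
        self.hot[block_id] = data
        self.hot.move_to_end(block_id)
        while len(self.hot) > self.capacity:
            victim, vdata = self.hot.popitem(last=False)
            with open(self._spill_path(victim), "wb") as f:
                pickle.dump(vdata, f)

    def get(self, block_id):
        # Hot hit: just refresh recency.
        if block_id in self.hot:
            self.hot.move_to_end(block_id)
            return self.hot[block_id]
        # Cold miss: reload the spilled block and promote it back.
        with open(self._spill_path(block_id), "rb") as f:
            data = pickle.load(f)
        os.remove(self._spill_path(block_id))
        self.put(block_id, data)
        return data


# Demo: a 2-block budget forces the oldest sequence's cache to spill.
spill = tempfile.mkdtemp()
cache = KVCacheOffloader(capacity_blocks=2, spill_dir=spill)
cache.put("seq1", b"kv-block-1")
cache.put("seq2", b"kv-block-2")
cache.put("seq3", b"kv-block-3")          # evicts seq1 to shared storage
spilled = os.path.exists(os.path.join(spill, "seq1.kv"))
restored = cache.get("seq1")              # transparently fetched back
```

In a real deployment the spill directory would sit on a fast shared filesystem reachable by every inference node, so any replica can restore another's evicted cache instead of recomputing it.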
TRACK: AI and ML
6th Nov 2025 | HALL C | Time: 03:30-04:00
About Speaker
Jayapaul Paul is the Lead Architect at Pure Storage’s India headquarters in Bangalore, where he focuses on advanced technical areas including GPU Direct Storage, NFS-over-RDMA, KV Cache, Multitenancy, and Observability. With a strong foundation in systems engineering and performance optimization, he plays a key role in driving innovation and architectural excellence within Pure Storage’s engineering organization.
Prior to joining Pure Storage, Jayapaul held senior software engineering roles at NXP Semiconductors, EMC, and Nutanix, where he contributed to large-scale infrastructure and storage solutions. He holds a Master’s degree in Computer Science from BITS Pilani and brings over a decade of hands-on experience in developing next-generation enterprise technologies.