Improve Price Performance for LLM Serving with vLLM on TPU & GKE | Kisaco Research

Dive into a hands-on workshop designed exclusively for AI developers. Learn to leverage the power of Google Cloud TPUs, the custom accelerators behind Google Gemini, for highly efficient LLM inference using vLLM. In this workshop, you will build and deploy Gemma 3 27B on Trillium TPUs with vLLM and Google Kubernetes Engine (GKE). Explore advanced tooling like Dynamic Workload Scheduler (DWS) for TPU provisioning, Google Cloud Storage (GCS) for model checkpoints, and essential observability and monitoring solutions.

Location: Room 207

Duration: 1 hour

Sponsor(s): 
Google
Speaker(s): 

Author:

Niranjan Hira

Senior Product Manager
Google Cloud

As a Product Manager on our AI Infrastructure team, Hira explores how Google Cloud offerings can help customers and partners build more helpful AI experiences for users. With over 30 years of experience building applications and products across multiple industries, he likes to hog the whiteboard and tell developer tales.

Author:

Don McCasland

Developer Advocate Lead
Google Cloud

Don leads the Cloud Developer Relations team for AI Infrastructure at Google Cloud. A 20-year veteran in Developer Operations, he is focused on empowering the global developer community to build and scale the next generation of AI applications on Google's cutting-edge platforms.

Session Type: 
Workshop