← All posts

gpu

2 posts tagged “gpu”

Self-Hosting LLMs on Kubernetes: A Practical Guide

How to deploy, serve, and autoscale open-source large language models on Kubernetes with vLLM — from GPU node pools and deployment manifests to KEDA-based autoscaling and production guardrails.

GPU Cost Optimization on Kubernetes: A Practical Guide

Learn how to reduce GPU infrastructure costs by up to 60% with proper Kubernetes scheduling, time-slicing, and right-sizing strategies.