Building a Hybrid LLM Platform on EKS, Part 7: Observability and Cost Telemetry
Part 7 of our hands-on EKS series. We instrument the TypeScript router with OpenTelemetry, upgrade Prometheus to kube-prometheus-stack for GPU and vLLM metrics, add Grafana Tempo for distributed traces, and wire in Langfuse so that every request records its backend, token count, and dollar cost.
Observability for LLM Applications on Kubernetes: Tokens, Traces, and Cost per Request
How to instrument self-hosted and hybrid LLM workloads with OpenTelemetry, Prometheus, and Langfuse to track time-to-first-token, tokens per second, GPU utilization, and unit economics down to the individual request.