Observability for LLM Applications on Kubernetes: Tokens, Traces, and Cost per Request
How to instrument self-hosted and hybrid LLM workloads with OpenTelemetry, Prometheus, and Langfuse — tracking time-to-first-token, tokens per second, GPU utilization, and unit economics down to the individual request.