autoscaling

5 posts tagged “autoscaling”

July 19, 2026

Running a Fleet of Firecracker microVMs for eve.dev Agents, Part 2: The Host Fleet

Part 2 of the hands-on series. We put the first machines into the network from Part 1 — an Auto Scaling Group of bare-metal EC2 hosts, a launch template whose user data installs Firecracker and a host-agent daemon, the IAM role each host runs under, and the capacity model that decides how many agents a host can hold.

aws aws-cdk firecracker eve agents typescript ec2 autoscaling ai-infrastructure

July 19, 2026

Running a Fleet of Firecracker microVMs for eve.dev Agents, Part 7: Fleet Operations

The final part of the series. We make the fleet operable — event-driven host autoscaling with lifecycle-hook draining, a reconciliation loop that reschedules agents off a dead host, per-agent logs and metrics rolled up across the fleet, and the FinOps view that turns packing density into cost per agent.

aws aws-cdk firecracker eve agents typescript autoscaling observability finops ai-infrastructure

June 7, 2026

Building a Hybrid LLM Platform on EKS, Part 5: Serving Local Models with vLLM and KEDA

Part 5 of our hands-on EKS series. We deploy vLLM model servers on the GPU pool from Part 4, load Qwen2.5-7B model weights from Amazon S3 via an init container, and wire KEDA autoscaling that scales replicas with live queue depth and drives GPU nodes to zero overnight.

eks kubernetes aws-cdk vllm keda gpu autoscaling llm ai-infrastructure typescript

June 6, 2026

Building a Hybrid LLM Platform on EKS, Part 4: Platform Add-ons, the Load Balancer Controller, and Karpenter

Part 4 of our hands-on EKS series. We install the two add-ons every production EKS cluster needs: the AWS Load Balancer Controller so Kubernetes Ingress objects provision real ALBs, and Karpenter for cost-aware autoscaling — including the GPU NodePool that scales to zero between inference workloads.

eks kubernetes aws-cdk karpenter load-balancer-controller autoscaling irsa ai-infrastructure typescript

February 7, 2025

Using AI to Monitor Kubernetes Clusters and Make Dynamic Scaling Decisions

How to move beyond static thresholds and use AI-driven observability to detect anomalies, predict traffic patterns, and automate scaling decisions across your Kubernetes infrastructure.

kubernetes ai monitoring autoscaling observability