/ Written by practitioners

Deep dives into the tools you run in production.

Tutorials, failure post-mortems, and toolchain breakdowns — each article references specific versions, real configs, and the failure modes worth knowing about.

— Featured article

Kubernetes 1.30: What broke in our staging cluster and how we traced it.

Ravi Nair · Senior SRE · 9 min read

A walk through a real node-pressure eviction cascade triggered by the new sidecar container graduation. Includes the kubectl debug commands and the admission webhook patch that resolved it.

Tags: Kubernetes 1.30 · containerd · admission webhooks · node pressure

Recent articles

Current toolchain. Specific configs. No filler.

Terraform · AWS provider
Pinning AWS provider versions without breaking modules.

Version constraint syntax that actually holds across a monorepo. Includes a .terraform.lock.hcl workflow and the one edge case where required_providers silently loses.

Prometheus · Grafana
Alert fatigue: cutting noise with recording rules.

How to pre-compute cardinality-heavy queries so your on-call rotation stops seeing false positives at 2 a.m. Runnable PromQL and a recording rule YAML you can drop in.

AWS · Cost optimization
EC2 Savings Plans vs Reserved Instances in 2024.

The pricing model changed again. Here is what the new compute commitment terms mean for mixed-workload accounts and when Spot still beats both options outright.

Priya Anand · 7 min read

Kiran Desai · 9 min read

Ravi Nair · 6 min read

+ Stay current

New articles when the toolchain changes.

No weekly digest filler. We send a note when a significant release, provider update, or production pattern is worth documenting — roughly twice a month.