Projects

Selected infrastructure and platform engineering work.


### AWS Cloud Migration

**Problem:** Legacy on-premises applications had high operational overhead, limited scalability, and no disaster recovery capability.

**Solution:** Led the end-to-end migration to AWS. Designed the target architecture, provisioned infrastructure from scratch using Terraform and CloudFormation, and implemented disaster recovery across multiple availability zones.

**Outcome:** Eliminated on-premises hardware costs, achieved a 99.9% uptime SLA, and reduced mean time to recovery from hours to minutes.

AWS, Terraform, CloudFormation, Disaster Recovery
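To put the 99.9% uptime figure from the migration project in concrete terms, the short sketch below converts an availability percentage into a downtime budget. The function name and the 30-day and 365-day windows are illustrative assumptions, not details from the project itself.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float) -> float:
    """Return the downtime budget, in minutes, for a given availability target.

    availability_pct: target availability as a percentage, e.g. 99.9
    period_days: length of the measurement window in days (assumed here)
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Illustrative windows (assumptions): a 30-day month and a 365-day year.
monthly_budget = allowed_downtime_minutes(99.9, 30)
yearly_budget_hours = allowed_downtime_minutes(99.9, 365) / 60

print(f"30-day budget: {monthly_budget:.1f} minutes")
print(f"365-day budget: {yearly_budget_hours:.2f} hours")
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes of downtime per 30-day window, which is why recovery measured in minutes rather than hours matters for meeting it.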

### Terraform Module Library

**Problem:** Infrastructure was provisioned inconsistently across teams, leading to security drift, cost overruns, and unreviewed configurations.

**Solution:** Authored a library of reusable Terraform modules enforcing organizational security standards, tagging policies, and cost guardrails. Modules cover VPCs, EKS clusters, RDS instances, S3 buckets, and IAM roles.

**Outcome:** Reduced infrastructure provisioning time by 60% and eliminated security review back-and-forth for compliant resource types.

Terraform, AWS, IaC, Security
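The tagging policy mentioned in the module library can be pictured as a simple required-keys check. The sketch below is plain Python rather than Terraform, and the tag names (`owner`, `cost-center`, `environment`) are hypothetical; the actual policy enforced by the modules is not described in this page.

```python
# Hypothetical required tags; the real organizational policy is not specified.
REQUIRED_TAGS = frozenset({"owner", "cost-center", "environment"})


def missing_tags(resource_tags: dict[str, str],
                 required: frozenset[str] = REQUIRED_TAGS) -> list[str]:
    """Return the required tag keys that are absent or blank, sorted by name."""
    # A tag counts as present only if its value is non-empty after trimming.
    present = {key for key, value in resource_tags.items() if value.strip()}
    return sorted(required - present)


# Example resource with one missing tag and one blank tag.
example = {"owner": "platform-team", "environment": " "}
print(missing_tags(example))
```

A check of this shape could run in CI against a parsed plan, though in Terraform itself the same idea is more commonly expressed with variable validation or provider-level default tags.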

### Backstage Internal Developer Platform

**Problem:** Developers waited days for platform team involvement to create new services and provision underlying infrastructure.

**Solution:** Architected and deployed an internal developer platform using Backstage, including software templates, a service catalog, and integrated Terraform-backed infrastructure provisioning.

**Outcome:** Engineering teams self-provision new services in under 10 minutes. Platform team toil reduced by ~40%.

Backstage, Terraform, AWS, IDP

### Full-Stack Observability: Datadog

**Problem:** Engineering teams had limited visibility into production system health, leading to slow incident response and reactive firefighting.

**Solution:** Designed and rolled out Datadog across the organization covering APM, RUM, distributed tracing, log management, and synthetic testing. Built dashboards and SLO monitors for all critical services.

**Outcome:** Mean time to detect (MTTD) reduced by 70%. On-call engineers now have full context within seconds of an alert firing.

Datadog, APM, Prometheus, Grafana

### GitHub Actions Organization Framework

**Problem:** Each repository had bespoke, unmaintained CI/CD pipelines with inconsistent security scanning, testing, and deployment patterns.

**Solution:** Designed a library of reusable GitHub Actions workflows covering build, test, security scan, and deploy stages. Migrated all repositories to the shared framework with repo-specific overrides.

**Outcome:** Org-wide pipeline maintenance consolidated to a single repo. New repositories get compliant CI/CD in under 5 minutes.

GitHub Actions, CI/CD, GitLab CI, Bitbucket
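The "shared framework with repo-specific overrides" pattern can be illustrated as a recursive merge of a repository's settings over organization-wide defaults. This is a conceptual sketch in plain Python, not GitHub Actions' own mechanism (reusable workflows receive per-repository values as inputs), and every key and value below is a hypothetical placeholder.

```python
from copy import deepcopy
from typing import Any


def deep_merge(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with overrides applied on top of defaults.

    Nested dicts are merged key by key; any other value is replaced outright.
    Neither input is modified.
    """
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical organization-wide defaults for the four pipeline stages.
SHARED_DEFAULTS = {
    "build": {"runner": "ubuntu-latest"},
    "test": {"coverage_threshold": 80},
    "security_scan": {"enabled": True, "fail_on": "high"},
    "deploy": {"environment": "staging"},
}

# Hypothetical overrides supplied by a single repository.
repo_overrides = {
    "security_scan": {"fail_on": "critical"},
    "deploy": {"environment": "production"},
}

effective = deep_merge(SHARED_DEFAULTS, repo_overrides)
print(effective["security_scan"])
print(effective["deploy"])
```

Merging instead of replacing whole sections means a repository that overrides one setting still inherits the rest of the shared defaults, such as the scan remaining enabled.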

### EKS & ECS Cluster Operations

**Problem:** Container workloads ran on ad-hoc clusters with no standardized networking, autoscaling, or lifecycle management.

**Solution:** Standardized cluster provisioning for EKS (Kubernetes) and ECS using Terraform modules. Implemented Cluster Autoscaler, Karpenter node provisioning, and a service mesh for traffic management.

**Outcome:** Infrastructure costs reduced by 30% through right-sizing. Zero unplanned cluster downtime over 12 months.

EKS, ECS, Kubernetes, Karpenter

### AI Tooling: Claude Code Skills Registry & Agents

**Problem:** AI-assisted development workflows were inconsistent and undocumented, leading to duplicated effort and poor-quality outputs.

**Solution:** Built a centralized Claude Code skills registry with reusable hooks and prompt patterns. Developed autonomous agents integrated with AWS Bedrock Knowledge Bases and AgentCore services for infrastructure automation tasks.

**Outcome:** Engineering teams adopt AI tooling via a governed, versioned registry. Infrastructure automation tasks that previously took 2 hours now complete in under 5 minutes.

Claude Code, AWS Bedrock, AgentCore, AI Agents