IT Infrastructure and DevOps Lead
Special needs require special people.
About us
We run a hybrid stack across cloud and on-prem: Kubernetes, containers, and a handful of performance-critical services on bare metal. We’re looking for a hands-on DevOps/Platform Lead to own the developer platform and deployment experience—building scalable, reliable, and automated infrastructure that accelerates product teams.
What you’ll do
- Lead & grow the team: Coach 3–5 DevOps/Platform engineers; set standards, run 1:1s, drive roadmaps and delivery.
- Own the platform: Design, operate, and evolve Kubernetes-based environments (multi-cluster, multi-region) across cloud + on-prem.
- IaC & GitOps: Standardize with Terraform/Helm and GitOps (Argo CD/Flux). Create reusable modules, blueprints, and golden paths.
- CI/CD at scale: Build fast, reliable pipelines (build/test/deploy), artifact registries, environment promotions, and preview environments.
- Observability: Ensure first-class monitoring, logging, and tracing (Prometheus/Grafana/ELK/OTel); tighten feedback loops for engineers.
- Networking for hybrid: Own service connectivity—ingress, LBs, CNI, east-west traffic, API gateways, and secure cloud↔on-prem peering.
- Stateful & storage: Operate CSI-backed storage, object/block integrations, and tune performance for stateful workloads where needed.
- Performance & scalability: Drive capacity planning, autoscaling strategies (HPA/VPA/KEDA), and rollout strategies (blue/green, canary).
- Developer experience: Ship internal self-service (IDP) portals, templates, and CLIs so teams can provision infra safely and quickly.
- Tooling & modernization: Evaluate and introduce tools that improve reliability, speed, or cost—measure impact and adopt pragmatically.
Must-have experience
- 7–10 years in DevOps/Platform/SRE roles.
- Deep hands-on experience with Kubernetes (cluster lifecycle, upgrades, multi-cluster patterns) and containers.
- Strong IaC (Terraform) and Helm; production GitOps workflows (Argo CD/Flux).
- Cloud (AWS/GCP/Azure) plus real exposure to on-prem/bare metal or virtualization (KVM/Proxmox/VMware).
- Solid networking fundamentals (VPCs/VNETs, VPNs/peering, DNS, L4/L7 load balancing, ingress).
- CI/CD design and operation (GitLab CI/GitHub Actions/Jenkins or similar); caching, parallelization, test orchestration.
- Observability stacks (Prometheus/Grafana, ELK/EFK, OpenTelemetry) and performance troubleshooting.
- Proficient in automation/scripting (Python or Go preferred; Bash a given). Git-centric workflows.
Nice to have
- Experience with Cursor or similar AI coding tools.
- 2+ years leading a small engineering team.
- Multi-cluster management (Cluster API, Rancher, GKE Autopilot/EKS blueprints).
- Service mesh experience (Istio/Linkerd) and traffic management for canary/blue-green.
- Cost awareness for infra (right-sizing, autoscaling, spot/RI/Savings Plans).
- Supply-chain hardening knowledge (SBOMs, provenance) from a platform perspective.
- Experience building an Internal Developer Platform (Backstage/Port or homegrown).
Soft skills
- Pragmatic leadership: balances vision with hands-on delivery.
- Excellent communicator with product/dev teams.
- Bias for automation, simplification, and measurable outcomes.
How to apply
Send your CV or LinkedIn profile, plus a short note about a messy infrastructure you stabilized (what you did and the impact), to HR.