Tech Lead SRE / Principal SRE Engineer
We are looking for a Tech Lead SRE / Principal SRE Engineer to join a team working directly on a proprietary, business-critical product in a fast-paced and dynamic environment. This is a hands-on role where you will have a real impact on system reliability, scalability, and technical direction.
The role
As a Tech Lead SRE / Principal SRE Engineer, you will design, implement, and operate highly available and scalable systems built primarily on Kubernetes (AWS EKS). You will play a key role in setting technical standards, guiding engineers, and ensuring operational excellence across production environments.
You will work extensively with Terraform, ArgoCD, and GitHub Actions, applying GitOps principles and modern deployment strategies such as blue-green deployments, canary releases, and feature flags. The role requires strong troubleshooting skills, a deep understanding of distributed systems, and active participation in production support when needed.
Main responsibilities
Design, operate, and troubleshoot Kubernetes clusters (AWS EKS) with a focus on networking, scalability, security, and reliability
Architect and maintain highly available, fault-tolerant infrastructure on AWS using Infrastructure as Code (Terraform)
Automate provisioning, deployment, and configuration processes following GitOps practices with ArgoCD and GitHub Actions
Define and enforce guardrails for infrastructure, applications, and databases to ensure secure and consistent operations
Implement and maintain monitoring and observability solutions using Prometheus, Grafana, and related tools
Build and evolve CI/CD pipelines and progressive delivery strategies
Collaborate closely with development teams to embed reliability and security best practices throughout the application lifecycle
Participate in incident response, post-incident reviews, and continuous improvement initiatives, including resilience testing and chaos engineering
Design and manage secure networking solutions, including AWS VPCs, Kubernetes networking, and firewalls
What we are looking for
Required qualifications
9+ years of commercial experience in SRE, systems engineering, infrastructure, or related roles
At least 2 years of experience in a Tech Lead or similar leadership position
University degree in Computer Science or a related field
Strong hands-on experience with Kubernetes (AWS EKS or similar), including networking, scaling, and security
Advanced knowledge of AWS services such as EKS, EC2, CloudWatch, Route 53, Aurora, and S3
Proven experience with Terraform, ArgoCD, and GitHub Actions
Solid background in monitoring, observability, and incident management (Prometheus, Grafana)
Strong scripting and automation skills in Python, Go, or Bash
Availability to work standard hours 09:00–17:00 CET
Willingness to actively participate in production support activities when required
Nice to have
Experience with other cloud platforms such as GCP or Azure
Familiarity with logging and observability stacks like ELK, Loki, or Graylog
Experience with chaos engineering and resilience testing
Knowledge of secrets management tools such as HashiCorp Vault or SOPS
Experience working with databases, including setup, scaling, and optimisation
Strong communication, mentoring, and coaching skills