Senior Site Reliability Engineer
Our client is a product-driven technology company that places engineering excellence at the core of everything they build. Quality, long-term thinking, and technical integrity are fundamental to their approach.
The organization operates with a strong ownership culture — engineers are trusted with real responsibility and meaningful influence over technical direction. Hiring standards are intentionally high, with a clear focus on experienced professionals who value accountability, autonomy, and delivering work at the highest level.
We are looking for an experienced Senior Site Reliability Engineer (SRE) to strengthen their infrastructure and platform engineering capabilities.
This is a key role for someone who thrives in cloud-native environments and has hands-on experience operating and scaling production systems. In this position, you will contribute to building a strong engineering culture centered around automation, observability, and continuous optimization. You will also play an important role in advancing platform engineering initiatives, enabling development teams to ship reliable, production-ready software efficiently and safely.
Key Responsibilities
Security & Reliability Engineering
Embed security principles throughout the infrastructure and service lifecycle — from architectural design to deployment and incident handling. Contribute to security assessments, threat modeling sessions, and ongoing improvements to operational security standards.
Infrastructure Design & Automation
Architect, implement, and maintain highly automated infrastructure and delivery pipelines using Infrastructure as Code practices. Drive initiatives that improve system resilience, including self-healing mechanisms and preventative reliability strategies.
Incident Leadership
Take an active role in managing production incidents, coordinating response efforts, and facilitating blameless post-incident reviews. Translate learnings into actionable improvements to reduce recurrence and improve system robustness.
On-Call Participation
Be part of a structured on-call rotation, responding to and resolving production-related issues in a timely and professional manner.
Observability & Performance Optimization
Design and enhance monitoring, logging, and tracing frameworks to ensure full visibility into system health. Continuously refine alerting strategies and response procedures to proactively detect and mitigate performance or reliability risks.
Cross-Team Collaboration
Work closely with engineering, product, and security stakeholders to align on reliability standards, platform improvements, and operational best practices.
Cloud Efficiency & Cost Awareness
Analyze cloud resource consumption and introduce improvements that enhance efficiency while maintaining performance and reliability standards.
Operational Support
Provide technical support for internal stakeholders and contribute to higher-level support for external users when needed, ensuring service continuity and quality.
Access & Permissions Governance
Oversee service-level access management, ensuring appropriate permissions, policy enforcement, and secure user access controls.
Required Background & Technical Expertise
Professional experience: Minimum 5 years of hands-on experience in Site Reliability Engineering, Platform Engineering, or a closely related infrastructure-focused role.
Kubernetes expertise: Strong practical experience designing, operating, and scaling Kubernetes environments in production. Familiarity with managed Kubernetes offerings such as EKS, AKS, or GKE is highly valued.
Infrastructure as Code: Solid experience implementing infrastructure using IaC principles. Terraform is the preferred toolset; experience with alternatives like CloudFormation is also relevant.
Cloud environments: Proven track record working extensively with at least one leading cloud platform. AWS is preferred, though Azure or Google Cloud Platform experience is also considered.
Monitoring & observability: Practical experience building and maintaining monitoring and logging ecosystems, including tools such as Prometheus, Grafana, Elasticsearch-based stacks, or comparable observability platforms.
Automation & development skills: Ability to write and maintain automation scripts using languages like Bash or Python. Experience with Go is considered a strong advantage.
CI/CD practices: Experience designing, implementing, and maintaining modern CI/CD workflows (e.g., GitHub Actions, GitLab CI, or similar tooling).
NoSQL technologies: Experience deploying, configuring, and operating NoSQL database systems in production environments.
What We Offer
Attractive financial conditions, aligned with your skills.
25 days of paid time off.
Paid maternity/paternity leave (3 months).
Coverage of business trips, training, and conference costs.
Payment during short-term illness (a few days).