Senior Site Reliability Engineer

DevOps

Towarowa, Warszawa

DCG

Full-time

B2B

Senior

Remote

7 091 - 8 045 USD net per month (B2B)

Job description

As a recruitment company, DCG understands that every business is powered by experienced professionals. Our management style and partnership approach enable us to meet your needs and provide continuous support. Due to our ongoing growth and the large number of recruitment projects we undertake for our partners, we are currently looking for:

Senior Site Reliability Engineer

Responsibilities:

Design, implement, and scale resilient infrastructure across multiple AWS accounts
Manage Kubernetes workloads with Helm, ArgoCD, and Terraform to ensure smooth, auditable deployments
Collaborate with product and platform teams to drive SRE best practices (SLIs, SLOs, error budgets)
Improve observability with Dynatrace and open-source monitoring tools
Optimize Cloudflare WAF, caching, and routing rules to ensure secure, low-latency user experiences
Automate infrastructure, deployments, and routine tasks using GitHub Actions and scripting (Python/Bash)
Lead incident response and postmortems, turning lessons learned into measurable improvements
Work cross-functionally in English with international teams across Europe and the US

Requirements:

5+ years of DevOps/SRE experience managing production workloads in AWS
Strong with Terraform, Helm, ArgoCD, and GitHub Actions
Deep understanding of Kubernetes (EKS preferred), including autoscaling, rollout strategies, and cluster troubleshooting
Experience with cost optimization and capacity planning
Ability to build and maintain observability pipelines (logs, metrics, traces, SLOs, error budgets)
Proven ability to design fault-tolerant systems with high availability and performance
Solid understanding of CI/CD pipelines and GitOps principles
Comfortable optimizing Cloudflare rulesets and understanding DNS, WAF, and CDN flows
Hands-on experience with monitoring and alerting tools (Dynatrace, Prometheus, Grafana, etc.)
Clear English communication and ability to collaborate with distributed, multicultural teams
Strong incident response experience: on-call participation, postmortems, and RCA writing
Curious, pragmatic, and driven by reliability and continuous improvement

Nice to have:

A concrete example you can share of defining or improving SLOs/SLIs in a way that reduced alert fatigue or downtime
Contributions to automation or observability improvements through open-source or internal tooling
Ability to automate toil and reduce operational overhead
Experience leading reliability reviews or driving postmortem culture across teams
Passion for metrics, resilience engineering, and teaching SRE concepts to others

Offer:

Private medical care
Training & learning opportunities

Tech stack

English
Polish
AWS: advanced
CI/CD: advanced
GitHub: advanced
Terraform: advanced
Kubernetes: advanced
DNS: junior
WAF: junior
CDN: junior
Cloudflare: nice to have

About the company

DCG

DCG is a space where business needs and people's ambitions meet. We know the value of a well-matched partnership, which is why we help candidates find an environment where they can spread their wings, and companies...
