Senior Site Reliability Engineer

DevOps

Towarowa 28, Warszawa

DCG

Full-time
B2B
Senior
Remote

Job description

As a recruitment company, DCG understands that every business is powered by experienced professionals. Our management style and partnership approach enable us to meet your needs and provide continuous support. Due to our ongoing growth and the large number of recruitment projects we run for our partners, we are currently looking for a Senior Site Reliability Engineer.

Responsibilities:

  • Building and maintaining a central operational "control tower" for AI applications and pipelines

  • Designing and implementing monitoring, alerts, and dashboards (signals, thresholds, routing, runbooks)

  • Incident response: triage, coordination, root cause analysis, post-mortems, and preventive measures

  • Standardization of pipeline telemetry (success/failure, latency, throughput, bottlenecks)

  • CI/CD optimization – release quality, automated testing, reliability gates

  • Collaboration with engineering teams to reduce the number of recurring incidents

 

Requirements:

  • Proactive and self-driven – identifies problems, risks, and opportunities for improvement on their own; doesn't wait for detailed instructions

  • Engaged owner mindset – treats system stability as their end-to-end responsibility

  • Hands-on engineer – regularly works with clusters, pipelines, monitoring, and code

  • AI-native – uses AI tools extensively on a daily basis (Copilot, LLMs, automation, analytics, debugging, documentation) and understands how AI impacts system design and maintenance

  • Comfortable working in a dynamic environment with processes that are not yet fully mature

  • Experience with Azure DevOps (Boards, Repos, Pipelines)

  • Strong knowledge of Kubernetes, including troubleshooting, scaling, and production operations

  • Proficiency in Datadog (metrics, logs, dashboards, alerting)

  • Experience with Azure Portal for environment operations and configuration

  • Strong knowledge of CI/CD practices, including pipeline optimization, testing, and quality gates

  • 5+ years of experience as an SRE / Production / Platform Engineer

  • Proven experience in production environments

  • Strong knowledge of incident management and root cause analysis (RCA)

  • Ability to build practical, rather than theoretical, monitoring systems

  • Very good command of English, both spoken and written

Nice to have:

  • Experience with Grafana

  • Experience with AI/LLM pipelines and their observability

  • Building multi-app monitoring platforms

  • Working in scaled Kubernetes environments (AKS or similar)

 

Offer:

  • Private medical care

  • Co-financing of a sports card

  • Training & learning opportunities

  • Ongoing support from a dedicated consultant

  • Employee referral program

Tech stack

  • English: B2

  • Datadog: advanced

  • Kubernetes: advanced

  • Grafana: advanced

  • CI/CD: advanced

  • AI: advanced

  • Copilot: regular

About the company

DCG

DCG is a place where business needs and people's ambitions meet. We know the value of a well-matched collaboration, which is why we help candidates find an environment where they can spread their wings, and companies ...

By applying, I consent to the processing of my personal data for the purpose of conducting the recruitment process. Please note that the data controller is DCG Sp. z o.o., ul. Towarowa 28, 00-839 Warszawa (hereinafter the "controller"). You have the right ...