Site Reliability Engineer
At Pretius, we are looking for a Site Reliability Engineer (SRE) to join a project in the financial industry, working in an international team.
Project / Role
Own reliability of AI applications and pipelines end-to-end
Build and run a central “control tower” (monitoring, alerting, KPIs)
Design actionable telemetry (latency, throughput, failures, bottlenecks)
Lead incident response: triage, coordination, RCA, post-mortems
Reduce recurring incidents through automation and engineering fixes
Improve CI/CD quality: testing, release stability, reliability gates
Partner with engineering teams to make systems measurably more stable
Stack: Azure DevOps, Kubernetes, Datadog, Azure, CI/CD, Grafana
Requirements
5+ years as an SRE / Production / Platform Engineer
Strong hands-on production experience
Proven experience in incident management and RCA
Ability to build practical monitoring that holds up in production
Proactive, ownership mindset
Comfortable in dynamic environments
Hands-on engineer working directly with clusters, pipelines, and code
AI-native approach: actively using AI tools (LLMs, Copilot, automation) in daily work
What do we offer?
We focus on long-term relationships based on fair principles and reliability
Co-financing of the Multisport card and Medicover private healthcare
Modern office available
Team bonding activities, internal courses, conferences, certifications