Job Overview
The Staff Engineer - SRE will be the technical leader of the global Site Reliability Engineering (SRE) team, driving the vision, strategy, and execution plan for the function. This role is critical in defining and implementing best practices for system reliability, scalability, and performance across the technical organization.
As a key member of the engineering leadership team, the Staff Engineer will work closely with Infrastructure, Engineering, and Product teams to develop highly resilient, observable, and automated solutions that enhance system availability and efficiency. The ideal candidate will bring deep technical expertise, strong problem-solving skills, and a passion for reliability engineering.
Job Description and Requirements
Job Responsibilities
- Participate in defining and leading the SRE vision and strategy, ensuring alignment with business objectives and engineering priorities.
- Architect, implement, and advocate for best-in-class reliability, observability, and scalability practices across the platform.
- Lead and execute resilience strategy across engineering teams by defining best practices for building fault-tolerant, highly available systems, and ensuring robust validation through multiple forms of resilience testing.
- Build and scale automated failure detection and remediation mechanisms, including AI/ML-powered solutions.
- Participate in improving Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets to enhance system reliability.
- Support root cause analysis (RCA) investigations, drive corrective actions, and advocate for a blameless postmortem culture.
- Influence and mentor engineering teams on SRE principles, DevOps culture, and best practices.
- Stay ahead of industry trends, adopting new tools, frameworks, and methodologies to continually improve system reliability.
Preferred Qualifications
- 8+ years of experience in software engineering or site reliability engineering.
- 4+ years of experience in designing and operating distributed systems at scale, with in-depth knowledge of failure modes, cascading failures, and mitigation patterns.
- Proven expertise in designing and operating large-scale distributed systems in Azure or another public cloud.
- Strong programming skills in languages such as Python, Go, Java, or .NET.
- In-depth technical understanding and experience with at least two of the following DevOps platforms: GitHub, Azure DevOps, GitLab, or Jenkins.
- Hands-on experience with observability tools (e.g., Prometheus, Grafana, OpenTelemetry, Datadog, or New Relic).
- Hands-on experience with AIOps tools (e.g., BigPanda, Keep, Dynatrace) would be an asset.
- Hands-on experience with modern chaos testing tools.
- Experience working in global, high-availability SaaS environments.
- Proficient in conducting and communicating evaluation and selection processes.
- Experience implementing redundancy and disaster recovery scenarios.
- Excellent teamwork and cross-group collaboration skills.
- Ability to collaborate with both technical and business professionals.
- Hands-on experience with Agile project development methodologies.
- Experience delivering complex technical solutions.
- Excellent problem-solving, analytical, and communication skills.
- Previous experience in leading or mentoring engineers in a reliability-focused capacity.
Competencies and Skills
- Technical Leadership – Ability to set technical direction and drive cross-functional collaboration.
- Systems Thinking – Strong grasp of distributed systems, networking, and cloud architectures.
- Automation-First Mindset – Commitment to reducing toil through scripting and automation.
- Reliability Engineering – Expertise in SLOs, SLIs, error budgets, and high-availability architectures.
- Incident Management & Postmortems – Experience in handling production incidents and driving continuous improvement.
- Observability & Monitoring – Deep understanding of logging, monitoring, and alerting best practices.
- Practical knowledge of data structures and modern data engines.
- Collaboration & Communication – Ability to work across teams, influence stakeholders, and advocate for reliability improvements.
- Mentorship & Coaching – Passion for mentoring engineers and building an SRE culture within the organization.
Relativity is committed to competitive, fair, and equitable compensation practices.
This position is eligible for total compensation which includes a competitive base salary, an annual performance bonus, and long-term incentives.
The expected salary range for this role is between 300 000 and 450 000 PLN.
The final offered salary will be based on several factors, including but not limited to the candidate's depth of experience, skill set, qualifications, and internal pay equity. Hiring at the top end of the range would not be typical, to allow for future meaningful salary growth in this position.