Site Reliability Engineer (SRE)

10 938 - 15 039 USD gross per month - Permanent
DevOps

Sienna 75, Warszawa

Yard Corporate

Full-time
Permanent
Senior
Hybrid

Job description

About the Client

Our client is a premier, global investment management firm operating at the intersection of finance and technology. Known for their sophisticated, data-intensive systems, they build and maintain high-performance platforms that process massive volumes of market and operational data.

To support their expanding footprint, they are looking for a senior-level Site Reliability Engineer (SRE) who will take ownership of shaping, standardizing, and scaling their SRE frameworks and reliability culture from the ground up.

The Role

In this role, you will serve as a foundational force for SRE practices, partnering directly with Cloud, Infrastructure, and Software Engineering squads. You will work across a hybrid infrastructure (combining advanced AWS cloud environments and physical on-premises servers) to guarantee the scalability, resilience, and maximum uptime of critical, high-frequency transactional platforms.

Core Responsibilities

  • SRE Evangelism: Design, implement, and champion core reliability principles, helping technology teams adopt sustainable scaling practices.

  • Observability Architecture: Implement, scale, and maintain end-to-end monitoring, telemetry, and distributed tracing systems built on Prometheus, Grafana, Loki, and Tempo, with OpenTelemetry (OTel) for instrumentation.

  • Kubernetes Optimization: Establish best-practice configurations for containerized workloads, ensuring applications running on Kubernetes are highly resilient, cost-effective, and performant.

  • Incident Management & Culture: Participate in a balanced, shared on-call rotation (averaging one week per month).

  • Automation & Engineering: Build custom tooling and CI/CD pipelines to automate routine tasks, system health checks, and rapid disaster recovery workflows.

  • SLO/SLA Definition: Partner with product and engineering teams to define, monitor, and enforce Service Level Objectives (SLOs) and Error Budgets.
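
As a rough illustration of the error-budget arithmetic behind the last point, here is a minimal sketch. The function names, the 30-day window, and the sample target are hypothetical and not taken from the client's actual practice.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return 1.0 - downtime_minutes / budget


if __name__ == "__main__":
    # Hypothetical example: a 99.9% availability target over 30 days.
    print(round(error_budget_minutes(0.999), 1))
    print(round(budget_remaining(0.999, downtime_minutes=21.6), 2))
```

Under these assumptions, a 99.9% target over 30 days leaves about 43.2 minutes of allowable downtime, and 21.6 minutes of recorded downtime would consume half of that budget.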

What We Look For

  • Experience: 5+ years of hands-on experience in a dedicated SRE, DevOps, or Infrastructure Engineering role supporting complex, distributed production systems.

  • Education: A Bachelor’s degree in Computer Science, Computer Engineering, or a related technical discipline (or equivalent practical experience).

  • Observability Expertise: Deep, subject-matter knowledge of modern monitoring stacks, specifically Grafana, Prometheus, Loki, and Tempo (OTel).

  • Orchestration & Containers: Strong, production-grade expertise in containerization (Docker) and orchestration (Kubernetes).

  • Hybrid Infrastructure: Experience working in hybrid environments, managing both cloud services (AWS preferred) and physical on-premises hardware.

  • Scripting/Coding: Proficiency in writing clean, maintainable code in at least one scripting or programming language (e.g., Python, Bash, or Go) to build reliable automation.

  • Methodologies: Solid grounding in CI/CD concepts, infrastructure-as-code (IaC), and agile development processes.

  • Soft Skills: Excellent verbal and written communication skills, with a proven ability to convey complex infrastructure and reliability concepts to both technical and non-technical stakeholders.

What We Offer

  • Stable Employment: Full-time employment contract (Umowa o Pracę - UoP).

  • Tax Optimization: Eligibility for tax-deductible costs on creative work (KUP - Koszty Uzyskania Przychodu).

  • Financial Reward: Highly competitive base salary accompanied by a generous annual performance bonus.

  • Comprehensive Health: Premium private medical care package that fully includes dental coverage (stomatologia).

  • Wellness & Lifestyle: MultiSport card to keep you active and healthy.

  • Daily Perks: Pre-funded lunch card for your daily meals.

Tech Stack at a Glance

  • Cloud & Virtualization: AWS, Kubernetes, Docker, On-Premises Hypervisors

  • Observability: Prometheus, Grafana, Loki, Tempo, OpenTelemetry (OTel)

  • Languages: Python, Go, Bash

  • CI/CD & Automation: Git-based pipelines, Configuration Management, IaC

Tech stack

  • English: C1

  • AWS: regular

  • Prometheus: regular

  • Grafana: regular

  • Docker: regular

  • Kubernetes: regular

  • Python: regular
