Site Reliability Engineer

379.64 - 488.10 USD net per day - B2B
DevOps

Cracow, Cracow

Caspian One

Full-time
B2B
Senior
Hybrid

Job description

We’re looking for a seasoned Site Reliability Engineer to support a high‑performance, mission‑critical risk and analytics platform used across global trading and finance environments. You’ll play a key role in ensuring the stability, scalability, and observability of complex distributed systems running across hybrid cloud infrastructure.

In this role, you’ll take ownership of production reliability: driving incident response, conducting root‑cause analysis, improving monitoring capabilities, and delivering automation that reduces operational toil. You’ll work closely with development teams, platform engineers, and service management leads to strengthen resilience, refine processes, and enhance the engineering culture around availability and performance.

This is a hands‑on technical position suited to someone who thrives in high‑throughput environments, communicates clearly, and enjoys solving deep engineering problems in real time.


Core Responsibilities

  • Maintain and improve the reliability, uptime, and performance of distributed applications.

  • Lead incident response, triage complex issues, coordinate recoveries, and deliver structured post‑incident reviews.

  • Enhance observability by designing and evolving monitoring, alerting, logging, and tracing frameworks.

  • Drive continuous improvement across automation, deployment processes, and service stability.

  • Collaborate with cross‑functional teams to influence architecture, design, and operational standards.

  • Support CI/CD pipelines, environment configuration, and vulnerability remediation.

  • Contribute to a knowledge‑driven culture through documentation, tooling, and best‑practice adoption.


Required Skills & Experience

  • Strong Java background with proven experience supporting or developing distributed systems.

  • Observability tooling expertise (Grafana, Prometheus, Loki, OpenTelemetry or similar).

  • Hands‑on experience with hybrid cloud environments, ideally with GCP or another major cloud provider.

  • CI/CD and automation experience (e.g., Jenkins, Ansible).

  • Solid understanding of Linux, RDBMS fundamentals, and job schedulers (e.g., Control‑M or equivalents).

  • Strong analytical mindset with a methodical approach to troubleshooting.

  • Excellent communication skills and comfort working in Agile teams.

Tech stack

  • Jenkins: advanced

  • CI/CD: advanced

  • Telemetry Stack: advanced

  • Ansible: advanced

  • GCP: advanced

  • Linux: regular

  • Java: regular

  • Spring Boot: regular
