Site Reliability Engineer

DevOps

Cracow, Cracow

Caspian One

Full-time

B2B

Senior

Hybrid

389 - 500 USD net per day - B2B

Job description

We’re looking for a seasoned Site Reliability Engineer to support a high‑performance, mission‑critical risk and analytics platform used across global trading and finance environments. You’ll play a key role in ensuring the stability, scalability, and observability of complex distributed systems running across hybrid cloud infrastructure.

In this role, you’ll take ownership of production reliability: driving incident response, conducting root‑cause analysis, improving monitoring capabilities, and delivering automation that reduces operational toil. You’ll work closely with development teams, platform engineers, and service management leads to strengthen resilience, refine processes, and enhance the engineering culture around availability and performance.

This is a hands‑on technical position suited to someone who thrives in high‑throughput environments, communicates clearly, and enjoys solving deep engineering problems in real time.

Core Responsibilities

Maintain and improve the reliability, uptime, and performance of distributed applications.
Lead incident response, triage complex issues, coordinate recoveries, and deliver structured post‑incident reviews.
Enhance observability by designing and evolving monitoring, alerting, logging, and tracing frameworks.
Drive continuous improvement across automation, deployment processes, and service stability.
Collaborate with cross‑functional teams to influence architecture, design, and operational standards.
Support CI/CD pipelines, environment configuration, and vulnerability remediation.
Contribute to a knowledge‑driven culture through documentation, tooling, and best‑practice adoption.

Required Skills & Experience

Strong Java background with proven experience supporting or developing distributed systems.
Observability tooling expertise (Grafana, Prometheus, Loki, OpenTelemetry or similar).
Hands‑on experience with hybrid cloud environments, ideally with GCP or another major cloud provider.
CI/CD and automation experience (e.g., Jenkins, Ansible).
Solid understanding of Linux, RDBMS fundamentals, and job schedulers (e.g., Control‑M or equivalents).
Strong analytical mindset with a methodical approach to troubleshooting.
Excellent communication skills and comfort working in Agile teams.

Tech stack

Jenkins: advanced
CI/CD: advanced
Telemetry Stack: advanced
Ansible: advanced
GCP: advanced
Linux: regular
Java: regular
Spring Boot: regular
