Site Reliability Engineer
📍 Kraków (Hybrid – minimum 2 days/week in the office)
💼 Employment type: B2B
Are you looking for an opportunity to join a high-impact project in a global financial institution that invests heavily in cloud, AI, and DevOps? We're building a new Site Reliability Engineering (SRE) team in Kraków to support a mission-critical Counterparty Credit Risk (CCR) platform, and we're looking for experienced engineers to join the journey.
As part of this role, you'll contribute to the stability, scalability, and observability of a high-volume, distributed platform operating on both Google Cloud Platform and on-prem infrastructure.
- Ensure the reliability and high availability of production systems used in global credit risk management.
- Monitor, detect, and troubleshoot incidents in distributed systems running in cloud and hybrid environments.
- Implement observability tools (Grafana, Prometheus, Loki, etc.) and improve monitoring and alerting strategies.
- Lead root cause analysis (RCA) and post-incident reviews to improve resilience and operational efficiency.
- Collaborate with developers, DevOps engineers, and global support teams to implement SRE best practices.
- Contribute to CI/CD automation, deployment pipelines, and security/vulnerability remediation.
- 5+ years of experience supporting or developing distributed systems (Java-based environments preferred).
- Hands-on experience with monitoring and logging tools: Grafana, Prometheus, Loki, Splunk, etc.
- Solid understanding of Unix/Linux systems, cloud infrastructure (GCP preferred), and databases (RDBMS).
- Experience with CI/CD tooling (such as Ansible, Jenkins, and GitHub Actions) and with vulnerability management.
- Familiarity with job scheduling tools (e.g., Control-M or equivalent).
- Strong communication skills and ability to drive technical discussions with multiple support teams.
- Experience working in Agile/Scrum teams.
- The chance to build and shape a new SRE team supporting a critical platform for global risk management.
- Work with a modern technology stack: Java, GCP, Apache Beam, Spring Boot, DevOps tooling.
- Hybrid working model with at least 2 days/week in our Kraków office.
- Flexible form of cooperation (B2B or Employment Contract).
- Stable, long-term project with excellent opportunities for growth and learning.
📩 Interested? Apply now and take the next step in your career with a team that’s redefining reliability at a global scale.
To learn more about Antal, please visit www.antal.pl