Production Systems Engineer – Mass Recovery

DevOps

Centrum, Krakow

ITDS

Full-time
B2B
Mid
Hybrid
4 666 - 6 066 USD net per month (B2B)

Job description

Unleash resilience and shape the future of disaster recovery — drive enterprise-wide mass outage response and infrastructure robustness!

Krakow-based opportunity with a hybrid work model.

As a Production Systems Engineer – Mass Recovery, you will work for a leading financial institution committed to safeguarding the stability of the global financial system. You will help design and implement advanced IT resilience strategies that ensure rapid, effective recovery from major incidents affecting critical services. This is your chance to be at the forefront of innovative disaster recovery solutions and make a tangible impact in a dynamic banking environment.

Your main responsibilities:

  • Develop and maintain detailed service dependency models across applications, platforms, and infrastructure layers to support disaster recovery efforts.
  • Identify, document, and analyze shared failure domains such as virtualization, storage, and network components.
  • Define scenario-based blast radius models to anticipate and mitigate mass outage impacts.
  • Support rapid failure correlation by analyzing service failures and providing actionable insights for recovery teams.
  • Validate and challenge existing resilience data sources, ensuring alignment with real system behaviors.
  • Document gaps in resilience, including RTO mismatches and missing recovery pathways, to enhance recovery strategies.
  • Collaborate with cross-functional teams and tooling platforms to extract and synthesize relevant operational data.
  • Contribute to designing fault-tolerant architectures and recovery procedures for high-availability systems.

You're ideal for this role if you have:

  • Minimum of 4 years’ experience in production engineering, site reliability engineering, or infrastructure engineering within large-scale environments.
  • Strong knowledge of virtualization platforms (ESX), cloud providers, and storage/big data systems.
  • Solid understanding of networking fundamentals and infrastructure topology.
  • Hands-on experience with CMDB platforms (such as ServiceNow) and observability tools (such as AppDynamics and Splunk).
  • Proven ability to analyze complex data sets, identify patterns, and derive practical insights.
  • Experience operating under high-pressure incident management scenarios.
  • Excellent communication skills in English; fluency is required.

It is a strong plus if you have:

  • Previous experience within banking or financial services, especially with HSBC or similar institutions.
  • Exposure to Disaster Recovery or Mass Recovery planning/execution.
  • Data manipulation and extraction skills.
  • Familiarity with Jira/Confluence and large distributed system environments.

Eligibility for the role:

  • Only candidates with an existing legal right to work in Europe will be considered for this role.

#MAKEYourCareerBETTER
Interested? Apply now and include your CV (preferably in English) along with a statement confirming your consent to the processing and storage of your personal data.

Tech stack

  • English: B2
  • Infrastructure Engineering: advanced
  • Incident management: advanced
  • Disaster Recovery Planning: advanced
  • Site Reliability Engineering: advanced
  • Virtualization Platforms (ESX): regular
  • Data analysis: regular
  • Cloud Computing: regular
  • Networking Fundamentals: regular
  • Observability Tools (AppDynamics, Splunk): regular
  • CMDB Platforms (ServiceNow): regular
