Platform/Site Reliability Engineer (SRE)

DevOps

Poznań +4 Locations

DCV Technologies

Full-time
B2B
Senior
Remote

Job description

Platform/Site Reliability Engineer (SRE)

We are looking for a DevOps Engineer on behalf of our client.

  • Remote from Poland

  • B2B

📌 The Platform Reliability Engineer is responsible for ensuring the reliability, performance, and availability of our critical platforms: Kong (API Management), Solace (Messaging), Mulesoft (iPaaS), and Informatica (ETL).

This role applies Site Reliability Engineering (SRE) principles — including automation, monitoring, and continuous improvement — to proactively identify and resolve potential issues, optimize platform performance, and collaborate with cross-functional teams to deliver exceptional service reliability.

This role requires a deep understanding of distributed systems, cloud technologies, and a passion for building resilient and scalable platforms.

The consultant will work closely with various platform teams in the Integration space and report directly to the Enterprise Integration Manager.


Platform Reliability & Performance (SRE Focus)

  • Ensure the reliability and availability of the Kong, Solace, Mulesoft, and Informatica platforms, applying SRE principles of automation, monitoring, and continuous improvement.

  • Proactively identify and resolve potential issues before they impact production environments, using data-driven insights and predictive analysis.

  • Develop and implement comprehensive monitoring and alerting systems to ensure platform health and performance.

  • Collaborate with the Support team to conduct thorough post-incident reviews aimed at continuously improving platform reliability.

  • Conduct root cause analysis (RCA) for incidents and implement preventative measures, focusing on automation and systemic solutions.

  • Collaborate with development, operations, and security teams to ensure smooth platform operations, promoting a culture of shared responsibility for reliability.

  • Take ownership of platform SLAs and SLOs, ensuring they are met or exceeded, and proactively identify opportunities for improvement.

  • Evaluate and implement new tools and technologies to improve platform reliability and efficiency, staying up to date with the latest SRE trends and technologies.


Chaos Engineering & Resilience

  • Design, implement, and execute chaos engineering experiments to proactively identify weaknesses and vulnerabilities in integration platforms.

  • Develop and maintain a chaos engineering framework to systematically test platform resilience under various failure scenarios.

  • Analyze chaos experiment results and collaborate with engineering teams to implement improvements to platform resilience.

  • Participate in designing and implementing fault-tolerant and self-healing systems.


Disaster Recovery & Business Continuity

  • Collaborate with DevOps engineers to develop, maintain, and test disaster recovery plans for the integration platforms.

  • Participate in disaster recovery exercises to validate plan effectiveness and identify areas for improvement.

  • Ensure disaster recovery plans align with business continuity requirements.

  • Implement and maintain backup and recovery procedures for critical platform components.


Upstream/Downstream Dependency Management

  • Analyze integration platform dependencies on other systems (e.g., API Gateway, backend services) and assess their impact on overall service reliability.

  • Implement monitoring and alerting for issues in upstream and downstream systems that could affect integration platforms.

  • Collaborate with other teams to improve the reliability and performance of dependent systems.

  • Design and implement strategies for handling failures in dependent systems, such as circuit breakers, retries, and fallbacks.


Collaboration & Communication

  • Work closely with the Support team to address platform-related issues and improve support processes, providing them with tools and knowledge to resolve issues efficiently.

  • Collaborate with Platform Engineers to optimize platform architecture and infrastructure, ensuring alignment with SRE best practices.

  • Partner with the Product Owner to define and communicate platform reliability metrics and performance to stakeholders through clear dashboards and reports.


Performance Optimization

  • Monitor platform performance and identify areas for optimization using performance profiling and load testing techniques.

  • Conduct performance testing and tuning to ensure optimal resource utilization and eliminate bottlenecks.

  • Collaborate with development teams to optimize application performance and provide guidance on best practices.

  • Implement caching strategies and other techniques to improve responsiveness and reduce latency.


Documentation and Knowledge Sharing

  • Create and maintain comprehensive documentation for daily activities, platform architecture, configuration, and operational procedures.

  • Ensure documentation is up to date and accessible.

  • Share knowledge and best practices with the team, fostering a culture of learning and collaboration.


Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related field.

  • 5+ years of experience in a similar role focused on platform reliability and operations, ideally within an SRE environment.

  • Strong understanding of Kong API Gateway, Solace PubSub+, Mulesoft Anypoint Platform, and Informatica PowerCenter.

  • Experience with cloud platforms such as AWS, Azure, or GCP.

  • Proficiency in scripting languages such as Python, Bash, or Go.

  • Experience with infrastructure-as-code tools such as Terraform or Ansible.

  • Experience with monitoring and alerting tools such as Datadog.

  • Strong understanding of networking concepts and protocols.

  • Excellent problem-solving and troubleshooting skills.

  • Excellent communication and collaboration skills, with the ability to communicate technical concepts clearly.

  • Strong understanding of SRE principles and practices.

  • Experience with containerization (Docker, Kubernetes).

  • Experience with CI/CD pipelines and automation tools.

  • Relevant certifications (e.g., AWS Certified DevOps Engineer, Azure DevOps Engineer Expert, Google Cloud Professional Cloud Architect).

  • Experience with Agile development methodologies.

📩 If you’re interested and meet the qualifications, please send your CV to Alina Pchelnikova at alina.pchelnikova@dcvtechnologies.co.uk

Tech stack

  • AWS (regular)

  • Azure (regular)

  • GCP (regular)

  • API (regular)

  • Terraform (regular)

  • Agile (regular)

  • Python (regular)
