Site Reliability Engineer (SRE)

Kevin Edward Consultancy Limited

Warszawa

3 602 - 4 803 USD net/month (B2B)

Type of work: Full-time
Experience: Mid
Employment type: B2B
Operating mode: Hybrid

Tech stack

SRE: master
DevOps: master
Application support: advanced
Production support: advanced
Linux / Unix: advanced
ITIL: regular
Kubernetes: regular

Job description

Online interview

The ideal candidate will have strong experience with Docker, Kubernetes, and Unix/Linux systems, along with a deep understanding of incident management, production support, and application monitoring. You will collaborate closely with development, operations, and security teams to resolve production issues quickly and efficiently while continuously improving system reliability.

Key Responsibilities:

Production Support & Incident Management:
Provide production support for mission-critical financial applications, ensuring high availability and performance.
Lead and coordinate incident management, ensuring incidents are diagnosed, mitigated, and resolved quickly to minimize downtime and service interruptions.
Troubleshoot production issues across applications, infrastructure, and networking, working closely with development and operations teams to implement long-term fixes.
System Monitoring & Performance Tuning:
Monitor and optimize the performance, availability, and reliability of systems using modern monitoring tools (e.g., Prometheus, Grafana, Datadog, New Relic).
Implement and manage alerting systems to proactively detect and resolve potential issues before they impact users.
Optimize and tune the infrastructure and applications to improve performance and reduce system resource usage.
Infrastructure Automation & DevOps Practices:
Automate infrastructure deployment, scaling, and management using tools such as Docker, Kubernetes, and CI/CD pipelines.
Write and maintain infrastructure as code (e.g., Terraform, Ansible) to enable efficient deployment and scaling of systems and applications.
Work with DevOps teams to implement best practices for containerization, orchestration, and automation.
Collaboration with Development Teams:
Work closely with development teams to ensure that production systems are scalable, reliable, and secure.
Participate in the design, implementation, and review of new features or systems with an emphasis on their operational readiness for production.
Provide feedback on system designs and improvements, helping to bridge the gap between development and operations.
Disaster Recovery & Business Continuity:
Collaborate with the team on disaster recovery planning and ensure systems have proper backup, failover, and recovery procedures in place.
Lead efforts in capacity planning and scaling systems to meet growing traffic and data requirements while ensuring minimal impact on performance.
Security & Compliance:
Ensure that all production systems are secure and comply with industry standards and regulations related to data security, privacy, and financial compliance.
Work with security teams to address vulnerabilities and implement security best practices in application and infrastructure management.
Continuous Improvement & Documentation:
Contribute to the continuous improvement of processes, systems, and tools for better performance and reliability.
Maintain detailed documentation of systems, incidents, operational procedures, and troubleshooting steps to improve knowledge sharing and support scalability.

Required Skills and Qualifications:

Proven experience in site reliability engineering or production support within fintech or a similarly high-demand industry.
Strong experience with Docker and Kubernetes for container orchestration, scaling, and management.
Unix/Linux experience (system administration, shell scripting, troubleshooting, performance tuning) is mandatory.
Hands-on experience with incident management and production support, including using incident response tools (e.g., PagerDuty, Opsgenie) and root cause analysis.
Solid knowledge of cloud platforms (AWS, GCP, Azure) and experience managing cloud-native applications and infrastructure.
Experience with application monitoring tools (e.g., Prometheus, Grafana, Datadog, New Relic) to ensure system reliability.
Experience with CI/CD pipelines and infrastructure automation tools (e.g., Terraform, Ansible, Jenkins, GitLab CI).
Excellent problem-solving and troubleshooting skills, with the ability to diagnose and resolve complex production issues.
Experience with disaster recovery, backup strategies, and high availability architectures for critical systems.
Strong communication skills with the ability to work collaboratively across teams, including developers, operations, and business stakeholders.
Knowledge of financial services regulations, compliance, and security best practices is a plus.

Preferred Skills:

Experience with monitoring and alerting solutions in high-volume environments (e.g., Prometheus, ELK Stack).
Familiarity with microservices architectures and an understanding of how to manage and scale large distributed systems.
Exposure to automated testing and performance benchmarking tools for infrastructure and applications.
Experience with logging and log management tools (e.g., ELK, Splunk).
Familiarity with networking concepts and troubleshooting in distributed systems.