Working Hours: 7:00 PM – 5:00 AM CET
On-call: Participation in an on-call rotation for troubleshooting and incident communication outside of normal business hours
Location: 100% remote work (Candidates must be based in Poland)
Employment Type: Full-time, B2B contract via Experis
Job Summary:
We are seeking a highly skilled Site Reliability Engineer to work closely with engineering teams to ensure our services and systems are highly available, performant, and aligned with the expectations of our business partners and end users. In this role, you will leverage your expertise in software development, complexity analysis, and scalable system design to deliver automation solutions that enhance availability and resiliency. You will also collaborate with the Incident Management Engineering team to address active incidents and drive long-term improvements.
Responsibilities:
Monitor system performance, identify areas for improvement, and implement solutions to enhance reliability and availability
Guide architecture and development teams on building highly available, reliable, and performant applications at a global scale
Partner with architecture teams to ensure operability, measurability, and manageability are embedded in business features and enablers
Collaborate with product owners and managers to implement and monitor key metrics to meet SLOs and SLAs
Work with development teams to troubleshoot and resolve issues
Lead root cause analysis (RCA) of production issues and failures in software, pipelines, or DevOps processes
Design and implement automated solutions to optimize uptime with minimal human intervention
Develop tools and processes to monitor AWS resources and cloud applications
Use Kubernetes and Docker to deploy platform services
Create and promote standards and best practices across development teams and external vendors
Requirements:
Bachelor’s degree or higher in Computer Science or a related technical field
5+ years of experience in deploying, administering, and troubleshooting large-scale distributed systems
5+ years of experience in automation programming using Python, Go, Java, Ruby, Rust, or JavaScript (Python and Go preferred)
5+ years of experience working with Linux terminal tools and shell scripting in a Linux environment
Strong understanding of public cloud service concepts
Strong knowledge of Unix/Linux internals and administration (Debian preferred)
Solid understanding of networking, storage systems, and database systems
Proven experience in debugging, code optimization, and task automation
Strong problem-solving and communication skills
Hands-on SRE experience including monitoring, alert creation, and alert tuning
Willingness to work US West Coast hours (10 AM – 8 PM PST, i.e. 7:00 PM – 5:00 AM CET)
Willingness to participate in on-call rotation outside of normal business hours
Preferred Qualifications:
Experience with technologies such as MySQL, Redis, Nginx, Kubernetes, Docker, OpenStack, Hadoop, Spark, Flink, Kafka
Strong communication and presentation skills
Proactive and results-driven mindset
Excellent command of English (written and spoken)
Strong organizational skills in planning and prioritizing workload and initiatives
Offer:
MultiSport Plus
Group Insurance
Medicover Premium healthcare
Access to e-learning platform
Net per hour - B2B