Site Reliability Engineer
al. 29 Listopada 20, Kraków
OpenX
We are a pioneer in cloud computing in Poland. With our recent migration to Google Cloud Platform (GCP), we have the largest cloud infrastructure footprint in Poland. The scale is so large that Google is working to solve our problems.
We are seeking a Site Reliability Engineer to join our Big Data Team. You will be primarily responsible for the performance, uptime, and growth of various OpenX systems and services on GCP. Much of your software development work will focus on optimizing cloud-native systems, orchestrating cloud infrastructure, and eliminating manual work through automation.
Excellent communication skills are crucial in this position, so that you can interact successfully with globally distributed OpenX teams operating 24x7.
Key responsibilities:
Design, write and deliver software to implement and support large web-scale, highly performant, highly available infrastructure on GCP/AWS (e.g. with Terraform)
Monitor infrastructure, respond to incidents, correct and improve systems to prevent incidents, and plan capacity
Support system deployments and product releases
Tune large-scale clusters for optimal performance and efficiency
Work closely with engineering, project management, and operational peers to develop innovative technical tools and solutions
Participate in the on-call rotation (in the future)
What you need to have to be successful:
At least 3 years of AWS/GCP experience
Bachelor’s degree in Computer Science, related technical field involving systems engineering, or equivalent practical experience
Shell scripting
Experience in at least one programming language, such as Java, Python, or Go
Good Polish and English skills
Desirable Qualifications:
Expertise in designing, analyzing and troubleshooting large-scale distributed systems
Good understanding of public cloud services and tasks, such as: VPC; load balancing; relational and non-relational datastores (e.g., Google Cloud SQL, Memorystore, AWS RDS); storage (e.g., GCS, AWS S3); monitoring (e.g., GCP Stackdriver, AWS CloudWatch, Prometheus); serverless computing (e.g., GCF, AWS Lambda); and auto-scaling
Kubernetes/Docker/Containers experience
Ability to debug and optimize code and automate routine tasks