Site Reliability Engineer
DevOps
OANDA Poland

Kraków
Type of work: Undetermined
Experience: Senior
Employment Type: Permanent
Operating mode: Office

Tech stack

    GCP: advanced
    AWS: advanced
    JavaScript: advanced
    Terraform: nice to have
    Ansible: nice to have
    Kubernetes: nice to have
    Prometheus: nice to have
    C++: nice to have
    Python: nice to have

Job description

Online interview
OANDA is looking for a passionate Site Reliability Engineer who wants to apply software development principles and practices to solve difficult operations problems. As an SRE, you will be embedded in one of our development teams, acting as the champion for reliability best practices including observability, automation, high availability, fault tolerance, and full-lifecycle ownership. The ideal candidate for this role has a strong data-driven approach to improving the performance of our products in on-premise and cloud environments.

Experience & Skills:

  • Experience as a software developer, or in an SRE-related field; a solid development background and an understanding of software development practices are necessary to be successful in this role.
  • The best candidates will have experience working in cloud-native and on-premise environments, in bare metal, virtualized (VM), and containerized / orchestrated deployments.

Primary Duties:

  • Champion a culture of shared service ownership within your development team.
  • Tap into your passion for eliminating repetitive manual processes (toil) using automation and apply this through infrastructure-as-code and configuration management tools (Ansible, Terraform, Helm).
  • Enable your team to make data-driven decisions by pushing monitoring, instrumentation, and observability as core tenets of our development practice.
  • Demonstrate best-in-class deployment and delivery methodologies, leveraging Kubernetes, Anthos, Cloudflare, and CNCF tools to drive cloud adoption and standardization across our on-premise and cloud (AWS, GCP) environments.
  • Contribute to the development of our product(s) by writing code in Python, C++, JavaScript, Go, or other languages to solve difficult performance and reliability problems. Perform code review for your development peers to ensure reliability, observability, and security are key pillars of our work.
  • Collaborate with product managers and business stakeholders to set and maintain Service Level Objectives (SLOs) and metrics that are representative of our customer experience. Tune our approach to alerting to manage alert fatigue.
  • Help scale our security function by advocating for security best practices within your team and working with OANDA’s security team to apply DevSecOps practices within your workflows.
  • Experiment with (and lead the implementation of) new technologies and methodologies gleaned from your involvement in the global SRE, DevOps, and Cloud communities. Attend and contribute to continuing education, conferences, and seminars to stay current with industry and community trends.
  • Articulate the SRE ethos to your peers and stakeholders, and educate your colleagues on applying SRE principles to achieve a healthy balance of new feature development and reliability initiatives.
  • Participate in a cross-functional on-call rotation to support your code in production. Ensure the health of the on-call rotation to avoid operational overload, lead the blameless post-mortem process, and feed remediation tasks back into our development pipeline.
  • Draft playbooks and conduct tabletop and chaos engineering exercises to avoid operational underload and identify opportunities for improvement.
  • Ensure that our development teams and applications adhere to OANDA’s engineering standards. This includes supporting and demonstrating compliance with OANDA’s security and privacy standards.
  • Set a great example and encourage others to espouse the culture and values of the company to other internal teams and the general public.