AWS Cloud Infra Engineer
DevOps
Fresha / Shedul

Warszawa
Type of work: Undetermined
Experience: Senior
Employment Type: B2B
Operating mode: Office

Tech stack

    Docker: advanced
    AWS: advanced
    Terraform: advanced
    Ansible: advanced
    Linux: advanced
    Jenkins: regular
    Kubernetes: regular
    PostgreSQL: regular
    New Relic: regular

Job description

Do you want to work for Fresha.com / Shedul.com, a fast-growing platform that is revolutionizing the beauty and wellness industry? We are a global startup with offices in Warsaw, London, and Dubai. Our high-traffic booking platforms have quickly become a game-changing industry leader, with users in over 120 countries. Our customers book millions of appointments monthly, with thousands of active users at any time.

Today we’re looking for an AWS Cloud Infrastructure Engineer (AWS CIE) - a person who will join our Site Reliability Engineering team and apply their Software Engineering expertise (we follow a 100% Infrastructure-as-Code approach) to solve IT Operations problems. This role is not primarily about building new user-facing features, but about taking care of the non-functional aspects of the platform's architecture (& the infrastructure code behind it) - e.g. performance, scalability, high availability, resilience, maintainability and security.

In other words, this person's daily activities will cover (a selection, not a complete list):
  • development & continuous improvement of our Infrastructure-as-Code setup (Ansible, Terraform, Jenkins pipelines, Kubernetes, Helm), incl. automated tests for infra code
  • setting up & improving observability mechanisms for various platform elements (both technical & business aspects) - monitoring, alerting, error reporting, notifications, etc.
  • detecting (e.g. via chaos engineering) & driving the removal (with your own or other teams' hands) of architectural bottlenecks, attack vectors & root causes of technical problems
  • measuring & optimising non-functional architecture qualities (expressed as SLIs/SLOs)
  • designing, executing, validating & evolving the mechanisms (tools & procedures) needed by crucial processes such as disaster recovery, capacity & performance testing, scaling out various components of the platform, the end-to-end deployment pipeline, etc.
  • automation of all of the above & much more :)

All of this is to make sure that our exponentially growing traffic won't affect our never-ending quest to bring unparalleled value to both beauty salons & their demanding customers. This is NOT a greenfield project, but a well-shaped, global, serious-scale platform with over 8M bookings per month & 20% quarterly growth.

Your profile (required):

  • Practical experience with AWS IaaS/PaaS services (mainly EC2, RDS) and Kubernetes (or an equivalent container orchestration system, or even a service mesh)
  • Practical experience with the infrastructure characteristics of web-scale online applications & services (how to measure, monitor & assess their health)
  • Engineering attitude: based on pragmatism, measurable facts & common-sense (e.g. understanding of "good enough")
  • Engineering maturity: not clinging to your own opinions, taking ownership of decisions, cold-blooded risk assessment
  • Being able to plan, design & execute complex, multi-step change plans (e.g. platform component upgrades, migrations, complex deployments) while eliminating (or at least minimising) platform downtime
  • Genuine interest in non-functional, infrastructure parts of the platform & their architectural qualities (performance, maintainability, resilience, high availability, etc.)
  • DevOps attitude - collaboration, silo removal, shared responsibility, improving quality, fast feedback, automation
  • Experience with CI tools (preferably Jenkins)
  • Last but not least - strong communication skills, the ability to self-organise and to work well within a team (& with other teams)

Nice to have:

  • Practical (hands-on) experience with provisioning tools (Ansible, Terraform or equivalents)
  • Practical (hands-on) experience with writing back-end applications
  • Experience with distributed architectures & various enterprise integration patterns (e.g. message brokers, RPC)
  • Experience with monitoring & alerting of infrastructure, containers, web apps and services
  • Experience with getting insights out of large volumes of data - with data visualization, effective search, basic analytical techniques
  • Skills and experience in PostgreSQL scalability, fine-tuning, data replication and backup

How we work:

  1. Our technology stack: https://stackshare.io/fresha/fresha
  2. Article on how we've started doing SRE: https://no-kill-switch.ghost.io/platform-keepers-c...
  3. Article on how we've moved from Heroku to AWS EKS: https://medium.com/fresha-engineering/gradually-sw...