Site Reliability Engineer
The Collaboration
As a Site Reliability Engineer, you will ensure our Marketplace platform remains stable, secure, and highly available. In this role, you will act as a proactive partner to engineering teams, focusing on incident prevention and infrastructure excellence. After we achieved a record-low 6 minutes of downtime in 2025, your primary objective will be to maintain this standard while helping with a major migration to a new AWS region. This cooperation blends high-level engineering and automation with hands-on infrastructure optimization (reducing tech debt, strengthening observability, and refining deployment pipelines).
We operate in a "you build it, you run it" culture. Thanks to our US-based counterparts and a follow-the-sun model, service availability is managed globally, so you are typically needed only during Warsaw business hours, with no scheduled duties after 5:00 PM CET.
Location & Collaboration
We are looking for someone based in the Warsaw Metropolitan Area. While we don't enforce a fixed desk schedule, the team typically synchronizes in-office on Tuesdays and Wednesdays to facilitate face-to-face project alignment.
Scope of Services
Balance your efforts between proactive automation/tooling (50%) and platform maintenance/optimization (50%).
Act as a designated "Point of Contact" for infrastructure support on a rotational basis (approx. 1 week per month) to ensure smooth service delivery.
Troubleshoot complex distributed system issues, investigate logs, and debug pods using modern observability tools.
Use modern AI tools (Claude Code, Cursor) to automate repetitive tasks and streamline workflows.
Help with the migration of our infrastructure stack to a new AWS region to optimize costs and availability.
Technical Profile
Practical experience with Terraform, CloudFormation, Pulumi, or similar tools on complex, production-level systems (beyond hobbyist-level work).
Hands-on experience managing environments within AWS and Kubernetes.
A strong programming background (any language) and a programmatic approach to solving infrastructure challenges.
Solid understanding of web application fundamentals (HTTP/HTTPS, SSL/TLS, and request methods such as GET, POST, and PUT) to effectively troubleshoot connectivity and service layers.
A realistic, skeptical attitude toward tools; you choose the right tech for the problem, not just the latest trend.
High "team fit" with the ability to engage in constructive disagreement to reach the best technical outcomes.