DevOps/SRE
Plac Solny, Wrocław +4 Locations
Shelf
The Platform Engineering team works across the stack to give product teams paved, secure, cost-efficient paths to build, ship, and run software with minimal cognitive load. We own the "how," so product teams can focus on the "what." You will join the team responsible for running the core infrastructure that supports Shelf products. This role is primarily based in our European offices in Wrocław, Poland and Lviv, Ukraine. We will prioritize candidates who are already in Wrocław or are open to relocating there, as we believe in the value of in-person collaboration to foster strong relationships and seamless communication within our team.
In certain situations, we will also consider remote candidates based in one of the countries listed in this job posting. In any case, we ask all new hires to visit our office for the first week of their onboarding (accommodation and travel covered), and after that for at least two days per month or one week every two months.
You will develop reusable components, improve system performance, and create scalable abstractions that accelerate product development across the organization.
You will maintain high standards for reliability and security in your work and in the systems used by other teams.
This is a high-ownership, hands-on engineering role. You will manage everything from Terraform/OpenTofu modules and CI/CD pipelines to SSO permissions and observability tools, with a mandate to build infrastructure that works and keeps working.
You will work with AWS, Datadog, OpenTofu, Snowflake, GitHub, Azure, various LLMs, and many other tools and services.
In this role, you will
Write and maintain infrastructure as code in OpenTofu, making modules more reusable and robust so that more engineers can ship infrastructure safely on their own.
Write clear runbooks and playbooks that explain how things work and what to do when they break. You present your work in a clean, structured way, prefer writing a good doc once to enable self-serve, and treat every question as a signal to either document the answer or automate it so it does not need to be asked again.
Care deeply about the health of our infrastructure by keeping databases, LLMs, and third-party self-hosted services on current, supported versions, standardizing them across environments, and actively hunting down and removing outdated components instead of tolerating an aging tech stack.
Participate in on-call rotations and incident response, and write clear postmortems with concrete action items. You enjoy turning every incident into an opportunity to improve: you define and refine SLOs and error budgets, then follow through on the work that prevents repeats, tightens detection, speeds up response, and makes recovery cleaner.
Treat CI/CD pipelines as a critical product. Own and improve hundreds of pipelines by making them faster, more reliable, easier to roll back, and more standardized so they reduce manual toil and mental overhead for developers.
Become a Datadog and observability expert, tuning logging, metrics, tracing, dashboards, and alerts to squeeze out as much useful signal as possible. Build simple defaults, automation, and clear docs so developers can self-serve, contribute to observability, and rely on a solid platform rather than on ad hoc help from you.
Make thoughtful build vs buy decisions and work directly with vendors and cloud support (AWS, Azure, GCP, and others) to solve infrastructure problems, plan upgrades, and find cost savings, preferring to ask good questions and pull in expertise rather than silently struggle on your own.
Implement and enforce SOC 2-aligned policies for infrastructure and deployments, including disaster recovery and business continuity, change management, and security policies, and ensure they are practical, documented, and followed in day-to-day work.
You might thrive in this role if you
Take pride in building and operating scalable, reliable, secure systems and are not comfortable bypassing protocols or cutting corners.
Take full ownership of your work, handle ambiguity and rapid change, and proactively remove obstacles to deliver results.
Are comfortable diving into any part of the stack, from infrastructure and backend services to product frontends, when that is what it takes.
Use Python to automate repetitive work and improve your own and others’ workflows.
Read AWS re:Invent announcements and us-east-1 postmortems for fun.
Sample projects
Provision the entire product infrastructure and applications in a new cloud or region.
Design and implement a live database migration from us-east-1 to us-east-2.
Maintain a 100% score on the AWS CIS Benchmark in our environments.
Centralize audit trail logs from AWS, GCP, and Azure into a single place.
Write a clear runbook describing how you conducted a disaster recovery test of a system component.
Change the SSO provider and reconfigure services to use the new provider.
Optimize infrastructure costs by improving configuration, identifying abandoned resources, and applying reserved or committed compute purchases where appropriate.
What Shelf Offers
B2B contract
Company Stock Options
Hardware: MacBook Pro
Modern technical stack. Develop open-source software
Premier AI development environment: GitHub Copilot, Claude Code, OpenAI, TypingMind, v0, MCP Servers, plus credits to experiment with emerging AI tools
Why Shelf
Leadership with deep knowledge management, AI, and enterprise SaaS expertise
Customers love us for innovative capabilities, reliability, and measurable business impact
$60M+ raised from top-tier investors including Tiger Global, Insight Partners, and Base10
High-velocity growth, tripling year over year for three consecutive years
100+ employees across the U.S. and Europe with ambitious hiring plans
About Shelf
There is no AI Strategy without a Data Strategy. Getting GenAI to work is mission-critical for most companies, but 90% of AI projects haven't been deployed. Why? Poor data quality: it’s the #1 obstacle companies face in getting GenAI into production.
Shelf unlocks AI readiness. We provide the core infrastructure that enables GenAI to be deployed at scale. We help companies deliver more accurate GenAI answers by eliminating bad data in documents and files before they go into an LLM and create bad answers.
We’re partnered with Microsoft, Salesforce, Snowflake, Databricks, OpenAI and other leaders bringing GenAI to the enterprise. Our mission is to empower humanity with better answers everywhere.