Site Reliability Engineer

DevOps

Chmielna 69, Warszawa

Transition Technologies MS

Full-time

B2B, Permanent

Senior

Hybrid

Job description

We are looking for a Site Reliability Engineer to ensure the reliability, scalability, and operational excellence of a production-grade AI platform running across Azure and AWS environments. You will work closely with AI and fullstack engineers to automate deployments, improve observability, optimize infrastructure costs, and support highly available LLM-powered services at scale.

Your responsibilities:

Own the reliability, scalability, and performance of platform services running on Azure Container Apps and AWS ECS.
Build and maintain CI/CD pipelines using GitHub Actions for automated testing, deployment, and release management.
Manage infrastructure as code using Terraform, Bicep, or ARM templates across Azure and AWS environments.
Implement and maintain monitoring, alerting, logging, and observability solutions (New Relic, Langfuse, CloudWatch).
Configure and manage Azure Service Bus, Blob Storage, Key Vault, and containerized environments.
Ensure security best practices, including secret management, vulnerability scanning, and container image hardening.
Implement auto-scaling, load balancing, and cost optimization strategies for AI and LLM workloads.
Support incident response processes and create operational runbooks for production services.
Collaborate with AI engineers to optimize LLM API usage, reduce latency, and control token consumption costs.

We are looking for you if you have:

3–5+ years of experience in SRE, DevOps, or platform engineering roles.
Strong hands-on experience with Microsoft Azure, including Container Apps, Service Bus, Key Vault, Blob Storage, and Azure OpenAI resources.
Practical knowledge of AWS services such as ECS, S3, Aurora, and CloudWatch.
Experience with Infrastructure as Code tools: Terraform, Bicep, or ARM templates.
Experience designing and maintaining CI/CD pipelines (GitHub Actions preferred).
Strong understanding of Docker and container orchestration; Kubernetes experience is a strong advantage.
Experience with monitoring and observability platforms such as New Relic or equivalent tools.
Familiarity with security best practices, including secrets management, vulnerability remediation, and image scanning.
Scripting and automation skills in Python and/or Bash.
Daily use of AI-powered development tools such as Cursor, Claude Code, or GitHub Copilot.
Fluent English communication skills, both spoken and written.

We offer:

Participation in interesting and demanding projects
Flexible working hours
A great, non-corporate atmosphere
Stable employment conditions (contract of employment or B2B contract)
Opportunities for development and promotion
Attractive package of benefits
Work model: remote or hybrid (2 days per week from the office)

We reserve the right to contact only selected candidates.

Tech stack

English

AWS: advanced
Azure: advanced
Terraform: advanced
GitHub Actions: advanced
Docker: advanced
Kubernetes: regular
Python: regular
Bash: regular
Azure OpenAI Resource Management: regular

About the company

Transition Technologies MS

Transition Technologies MS is a company specializing in providing advanced IT solutions and software development services. It focuses on innovative technologies to support business digital transformation.
