Principal Site Reliability Engineer
Are you ready to lead infrastructure strategy for a cutting‑edge AI‑driven SaaS platform? We are looking for a Principal Site Reliability Engineer with a proven track record in scaling, optimizing, and securing cloud‑based systems. This senior role offers the opportunity to shape the reliability and performance of a platform used by finance teams worldwide.
In this role, you will be part of a dynamic engineering environment where your expertise will directly influence product stability and growth. You will work with advanced cloud technologies, automation tools, and AI-driven solutions, contributing to projects that push the boundaries of innovation.
If you are ready to take on strategic responsibility and make a tangible impact, apply now and join us in building the future of reliable, scalable systems.
Customer
Sigma Software is partnering with a fast‑growing AI‑driven SaaS platform serving finance and accounting teams in high‑growth businesses. The platform automates critical workflows, from billing and collections to revenue recognition and reporting, ensuring compliance and accelerating cash flow. Leveraging advanced AI, it reduces manual work, increases operational efficiency, and supports scalability for customers worldwide.
Project
The project focuses on building and scaling an AI-powered SaaS solution for finance automation. It integrates advanced machine learning models with robust cloud infrastructure to deliver secure, compliant, and high‑performance services. The engineering culture emphasizes automation, resilience, and operational excellence.
Requirements
At least 8 years of experience in Site Reliability Engineering or DevOps roles, including 2+ years in a Principal or Lead position
Proven experience in infrastructure modernization and scaling initiatives for high‑growth environments
Strong proficiency in Python
Deep expertise in cloud platforms and container orchestration tools such as AWS ECS and EKS
Solid experience in CI/CD pipeline design and optimization using tools like GitHub Actions and Buildkite
Proficiency in infrastructure‑as‑code tools such as Terraform
Strong knowledge of monitoring, observability, and performance optimization practices
Upper-Intermediate level of spoken and written English
Would be a plus:
Experience with monorepos (Turborepo, pnpm)
Familiarity with modern TypeScript tools (SWC, Biome, Oxc)
Knowledge of NestJS, Next.js, and testing frameworks (Jest, Vitest)
Personal Profile
Excellent leadership, communication, and decision‑making abilities
Ability to work independently and make pragmatic build‑vs‑buy decisions in fast‑paced environments
Responsibilities
Define and lead infrastructure and reliability strategy across the platform
Design scalable, resilient systems in collaboration with engineering teams
Optimize build, testing, and deployment processes for speed and stability
Establish and uphold best practices for CI/CD, monitoring, and observability
Lead incident response and drive continuous improvement post‑incident
Automate workflows to reduce operational toil and risk
Mentor engineers and foster a culture of operational excellence
Make strategic build‑vs‑buy decisions balancing speed, quality, and sustainability

Sigma Software
Sigma Software is a global software development company founded in 2002. It enables enterprises, startups, and product houses to meet their technology needs through end-to-end delivery, providing top-quality software dev...