Site Reliability Engineer

TransferRoom

Type of work

Full-time

Experience

Senior

Employment Type

Any

Operating mode

Remote

Tech stack

Terraform

master

Azure

master

PowerShell

advanced

MS SQL Server

advanced

CI/CD

advanced

JIRA

advanced

Python

nice to have

Puppet

nice to have

Job description

TransferRoom is a profitable B2B SaaS Marketplace on a mission to change the football transfer market for the better. We empower football clubs, agents and players to succeed in the transfer market by giving them real-time market intelligence and direct access to a global network of decision-makers.

TransferRoom is the market leader and has become a must-have for key stakeholders in football’s transfer market. It is used by 700 clubs (including Man City, Liverpool and PSG), 400 agencies and 8,000 professional players. It has empowered clubs and agents to facilitate over 4,000 transfers since its launch in 2017.

About the Role

TransferRoom is seeking a highly motivated and experienced Site Reliability Engineer to join our technology team. You will play a critical role in designing, implementing, and maintaining our infrastructure on Microsoft Azure, ensuring high availability, scalability, and performance for our applications. The ideal candidate will have a solid background in Cloud and Server Administration; DevSecOps; Infrastructure-as-Code; High-availability Architecture; Monitoring, Alerting, and Incident Response; and Network Security.

Key Responsibilities:

Collaborate closely with the Software Development function to design, deploy, and manage secure, reliable, and scalable infrastructure on Azure.
Minimise or eliminate single points of failure. Ensure the geographic redundancy of key assets.
Implement best practices for security, performance optimisation, scalability, and resilience testing (e.g. backup recovery testing).
Identify and manage operational risks and ensure service reliability.
Manage and minimise Cloud expenditure without compromising on security, stability, and performance.
Establish, own, and maintain SLAs.
Automate infrastructure provisioning and configuration using HashiCorp Terraform and IaC principles.
Monitor and optimise the performance and health of our systems using Azure Monitor and other relevant tools.
Implement and maintain disaster recovery and high availability solutions.
Identify and troubleshoot system issues, diagnose root causes, and implement effective solutions.
Stay up to date with the latest advancements in cloud technologies and best practices.
Write and maintain system documentation.

Experience/Qualifications:

3+ years of experience as a Site Reliability Engineer or in a similar role.
Expertise in network security.
Strong understanding of cloud computing concepts and experience working with Microsoft Azure.
Expertise in Infrastructure-as-Code with Terraform.
Proficient in scripting languages such as PowerShell, Python or Bash.
Strong spoken and written English.
Excellent problem-solving, analytical, and communication skills.
Ability to work independently and as part of a cross-functional team.
A passion for automation and continuous improvement.
Startup/Scale-Up Experience: Proven track record of thriving in startup or scale-up environments, with a demonstrated ability to adapt and lead in dynamic and rapidly evolving situations.
Knowledge of YAML, HCL, and JSON.
Knowledge of Jira, Git and CI/CD automation tools.
Experience with Microsoft SQL Server administration.
Experience with Redis and Cosmos DB administration is a plus.