AI SRE – Nutanix
Location: Saudi Arabia
Nationality: Saudi Nationals only
We are hiring an AI Site Reliability Engineer (SRE) to ensure the reliability and performance of Nutanix environments that support AI workloads.
Main Responsibilities:
- Support system design and drive capacity planning for AI workloads
- Handle bare-metal & hypervisor (AHV/ESXi) provisioning and optimization
- Own infrastructure lifecycle automation (deployments, patching, scaling)
- Ensure high availability through proactive monitoring, alerting, and incident response
- Troubleshoot performance issues across compute, storage, networking, and GPU layers
- Manage VM lifecycle and workload migrations
- Cover the full stack: hardware provisioning, firmware, OS, networking, storage, GPU tuning, and monitoring
Key Requirements:
- 5+ years of experience in SRE / Production Support / Infrastructure Operations
- Strong experience with Nutanix (AOS/AHV) or similar HCI platforms
- Exposure to AI/ML workloads and GPU environments
- Hands-on experience with automation (Python/PowerShell) and monitoring tools
Other Details:
This position focuses on the reliability and performance of Nutanix-based environments built for AI workloads. Candidates must be Saudi nationals with the relevant experience and skills.