AI Infrastructure Nutanix Site Reliability Engineer
Job Title: AI Infrastructure Nutanix Site Reliability Engineer
Location: Saudi Arabia
Nationality: Saudi Nationals only
Experience: 5+ years
Job Overview:
We are seeking an experienced AI Infrastructure Site Reliability Engineer to support and optimize large-scale, distributed systems for a leading global technology client. The role focuses on ensuring high availability, scalability, and performance of AI-driven infrastructure in a Nutanix-based environment. It covers end-to-end infrastructure management, including hardware provisioning, firmware, OS, networking, storage, GPU tuning, and monitoring.
Main Responsibilities:
Manage and maintain AI infrastructure on Nutanix platforms.
Ensure system reliability, uptime, and performance through monitoring and automation.
Troubleshoot infrastructure, network, and application issues.
Implement CI/CD pipelines and infrastructure-as-code practices.
Collaborate with engineering teams to improve system resilience and scalability.
Optimize cloud and on-prem environments for AI/ML workloads.
Key Requirements:
5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure roles.
Strong experience with Nutanix (AHV, AOS, Prism).
Strong understanding of hardware, OS, networking, and storage systems.
Experience with GPU environments and performance tuning.
Knowledge of cloud platforms (AWS, Azure, or GCP).
Experience with containerization (Docker, Kubernetes).
Proficiency in scripting (Python, Bash, or similar).
Familiarity with monitoring tools (Prometheus, Grafana, etc.).
Understanding of AI/ML infrastructure is a plus.
Other Details:
This position is based in Saudi Arabia and is open to Saudi nationals.
The role demands significant expertise in AI infrastructure management and a commitment to enhancing performance and reliability standards.