AI Infrastructure Nutanix Site Reliability Engineer

DevOps

Centrum, Riyadh

emagine Polska

Full-time
Any
Mid
Office

Job description

Job Title: AI Infrastructure Nutanix Site Reliability Engineer
Location: Saudi Arabia
Nationality: Saudi Nationals only
Experience: 5+ years

Job Overview:
We are seeking an experienced AI Infrastructure Site Reliability Engineer to support and optimize large-scale distributed systems for a leading global technology client. The role focuses on ensuring high availability, scalability, and performance of AI-driven infrastructure in a Nutanix-based environment. It covers end-to-end infrastructure management, including hardware provisioning, firmware, OS, networking, storage, GPU tuning, and monitoring.

Main Responsibilities:

  • Manage and maintain AI infrastructure on Nutanix platforms.

  • Ensure system reliability, uptime, and performance through monitoring and automation.

  • Troubleshoot infrastructure, network, and application issues.

  • Implement CI/CD pipelines and infrastructure-as-code practices.

  • Collaborate with engineering teams to improve system resilience and scalability.

  • Optimize cloud and on-prem environments for AI/ML workloads.

Key Requirements:

  • 5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure roles.

  • Strong experience with Nutanix (AHV, AOS, Prism).

  • Strong understanding of hardware, OS, networking, and storage systems.

  • Experience with GPU environments and performance tuning.

  • Knowledge of cloud platforms (AWS, Azure, or GCP).

  • Experience with containerization (Docker, Kubernetes).

  • Proficiency in scripting (Python, Bash, or similar).

  • Familiarity with monitoring tools (Prometheus, Grafana, etc.).

  • Understanding of AI/ML infrastructure is a plus.

Other Details:
This position is based in Saudi Arabia and is open to Saudi nationals only.

The role demands significant expertise in AI infrastructure management and a commitment to enhancing performance and reliability standards.

Tech stack

  • Arabic: C1

  • Automation: advanced

  • Amazon Web Services (AWS): advanced

  • Machine Learning (ML): advanced

  • Artificial Intelligence (AI): advanced

  • Docker: advanced

  • Cloud: advanced

  • Grafana: advanced

  • Microsoft Azure: advanced

  • DevOps: advanced

  • CI/CD: advanced
