AI Infrastructure Nutanix Site Reliability Engineer

DevOps

Centrum, Riyadh

emagine Polska

Full-time
Any
Mid
Office

Job description

Job Title: AI Infrastructure Nutanix Site Reliability Engineer
Location: Saudi Arabia
Nationality: Saudi Nationals only
Experience: 5+ years

Job Overview:
We are seeking an experienced AI Infrastructure Site Reliability Engineer to support and optimize large-scale distributed systems for a leading global technology client. The role focuses on ensuring high availability, scalability, and performance of AI-driven infrastructure in a Nutanix-based environment. It covers end-to-end infrastructure management, including hardware provisioning, firmware, OS, networking, storage, GPU tuning, and monitoring.

Main Responsibilities:

  • Manage and maintain AI infrastructure on Nutanix platforms.

  • Ensure system reliability, uptime, and performance through monitoring and automation.

  • Troubleshoot infrastructure, network, and application issues.

  • Implement CI/CD pipelines and infrastructure-as-code practices.

  • Collaborate with engineering teams to improve system resilience and scalability.

  • Optimize cloud and on-prem environments for AI/ML workloads.

Key Requirements:

  • 5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure roles.

  • Strong experience with Nutanix (AHV, AOS, Prism).

  • Strong understanding of hardware, OS, networking, and storage systems.

  • Experience with GPU environments and performance tuning.

  • Knowledge of cloud platforms (AWS, Azure, or GCP).

  • Experience with containerization (Docker, Kubernetes).

  • Proficiency in scripting (Python, Bash, or similar).

  • Familiarity with monitoring tools (Prometheus, Grafana, etc.).

  • Understanding of AI/ML infrastructure is a plus.

Other Details:
This position is based in Saudi Arabia and is open to Saudi nationals only.

The role demands significant expertise in AI infrastructure management and a commitment to enhancing performance and reliability standards.

Tech stack

    Arabic: C1

    Microsoft Azure: advanced

    DevOps: advanced

    Machine Learning (ML): advanced

    Cloud: advanced

    CI/CD: advanced

    Grafana: advanced

    Docker: advanced

    Artificial Intelligence (AI): advanced

    Automation: advanced

    Amazon Web Services (AWS): advanced

