AI Infrastructure Nutanix Site Reliability Engineer

Offer expired

DevOps

Centrum, Riyadh

emagine Polska

Full-time

Any

Mid

Office

Job description

Job Title: AI Infrastructure Nutanix Site Reliability Engineer
Location: Saudi Arabia
Nationality: Saudi Nationals only
Experience: 5+ years

Job Overview:
We are seeking an experienced AI Infrastructure Site Reliability Engineer to support and optimize large-scale, distributed systems for a leading global technology client. The role focuses on ensuring high availability, scalability, and performance of AI-driven infrastructure in a Nutanix-based environment. It covers end-to-end infrastructure management, including hardware provisioning, firmware, OS, networking, storage, GPU tuning, and monitoring.

Main Responsibilities:

Manage and maintain AI infrastructure on Nutanix platforms.
Ensure system reliability, uptime, and performance through monitoring and automation.
Troubleshoot infrastructure, network, and application issues.
Implement CI/CD pipelines and infrastructure-as-code practices.
Collaborate with engineering teams to improve system resilience and scalability.
Optimize cloud and on-prem environments for AI/ML workloads.

Key Requirements:

5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure roles.
Strong experience with Nutanix (AHV, AOS, Prism).
Strong understanding of hardware, OS, networking, and storage systems.
Experience with GPU environments and performance tuning.
Knowledge of cloud platforms (AWS, Azure, or GCP).
Experience with containerization (Docker, Kubernetes).
Proficiency in scripting (Python, Bash, or similar).
Familiarity with monitoring tools (Prometheus, Grafana, etc.).
Understanding of AI/ML infrastructure is a plus.

Other Details:
This position is based in Saudi Arabia and is open to Saudi nationals only.

The role demands significant expertise in AI infrastructure management and a commitment to enhancing performance and reliability standards.

Tech stack

Arabic: C1
Microsoft Azure: advanced
DevOps: advanced
Machine Learning (ML): advanced
Cloud: advanced
CI/CD: advanced
Grafana: advanced
Docker: advanced
Artificial Intelligence (AI): advanced
Automation: advanced
Amazon Web Services (AWS): advanced
