Linux System Administrator
About the role
We are looking for a Linux System Administrator to support the Linux environment behind large-scale GPU infrastructure used for AI training and inference workloads.
This is a hands-on role focused on the deployment, maintenance, performance tuning, and reliability of Linux-based GPU servers. You will work closely with infrastructure and platform teams to keep the environment stable, secure, and ready for demanding production workloads.
Responsibilities
Install, configure, patch, and maintain Linux operating systems across GPU-based server environments
Manage and support the NVIDIA GPU software stack, including drivers, CUDA, cuDNN, NCCL, DCGM, and MIG/time-slicing configurations
Perform system performance tuning, kernel optimization, storage configuration, and networking setup for AI/HPC workloads
Develop and maintain automation scripts and operational tooling using Python, Bash, or similar technologies
Monitor system health, investigate alerts, and troubleshoot issues across hardware, drivers, operating systems, and cluster services
Support bare-metal provisioning and integration with orchestration platforms such as Slurm or Kubernetes
Work closely with Site Operations, DevOps/SRE, and AI/ML teams to support stable GPU cluster operations and infrastructure growth
Participate in on-call support, incident response, root cause analysis, and post-incident improvement activities
Support security hardening, patch compliance, vulnerability management, and operational standards across the server fleet
Requirements
4–8 years of hands-on experience in Linux system administration in production environments
Good knowledge of enterprise Linux environments, such as Ubuntu, Debian, Red Hat Enterprise Linux, or Rocky Linux
Experience with Linux administration at scale
Practical experience with configuration management, scripting, and infrastructure automation
Good scripting skills in Python and/or Bash
Good understanding of performance tuning, storage systems, and high-speed networking technologies such as RDMA, InfiniBand, or RoCE
Experience working with NVIDIA GPUs in Linux environments, including drivers, CUDA components, and GPU monitoring tools, is a strong advantage
Ability to troubleshoot complex technical issues in production environments
English proficiency at a communicative level or higher, as you will be working in an international team
Nice to have
Experience in AI/ML, HPC, or large-scale data center environments
Experience with bare-metal provisioning and fleet management
Familiarity with Slurm, Kubernetes, or similar orchestration tools
Knowledge of observability tools such as Prometheus and Grafana
Familiarity with DCIM platforms
Higher education in Computer Science, Engineering, or a related field
What we offer
Benefits package
Opportunity to work on Linux infrastructure supporting advanced AI workloads
Exposure to modern GPU hardware and high-performance computing technologies
Collaboration with experienced engineers across infrastructure, platform, and AI teams
A dynamic environment with room for ownership, learning, and professional growth