Linux System Administrator

Admin
Ogrodowa 8, Łódź

ALTER GPU CENTER

Full-time
B2B
Senior
Remote

Job description

About the role

We are looking for a Linux System Administrator to support the Linux environment behind large-scale GPU infrastructure used for AI training and inference workloads.

This is a hands-on role focused on the deployment, maintenance, performance tuning, and reliability of Linux-based GPU servers. You will work closely with infrastructure and platform teams to keep the environment stable, secure, and ready for demanding production workloads.

Responsibilities

  • Install, configure, patch, and maintain Linux operating systems across GPU-based server environments

  • Manage and support the NVIDIA GPU software stack, including drivers, CUDA, cuDNN, NCCL, DCGM, and MIG/time-slicing configurations

  • Perform system performance tuning, kernel optimization, storage configuration, and networking setup for AI/HPC workloads

  • Develop and maintain automation scripts and operational tooling using Python, Bash, or similar technologies

  • Monitor system health, investigate alerts, and troubleshoot issues across hardware, drivers, operating systems, and cluster services

  • Support bare-metal provisioning and integration with orchestration platforms such as Slurm or Kubernetes

  • Work closely with Site Operations, DevOps/SRE, and AI/ML teams to support stable GPU cluster operations and infrastructure growth

  • Participate in on-call support, incident response, root cause analysis, and post-incident improvement activities

  • Support security hardening, patch compliance, vulnerability management, and operational standards across the server fleet

Requirements

  • 4–8 years of hands-on experience in Linux system administration in production environments

  • Good knowledge of enterprise Linux environments, such as Ubuntu, Debian, Red Hat Enterprise Linux, or Rocky Linux

  • Experience with Linux administration at scale

  • Practical experience with configuration management, scripting, and infrastructure automation

  • Good scripting skills in Python and/or Bash

  • Good understanding of performance tuning, storage systems, and high-speed networking technologies such as RDMA, InfiniBand, or RoCE

  • Experience working with NVIDIA GPUs in Linux environments, including drivers, CUDA components, and GPU monitoring tools, will be a strong advantage

  • Ability to troubleshoot complex technical issues in production environments

  • English at a communicative level or better, as you will be working in an international team

Nice to have

  • Experience in AI/ML, HPC, or large-scale data center environments

  • Experience with bare-metal provisioning and fleet management

  • Familiarity with Slurm, Kubernetes, or similar orchestration tools

  • Knowledge of observability tools such as Prometheus and Grafana

  • Familiarity with DCIM platforms

  • Higher education in Computer Science, Engineering, or a related field

What we offer

  • Benefits package

  • Opportunity to work on Linux infrastructure supporting advanced AI workloads

  • Exposure to modern GPU hardware and high-performance computing technologies

  • Collaboration with experienced engineers across infrastructure, platform, and AI teams

  • A dynamic environment with room for ownership, learning, and professional growth


Tech stack

  • English: B2

  • Linux: advanced

  • Ubuntu: regular

  • Debian: regular

  • Red Hat: regular

  • Python: regular

  • Bash: regular

Office location

Ogrodowa 8, Łódź
ALTER GPU CENTER