Lead Linux System Administrator

Admin

Ogrodowa 8, Łódź

ALTER GPU CENTER

Full-time
B2B
Senior
Remote

Job description

About the role

We are looking for a Lead Linux System Administrator to take technical ownership of the Linux environment supporting large-scale GPU infrastructure used for AI training and inference workloads.

This role combines hands-on system administration with team leadership. You will be responsible for the stability, performance, security, and day-to-day management of Linux-based GPU servers, while also supporting and mentoring a team of administrators working in a complex production environment.

Responsibilities

  • Lead, mentor, and support a team of Linux System Administrators responsible for GPU infrastructure operations

  • Manage the full Linux server lifecycle, including provisioning, patching, configuration management, hardening, and performance tuning

  • Maintain and optimize the NVIDIA GPU software stack, including drivers, CUDA, cuDNN, NCCL, and GPU management tools such as DCGM and nvidia-smi

  • Support and manage MIG and GPU time-slicing configurations where needed

  • Develop and maintain automation for bare-metal provisioning, OS image management, and server configuration using scripting and tools such as Ansible and Terraform

  • Tune Linux systems for demanding workloads, including kernel parameters, local storage, parallel file systems, networking, and scheduler settings

  • Troubleshoot complex issues across hardware, drivers, the operating system, and cluster-level services

  • Work closely with DevOps/SRE, Site Operations, and AI/ML teams to ensure smooth integration between OS-level infrastructure and higher-level orchestration platforms

  • Support security hardening, vulnerability management, patch compliance, and operational standards across the server fleet

  • Participate in on-call support and contribute to continuous improvements in reliability, performance, and operational efficiency

Requirements

  • 7+ years of hands-on experience in Linux system administration in production environments

  • At least 3 years of experience in a technical lead, lead administrator, or people leadership role

  • Strong expertise in administering Linux systems at scale

  • Hands-on experience with NVIDIA GPUs in Linux environments, including drivers, CUDA ecosystem components, and GPU management tools

  • Strong experience with Ansible or other configuration management tools

  • Good scripting skills in Python and/or Bash

  • Experience with Infrastructure as Code and infrastructure automation

  • Good understanding of high-performance computing, storage systems, and high-speed networking technologies such as InfiniBand or RoCE

  • Experience supporting AI/ML or HPC workloads

  • Ability to troubleshoot complex production issues and work effectively in a high-availability environment

  • English at a communicative level (B2) or better, as you will be working in an international team

Nice to have

  • Experience with cluster management and orchestration tools such as Slurm, Kubernetes, or Run:ai

  • Familiarity with bare-metal provisioning tools and large server fleet management

  • Experience in AI infrastructure companies, hyperscalers, or HPC/research environments

  • Knowledge of Linux performance tuning for GPU-accelerated workloads

  • Higher education in Computer Science, Engineering, or a related field

What we offer

  • Benefits package

  • Opportunity to lead Linux infrastructure supporting advanced AI workloads at scale

  • Work with modern GPU hardware and software stacks in a technically demanding environment

  • Collaboration with experienced engineers across infrastructure, platform, and AI teams

  • A dynamic workplace with room for ownership, technical influence, and professional growth

Tech stack

  • English: B2

  • Linux: master

  • Ansible: advanced

  • Leadership: regular

  • Python: regular

  • Bash: regular

  • IaC: regular

Office location

Ogrodowa 8, Łódź