DevOps Engineer
Position Overview
Important: Travel & On-Call Requirements
This role requires readiness for long-distance international travel to customer sites. The systems are deployed globally and, when issues cannot be resolved remotely, on-site interventions may be necessary, including deployments, upgrades, and complex troubleshooting activities.
Additionally, the position includes participation in a rotational on-call / standby schedule, ensuring operational continuity and the ability to respond to critical incidents outside of standard working hours.
We are looking for an experienced DevOps Engineer to join a team responsible for the maintenance and further development of a complex automation system deployed on-premises at customer sites. The system is based on Linux (Ubuntu) and a containerized Kubernetes architecture.
The platform consists of multiple cooperating application and infrastructure components, including:
backend services
GPU-based computing components (CUDA)
communication layer
storage
networking components
The environment is characterized by high operational complexity and strong dependencies between system layers (OS, Kubernetes, applications, networking, storage). Systems are deployed across multiple locations worldwide and often operate in environments with limited local IT support, which requires high reliability and well-defined operational procedures.
Responsibilities
Incident Handling and System Maintenance
Diagnosing and resolving issues related to:
Kubernetes clusters
containers (Docker)
Linux (Ubuntu) operating system
networking
storage (including NFS)
Analyzing logs and service health across application and infrastructure layers
Restoring full system functionality in production environments
Performing system deployments and upgrades at customer sites
Participating in on-site interventions when issues cannot be resolved remotely
Collaboration and Architecture Improvement
Cooperating closely with development and architecture teams
Contributing to architecture simplification and standardization
Improving overall system stability and reliability
Supporting long-term efforts to reduce operational overhead and manual interventions
Technical Requirements
Strong experience with Linux (Ubuntu) system administration and troubleshooting
Hands-on experience with Kubernetes, including cluster troubleshooting and container analysis
Practical knowledge of Docker
Solid understanding of networking and diagnosing network-related issues
Experience with NFS / storage troubleshooting
Operational knowledge of GPU / CUDA environments (compatibility, stability)
Experience working with:
RabbitMQ
PostgreSQL
We offer
Flexible working hours
Remote work options
Medical care program
MultiSport
Integration events
A contract of employment or self-employment, depending on your preference

xBerry Sp. z o.o.
xBerry is a team of engineers and technology creators who design innovative solutions for business. We build technologies in machine learning, IoT, robotics, and embedded systems, and much more...