As a DevOps Engineer, you’ll join a high-impact project focused on deploying and maintaining a modern, containerized infrastructure to support machine learning workflows. The stack will run on Kubernetes clusters hosted on OpenStack. While the underlying OpenStack environment is already in place, you will be responsible for validating integrations, deploying Kubernetes, and ensuring infrastructure readiness for production-grade ML workloads.
Your work will directly support data teams and engineers by delivering scalable, automated, and secure environments tailored for machine learning and advanced analytics.
Responsibilities:
- Validating and integrating an existing OpenStack-based infrastructure.
- Deploying and configuring Kubernetes clusters in a private cloud.
- Automating the deployment and configuration of ML-supporting systems within Kubernetes.
- Creating Infrastructure-as-Code using Terraform and/or Ansible.
- Ensuring scalability, observability, and security across the entire platform.
- Collaborating with developers, data scientists, and ML engineers.
- Documenting architecture, deployment procedures, and operational best practices.
Must have:
- Solid hands-on experience with OpenStack (compute, networking, storage, identity).
- Strong Kubernetes knowledge, including experience designing, deploying, and managing Kubernetes clusters in a private cloud and/or on bare metal.
- Experience with IAM tools such as Keycloak or similar.
- Familiarity with distributed storage systems (ideally Ceph) and storage integration in Kubernetes.
- Experience deploying containerized applications using Helm and/or Kubernetes Operators.
- Understanding of machine learning infrastructure needs (e.g., experiment tracking, orchestration, model deployment).
- Proficiency with Infrastructure-as-Code tools such as Terraform and/or Ansible.
- Strong Linux administration and scripting skills (Bash, Python, Go).
- Familiarity with Git-based CI/CD pipelines and DevOps best practices.
- Ability to work both as part of a team and independently, and to take ownership of cloud environments.
Nice to have:
- Experience with public cloud solutions (e.g., AWS, Azure).
- Experience with CI/CD systems (e.g., GitLab CI, Jenkins).
- Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK).
- Certified OpenStack Administrator (COA) or Kubernetes certification (CKA/CKAD).
- Exposure to MLOps workflows.