Principal Platform Engineer

DevOps

Warszawa +4 Locations

Point Wild

B2B Contract

Senior

Remote

Job description

Location: Remote (Poland, Ukraine, or Romania)

Point Wild helps customers monitor, manage, and protect against the risks associated with their identities and personal information in a digital world. Backed by WndrCo, Warburg Pincus and General Catalyst, Point Wild is dedicated to creating the world’s most comprehensive portfolio of industry-leading cybersecurity solutions. Our vision is to become THE go-to resource for every cyber protection need individuals may face - today and in the future.

Join us for the ride!

We’re looking for a Principal Platform Engineer to architect and lead the infrastructure strategy for our next-generation Production ML platform on Google Cloud. In this role, you will be the backbone of our high-performance machine learning workloads, ensuring our systems are elastic, secure, and resilient. You won’t just maintain the status quo; you’ll build the "paved road" for our engineers, automating everything from model deployment to complex networking perimeters. We are a high-trust, outcome-focused team that moves quickly to solve some of the most challenging problems in the ML space.

Core Responsibilities:

Infrastructure Management: Design, deploy, and maintain elastically scaling cloud infrastructure on GCP, along with container orchestration tools like Kubernetes, for high-performance ML workloads.
CI/CD Pipeline Development and Maintenance: Build and maintain automated pipelines for training, testing, and deploying machine learning models using tools like Jenkins, GitHub Actions, or Airflow.
Model Monitoring & Maintenance: Implement observability tools to track model drift, accuracy, latency, and performance degradation in production.
Collaboration: Bridge the gap between data, ML, backend, and frontend engineers to ensure smooth production operation.
ML Observability: Implement comprehensive monitoring for system health (latency/uptime) alongside ML-specific metrics, such as feature drift, prediction accuracy, and data distribution shifts, to ensure long-term model reliability. Monitor non-ML workloads and production metrics as well.
Deploy tools that empower individual teams to monitor their workloads.
Participate in the on-call rotation and help manage the platform's compliance posture against standards such as SOC.

What you bring to the table:

Senior Expertise: 8 - 10+ years in DevOps/Platform Engineering, with at least 2 years of experience specifically operating and maintaining production ML workloads.
GCP & K8s Mastery: Deep, hands-on experience with GCP (VPC-SC, IAM, Organization Policies) and GKE (cluster topology, Helm, Kustomize, and in-cluster operators like ArgoCD).
Service Mesh Excellence: High proficiency with Istio (VirtualServices, mTLS, sidecar injection) and API Gateways (specifically Kong).
Infrastructure as Code: Expert-level Terraform skills, specifically using an Atlantis/GitOps workflow across a massive, multi-hundred-file estate.
Secrets & Identity: Experience managing enterprise-grade identity and secrets (Auth0, Dex, ESO, or SOPS).
Data/ML Tooling: Experience operating Airflow in production and an ML-serving stack (e.g., Triton, vLLM, MLflow).
Database Management: Comfortable managing Cloud SQL (PostgreSQL), BigQuery, and in-cluster datastores like Elasticsearch or ClickHouse.
At least an upper-intermediate level of spoken and written English.

It would be great if you also had:

ML Observability: Past experience with continuous monitoring of model accuracy and detecting data/concept drift.
Automation Savvy: Experience with Ansible for cluster bootstrap and recovery.
Advanced Certifications: Kubernetes (CKA/CKS) or GCP Professional Cloud Architect/Security Engineer certifications.
Modern Stack Exposure: Familiarity with Loki, Grafana, or managing ClickHouse at scale.

Why Join Us?

Professional Growth: Enhance your skills by collaborating closely with talented engineers and architects.
Meaningful Impact: Help create software that protects millions of users from cybersecurity threats.
Supportive Environment: Join a team dedicated to innovation, mentorship, and continuous learning.

Tech stack

English
GKE: master
GCP: master
Kubernetes: advanced
Terraform: advanced
Machine Learning: advanced
Ansible: nice to have
Grafana: nice to have
