Senior DevOps Engineer (AI & Platform Operations)
As a recruitment company, DCG understands that every business is powered by experienced professionals. Our management style and partnership approach enable us to meet your needs and provide continuous support. Due to our ongoing growth and the large number of recruitment projects we undertake for our partners, we are currently looking for:
Senior DevOps Engineer (AI & Platform Operations)
Responsibilities:
Incident & Problem Management: Own the RCA process for production incidents — diagnose, resolve, and put preventive measures in place so issues don't recur
Production Monitoring & Support: Continuously monitor service health, detect anomalies early, and act before they become incidents
Deployment Execution: Trigger and oversee release deployments through existing CI/CD pipelines; troubleshoot failed deployments and coordinate rollbacks when needed
Environment Oversight: Keep Pre-Production and Production environments stable and aligned — not building them from scratch, but ensuring they behave as expected day to day
Runbook & Knowledge Management: Document operational procedures, known issues, and resolution steps to build a reliable knowledge base for the team
Cross-team Collaboration: Work shoulder-to-shoulder with development and platform teams to triage issues, clarify operational requirements, and close the feedback loop between prod and dev
Identify recurring pain points and propose automation or tooling to reduce toil
Improve observability coverage — dashboards, alerts, log queries — to catch issues faster
Contribute to service continuity initiatives and disaster recovery drills
Requirements:
5+ years in IT operations, application support (2nd/3rd line), or a similar production-facing role
Proven track record of owning incidents end-to-end — from alert to RCA to prevention
2+ years working within an ITIL framework (incident, problem, change management)
Experience working in Agile delivery environments alongside development teams
Excellent English communication skills — able to explain technical issues clearly to both engineers and non-technical stakeholders
Proficiency with log analysis and alerting tools: Splunk, Apica, Sysdig
Observability tooling: Prometheus, Grafana — reading dashboards, tuning alerts
Comfortable operating services running on Kubernetes (checking pod health, reading logs, triggering restarts — not cluster administration)
Familiarity with Jenkins pipelines to execute and troubleshoot deployments
Relational databases (Oracle, DB2) — querying, interpreting execution plans, identifying data-related incidents
Working knowledge of Spring/Hibernate application behavior, Kafka message flows, XML/JSON payloads — enough to trace an issue through the stack
Nice to have:
Java/J2EE development background (helps enormously when reading stack traces and working with dev teams)
IBM DataStage operational experience
Scripting (Bash, Python) for automation of repetitive operational tasks
Ansible for applying configuration changes in controlled operational scenarios
Offer:
Private medical care
Co-financed sports card
Ongoing support from a dedicated consultant
Employee referral program

DCG
DCG is a space where business needs and people's ambitions meet. We know the value of a well-matched partnership, which is why we help candidates find an environment where they can spread their wings, and companies ...