Senior Site Reliability Engineer

PubNative GmbH

Berlin

Type of work

Undetermined

Experience

Senior

Employment Type

Permanent

Operating mode

Office

Tech stack

Terraform

advanced

Docker / Kubernetes

advanced

Golang

regular

Job description

PubNative is a mobile monetization platform that enables app publishers to execute and enhance their revenue strategy through flexible ad units. With its proprietary cross-format optimization technology and mobile header bidding solution, PubNative enables mobile publishers to maximize their programmatic advertising revenue. The company is headquartered in Berlin with a satellite office located in Beijing.

Our system consists of numerous high-load Golang-based APIs, iOS SDKs, a Ruby/Rails 5 dashboard, Scala and Spark data and ML pipelines, and a Druid OLAP system, all running on Mesos and Kubernetes.

We're always on call to keep our networks up and running, ensuring our users have the best and fastest experience possible. We follow the “Infrastructure as Code” model and immutable deployment strategies.

We are looking for a Senior Site Reliability Engineer (m/f/div) to build and operate infrastructure platforms, and provide technical consultancy to engineering teams on how to build reliable, scalable and efficient services.

Your Responsibilities:

You enable us to build and expand a hybrid, multi-cloud-provider environment
You design, develop and operate monitoring and tracking platforms
You drive scalability and operability of supported systems/infrastructure
You consult with other teams on systems architecture for new and existing production systems
You write code to automate tasks and support SLAs for production systems, and you support other engineering teams on reliability, scalability and efficiency topics
You manage OS images/templates via Packer and provision infrastructure via Terraform
You support CI/CD and build new pipelines
You engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement
You support services before they go live through activities such as system design consulting
You maintain services once they are live by measuring and monitoring availability, latency, and overall system health
You participate in the on-call rotation and are on call for the services you build and support

Our Requirements:

5+ years of experience in a Site Reliability/DevOps role
Experience with public cloud providers (AWS, Google Cloud, DigitalOcean, etc.) and Infrastructure as Code (Terraform)
Strong programming skills and familiarity with modern programming languages: Go, Python, Shell, etc.
Excellent knowledge of Linux and ability to tune it for maximum performance
Knowledge of managing Docker containers and microservices via Kubernetes
Experience building monitoring systems and metric collection pipelines
Track record of building automation and solving multi-datacenter/multi-cloud infrastructure problems
Knowledge of algorithms, data structures, complexity analysis, software design, and reverse engineering
Interest in designing, analyzing and troubleshooting large-scale distributed systems
Experience working with source control (Git)
Experience with continuous integration platforms such as TeamCity, Jenkins, CircleCI, etc.
Understanding of Agile and DevOps practices such as CI/CD and automated testing

What do we offer?

Reimbursement of a monthly public transport pass (company BVG subscription)
Reimbursement of your private telephone bill (€20 after tax)
A day off on your birthday
Opportunity to attend industry events & conferences
Team events