Data Engineer

Dotcommunity

Kraków

Type of work: Undetermined
Experience: Mid
Employment Type: Permanent
Operating mode: Office

Tech stack

SQL: master
Data processing: advanced
Data storage: advanced
English: advanced
Big Data: regular
Spark: nice to have
AWS: nice to have
Yarn: nice to have
Airflow: nice to have
Microservices: nice to have

Job description

We are looking for a candidate to join our client's team as a Data Engineer. The client is a software house operating in the advertising and media industry, with an office on Armii Krajowej Street.

Advertising Solutions is a relatively new area in the Media organisation. It houses the engineering teams for the back-office systems used by various sales organisations at the company. These systems include Rose, which is used to book advertising campaigns, and Vantage, which provides campaign reporting.

ABOUT THE TEAM

They are now looking to establish a team to own and operate the Advertising API that underpins these products, along with others within Schibsted. You will hit the ground running: you will learn the ropes of the existing systems with the help of their established teams, then plan and execute the handover from the current team in London. Expect some initial travel to London and/or hosting London team members locally.

Once you have learned the system, you and your teammates will work continuously on its technical evolution, scaling and simplification. You will be expected to take an active part in deciding how to implement new features, together with the neighbouring teams that depend on you for their work.

SKILLS & REQUIREMENTS

They handle more than 250,000 campaigns, 100,000 advertisers, and more than 140 publishers across 20 different countries. About 1.5 TB of data is processed every day using more than 100 Spark jobs.

Their data pipeline is built on top of AWS EMR, Spark, Yarn, Airflow and microservices based on Twitter's Finatra framework. Apache Avro and Parquet are used for data serialization and schema definition/evolution.

They don't expect you to have experience with all the technologies they use, but it helps if you know at least some of them or have worked with similar ones.

You should be deeply interested in data processing and data storage in general. You need to be well-versed in databases, mainly SQL databases, their optimization, schema design and indexes.
You should understand big data concepts such as MapReduce, the CAP theorem and Bigtable.
This role does not require cooperation with data science, at least at this point. Our scale is substantial but not huge.