Developing data pipelines in the Big Data technology stack using Python (PySpark, Hadoop/HDFS, Kafka)
Implementing connections to new source systems (e.g. Kafka, relational databases, REST API-based services or file-based data sources)
Ensuring that your implemented artefacts are covered by automated tests and integrated into our CI/CD pipeline
Supporting your development lead in defining the development roadmap and breaking it down into user stories
Supporting the implementation of a generic ETL framework in Python, including test automation
Profile:
Experience in Python and respective packages/frameworks in the area of Data Engineering (PySpark, PyTest)
Experience in the Hadoop ecosystem as a developer (Spark, HDFS, Hive/Impala, Kafka)
Experience with common interfaces to data sources and data formats (Kafka, REST APIs, relational databases, file shares, JSON, Protocol Buffers, Parquet)
Familiar with CI/CD toolchains (Git, Jenkins)
Familiar with relational databases and SQL
Feeling responsible for deliverables during the build and run phases in a DevOps team
Being proactive, supportive and open to new approaches
Bringing in your own ideas for technical and organizational enhancements
Benefits:
Contract of employment and a competitive salary (together with an annual bonus)
Flexible working hours, with the option to work from home also after the pandemic
A challenging work environment, professional support and the opportunity to share knowledge and best practices
Ongoing development opportunities in a multinational environment
Broad access to professional training, conferences and webinars
Private medical care and life insurance
Sports pass, co-financed lunches and language courses
A number of benefits for families (for instance summer camps for kids)