Data Engineer with DevOps experience (100% remote)
Engagement Terms:
Area: IT Consulting
Location: 100% remote
Start: ASAP (we accept candidates with a maximum 1-month notice period)
Rate (determined individually): EUR 42 - 52 net + VAT / hour
Engagement: B2B, full-time, 6M+
Recruitment Process (100% remote)
Project description
Our client is a leading international technology provider specializing in Application-to-Person (A2P) messaging and voice monetization solutions for mobile network operators (MNOs) worldwide. We are collaborating with them to develop a scalable Customer Data Platform (CDP) that transforms A2P records into actionable marketing insights. This involves building an ETL pipeline capable of processing hundreds of millions of records daily, ensuring data is ingested, enriched, transformed, and made accessible to non-technical end users.
The project addresses unique challenges, including deploying the CDP on each MNO's proprietary servers behind firewalls for data security. To enable scalability across hundreds of MNOs, the solution requires automated deployment (IaC). Additionally, the platform must support multi-tenancy to accommodate MNOs that own multiple subsidiaries.
Tech stack
Ingestion
Apache Kafka + Spark
Data lakehouse
Apache Iceberg + Nessie + MinIO
Extraction and enrichment
PySpark + Python (PyTorch, LangGraph/LangChain, open-source models from Hugging Face)
Transformation
dbt Core + Dremio
Business facing applications
Metabase + FastAPI + Dremio
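As a rough illustration of how the ingestion and lakehouse layers above could be wired together, the sketch below shows a Spark Structured Streaming job reading A2P records from Kafka and appending them to an Iceberg table tracked by Nessie, with MinIO as the object store. It assumes the Iceberg Spark runtime and Nessie catalog jars are on the classpath; every endpoint, topic, bucket, credential, and the record schema are illustrative placeholders, not project specifics.

# Minimal sketch: Kafka -> Spark Structured Streaming -> Iceberg table on a
# Nessie catalog, stored in MinIO. All names and endpoints are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = (
    SparkSession.builder.appName("a2p-ingestion")
    # Iceberg catalog backed by Nessie (placeholder endpoints)
    .config("spark.sql.catalog.nessie", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.nessie.catalog-impl", "org.apache.iceberg.nessie.NessieCatalog")
    .config("spark.sql.catalog.nessie.uri", "http://nessie:19120/api/v2")
    .config("spark.sql.catalog.nessie.ref", "main")
    .config("spark.sql.catalog.nessie.warehouse", "s3a://lakehouse/warehouse")
    # MinIO reached through the S3A connector (placeholder credentials)
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
    .config("spark.hadoop.fs.s3a.access.key", "minio-access-key")
    .config("spark.hadoop.fs.s3a.secret.key", "minio-secret-key")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Assumed shape of an A2P record; the real schema would come from the MNO feed.
schema = StructType([
    StructField("message_id", StringType()),
    StructField("sender_id", StringType()),
    StructField("receiver_msisdn", StringType()),
    StructField("sent_at", TimestampType()),
])

spark.sql("CREATE NAMESPACE IF NOT EXISTS nessie.raw")
spark.sql("""
    CREATE TABLE IF NOT EXISTS nessie.raw.a2p_records (
        message_id STRING, sender_id STRING, receiver_msisdn STRING, sent_at TIMESTAMP
    ) USING iceberg
""")

records = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "a2p-records")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("r"))
    .select("r.*")
)

# Each micro-batch is appended as a new Iceberg snapshot, versioned by Nessie.
query = (
    records.writeStream.format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "s3a://lakehouse/checkpoints/a2p_records")
    .toTable("nessie.raw.a2p_records")
)
query.awaitTermination()

Because Nessie versions every Iceberg commit, the same catalog also supports branch-based promotion of changes, which the CI/CD sketch under the key responsibilities builds on.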
Role overview
The ideal candidate will focus on the data engineering side of the CDP, with an emphasis on DevOps: ensuring robust CI/CD practices for efficient scaling and maintenance.
Key responsibilities
Design and implement data ingestion pipelines using Apache Kafka and Spark to handle large-scale A2P record processing.
Build and maintain the data lakehouse architecture with Apache Iceberg, Nessie, and MinIO, ensuring data integrity, versioning, and scalability.
Develop automated deployment strategies for the CDP, enabling seamless rollout to multiple MNO environments behind firewalls.
Integrate multi-tenancy features to support hierarchical MNO structures.
Collaborate with cross-functional teams to optimize the ETL pipeline, incorporating extraction, enrichment, transformation, and business-facing applications.
Establish and manage CI/CD pipelines with a strong emphasis on branching strategies in Nessie for version control and collaboration (see the sketch after this list).
Troubleshoot and resolve issues related to data processing, security, and performance in a distributed environment handling sensitive data.
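A minimal sketch of what a branch-per-run workflow on Nessie might look like when driven from a CI/CD job, assuming the Nessie Spark SQL extensions are enabled and the catalog is configured as in the ingestion sketch above; the branch name, table, and data-quality gate are illustrative placeholders, not the project's actual process.

# Minimal sketch: CI job validates changes on an isolated Nessie branch and only
# merges into `main` when a quality gate passes. All names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("cdp-ci-validation")
    # Nessie branching is exposed through these Spark SQL extensions.
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,"
        "org.projectnessie.spark.extensions.NessieSparkSessionExtensions",
    )
    .config("spark.sql.catalog.nessie", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.nessie.catalog-impl", "org.apache.iceberg.nessie.NessieCatalog")
    .config("spark.sql.catalog.nessie.uri", "http://nessie:19120/api/v2")
    .config("spark.sql.catalog.nessie.ref", "main")
    .config("spark.sql.catalog.nessie.warehouse", "s3a://lakehouse/warehouse")
    # MinIO reached through the S3A connector (placeholder credentials)
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
    .config("spark.hadoop.fs.s3a.access.key", "minio-access-key")
    .config("spark.hadoop.fs.s3a.secret.key", "minio-secret-key")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

branch = "ci-run-1234"  # e.g. derived from the CI pipeline run id

# Work on an isolated branch so consumers reading `main` never see partial results.
spark.sql(f"CREATE BRANCH IF NOT EXISTS {branch} IN nessie FROM main")
spark.sql(f"USE REFERENCE {branch} IN nessie")

# Transformations (for example a dbt Core run targeting this same reference)
# would execute here, writing only to tables on the CI branch.

# Simple data-quality gate before promoting the change.
bad_rows = spark.sql(
    "SELECT count(*) AS c FROM nessie.raw.a2p_records WHERE receiver_msisdn IS NULL"
).first()["c"]

if bad_rows == 0:
    # Atomically publish the validated state to everyone reading `main`.
    spark.sql(f"MERGE BRANCH {branch} INTO main IN nessie")
    spark.sql("USE REFERENCE main IN nessie")
    spark.sql(f"DROP BRANCH {branch} IN nessie")
else:
    raise SystemExit(f"{bad_rows} invalid rows; branch {branch} left unmerged for inspection")

Merging only after the gate passes means readers on main always see a complete, validated snapshot, which matters when the platform runs inside firewalled MNO environments where rollbacks are costly.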
Skills and experience
Proven experience with data ingestion technologies, including Apache Kafka and Spark.
Strong expertise in data lakehouse solutions, particularly Apache Iceberg, Nessie, and MinIO.
Extensive hands-on experience in CI/CD pipelines, with a focus on branching and version management in Nessie.
Familiarity with deploying solutions in secure, firewall-protected environments and automating infrastructure provisioning.
Solid understanding of ETL processes for handling high-volume data (hundreds of millions of records daily).
We will extend the collaboration to other projects if we see a good fit.

Usernest ApS
Usernest ApS is a company that focuses on providing innovative solutions through user experience and digital transformation services. They specialize in enhancing digital products to improve customer engagement and satisfaction.