Data Engineer – Data Lake (f/m/x)
You will join an international project within the healthcare and life sciences industry, focused on building and evolving a modern Data Lake platform supporting large-scale data processing and analytics. The solution enables data-driven decision-making in a highly regulated environment, with a strong emphasis on data quality, security, and compliance. The environment is cloud-based and leverages modern big data technologies and best engineering practices.
As a Data Engineer, you will be responsible for designing, developing, and maintaining data pipelines and Data Lake architecture. You will work closely with cross-functional teams, including data scientists and business stakeholders, to deliver reliable and efficient data solutions.
Your tasks
Designing and developing scalable data pipelines for batch and real-time data processing
Building and optimizing Data Lake architecture for analytical use cases
Integrating multiple data sources and ensuring seamless data flow across systems
Ensuring data quality, consistency, and governance (data lineage, access control)
Optimizing storage and processing performance using modern data formats and partitioning strategies
Monitoring, troubleshooting, and improving data pipeline performance
Collaborating with stakeholders to translate business needs into technical solutions
Following best practices in data engineering and continuously improving the platform
Requirements
Strong experience in Data Engineering or Big Data-related roles
Proficiency in Python, Scala, or Java
Hands-on experience with Apache Spark (including PySpark) or similar frameworks
Previous work with Data Lake technologies (e.g., AWS S3, Azure Data Lake, Databricks, BigQuery)
Knowledge of ETL/ELT processes and orchestration tools (e.g., Airflow, Data Factory)
Good understanding of SQL and data modeling
Experience with distributed systems and large-scale data processing
Familiarity with Docker and Kubernetes
Strong analytical and problem-solving skills
Fluency in Polish (required)
Residence in Poland (required)
Nice-to-have requirements
Experience with streaming technologies (e.g., Kafka)
Knowledge of data governance tools
Familiarity with CI/CD processes in data projects