Data Engineer

Data

Centrum, Warsaw

emagine Polska

Full-time
Any
Senior
Remote

Job description

Commodity CCW

Job Title: Technology - Data Science - Senior

What we offer:

  • Long Term B2B Contract

  • Full Remote

  • 42 EUR/h + VAT

Role Summary

We are seeking a highly skilled Data Engineer to join as a full-time consultant. You will combine clinical data expertise with strong data engineering and technical skills to generate well-documented pipelines from source to curated datasets in common data models like CDISC SDTM. You will collaborate closely with clinical SMEs, data scientists, infrastructure, and other skilled data engineers.

We now want to extend this capability to Real-World Data (RWD) from a broad range of registries and other sources. You will help extend our medallion Databricks pipelines (CDISC SDTM) to incorporate RWD, working with clinical experts and AI teams to combine rule-based and automated mapping approaches, including OMOP interoperability.

Main Responsibilities

  • Design, build and maintain production ETL pipelines in Databricks/Delta Lake to ingest RWD (registries, claims, EHR extracts) and transform into standard models.

  • Implement harmonisation workflows that map incoming RWD to OMOP and to the internal CDISC SDTM canonical model; handle vocabulary mapping, unit normalisation and provenance.

  • Extend the medallion architecture (bronze/silver/gold) patterns with robust validation, lineage, partitioning and performance tuning.

  • Develop configurable, input-driven transformation frameworks so clinical experts can drive mapping rules via config files and catalogs.

  • Integrate AI/automation components (e.g., model-assisted mapping, NLP for free text) with human-in-the-loop review and confidence scoring.

  • Establish testing, CI/CD, monitoring and alerting for ETL jobs and automations; ensure reproducibility, versioning and governance.

  • Collaborate with clinical data scientists, data stewards and stakeholders to define requirements, data contracts and success metrics.

Key Requirements

  • Proven experience designing and implementing ETL pipelines in Databricks/Spark and Delta Lake.

  • Strong knowledge of OMOP CDM and experience mapping datasets to OMOP; familiarity with CDISC SDTM is a plus.

  • Expertise in data modelling, partitioning, performance tuning, and best practices for large clinical/RWD datasets.

  • Experience with vocabulary services and terminology mapping (OHDSI/Athena, UMLS, or similar).

  • Experience integrating AI/NLP components into data pipelines (entity extraction, mapping suggestions) is desirable.

  • Familiarity with testing frameworks for data (Great Expectations, Deequ), CI/CD, infrastructure as code, and orchestration tools (Databricks Jobs, Airflow).

  • Good communication skills and experience working with domain experts to capture requirements.

Nice to Have

  • Prior experience in pharma or clinical research environments.

  • Knowledge of data governance, privacy regulations and secure handling of patient data.

  • Experience with Unity Catalog, Databricks Delta Sharing, and cloud infrastructure (Azure/AWS).

Other Details

This is a full-time consultant role focused on developing advanced data solutions for clinical research and Real-World Data integration. The work is remote and involves collaborating with stakeholders across the organization. Duration and workload are negotiable based on project needs.

Tech stack

  • English: B1

  • Microsoft Azure: advanced

  • Cloud: advanced

  • CI/CD: advanced

  • ETL: advanced

  • Testing: advanced

  • Spark: advanced

  • Governance: advanced

  • Artificial Intelligence (AI): advanced

  • Real-World Data (RWD): advanced

  • DataStage (ETL): advanced
