Data Engineer

Data

Centrum, Warsaw

emagine Polska

Type of work: Full-time
Employment type: Any
Experience: Senior
Operating mode: Remote

Job description

The main purpose of this role is to combine clinical data expertise with strong data engineering skills to create and manage data pipelines, ensuring the integrity and accessibility of clinical trial data.

Responsibilities

  • Productionize and monitor pipelines and models; collaborate on CI/CD, testing, and user feedback.

  • Implement ETL patterns such as the medallion architecture (bronze, silver, and gold layers), ensuring data provenance, validation, and versioning.

  • Participate in the continuous improvement and validation of existing pipelines.

  • Ensure clinical concepts are accurately represented and harmonized across various data models (CDISC SDTM/ADaM, OMOP, HL7); contribute to mapping and transformation logic.

  • Develop NLP models for entity and relation extraction (e.g., inclusion/exclusion criteria, demographics, endpoints, study design).

  • Build automated pipelines to ingest registry and publication data, converting it to tabular, queryable datasets.

  • Co-design the benchmarking data model with end users and map extracted information to standardized terminologies.

  • Integrate human-in-the-loop review, confidence scoring, and vocabulary and unit normalization.

Key Requirements

  • Strong data engineering skills: Databricks, Spark, Delta Lake, SQL, ETL design and orchestration.

  • Familiarity with clinical trial concepts (inclusion/exclusion criteria, endpoints, demographics) and biomedical terminologies.

  • Practical experience with data modeling and working with end users to define requirements.

  • Experience with CI/CD, testing frameworks, and monitoring for data pipelines and ML models.

  • Experience with NLP for information extraction from scientific text (publications, registries).

  • Good communication and collaboration skills.

Nice to Have

  • Experience with OMOP / CDISC or other clinical data standards.

  • Familiarity with vocabulary services (OHDSI/Athena, UMLS) and NLP toolkits (spaCy, Hugging Face, AllenNLP).

  • Prior work in pharma or clinical research data environments.

Tech stack

  • English: B1
  • Automation: advanced
  • Machine Learning (ML): advanced
  • DataStage (ETL): advanced
  • SQL: advanced
  • Testing: advanced
  • Spark: advanced
  • Data modeling: advanced
  • Data Integration: advanced
  • ETL: advanced
  • CI/CD: advanced

By applying, I consent to the processing of my personal data for the purpose of conducting the recruitment process. Please note that the data controller is emagine, with its registered office in Warsaw at ul. Domaniewska 39A (hereinafter the "administrator"). You have the right...