Data Engineer

Data

Centrum, Warsaw

emagine Polska

Full-time

Any

Senior

Remote

Job description

This role combines clinical data expertise with strong data engineering skills to build and manage data pipelines that keep clinical trial data accurate and accessible.

Responsibilities

Productionize and monitor pipelines and models; collaborate on CI/CD, testing, and user feedback.
Implement ETL patterns (medallion architecture), ensuring data provenance, validation, and versioning.
Participate in the continuous improvement and validation of existing pipelines.
Ensure clinical concepts are accurately represented and harmonized across various data models (CDISC SDTM/ADaM, OMOP, HL7); contribute to mapping and transformation logic.
Develop NLP models for entity and relation extraction (e.g., inclusion/exclusion criteria, demographics, endpoints, study design).
Build automated pipelines to ingest registry and publication data, converting it to tabular, queryable datasets.
Co-design the benchmarking data model with end users and map extracted information to standardized terminologies.
Integrate human-in-the-loop review, confidence scoring, and vocabulary/units normalization.

Key Requirements

Strong data engineering skills: Databricks, Spark, Delta Lake, SQL, ETL design and orchestration.
Familiarity with clinical trial concepts (inclusion/exclusion criteria, endpoints, demographics) and biomedical terminologies.
Practical experience with data modeling and working with end users to define requirements.
Experience with CI/CD, testing frameworks, and monitoring for data pipelines and ML models.
Experience with NLP for information extraction from scientific text (publications, registries).
Good communication and collaboration skills.

Nice to Have

Experience with OMOP / CDISC or other clinical data standards.
Familiarity with vocabulary services (OHDSI/Athena, UMLS) and NLP toolkits (spaCy, Hugging Face, AllenNLP).
Prior work in pharma or clinical research data environments.

Tech stack

English
Automation: advanced
Machine Learning (ML): advanced
DataStage (ETL): advanced
SQL: advanced
Testing: advanced
Spark: advanced
Data modeling: advanced
Data Integration: advanced
ETL: advanced
CI/CD: advanced


By applying, I consent to the processing of my personal data for the purpose of conducting the recruitment process. Please note that the data controller is emagine, with its registered office in Warsaw at ul. Domaniewska 39A (hereinafter the "controller"). You have ...
