We are looking for a talented Senior Data Engineer with a strong background in building, or contributing to, microservices-based applications on a Kappa architecture. The project aims to unify data sourced from different electronic health record (EHR) systems in the healthcare domain, using the FHIR (Fast Healthcare Interoperability Resources) data format.
Our client is a leading analytics company operating at the intersection of technology, artificial intelligence, and big data. They support manufacturers and retailers in the fast-moving consumer goods sector, helping them better understand market dynamics, uncover consumer behavior insights, and make data-driven business decisions.
The company’s proprietary technology platform combines high-quality data, deep industry expertise, and advanced predictive algorithms built over decades of experience in the field.
RESPONSIBILITIES:
- Data Standardization and Transformation:
 
- Convert diverse data structures from various EHR systems into a unified format based on FHIR standards
 
- Map and normalize incoming data to the FHIR data model, ensuring consistency and completeness
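
As a concrete illustration of this mapping step, here is a minimal sketch that normalizes one hypothetical source-EHR record into a FHIR R4 Patient resource; the source field names (pt_id, fname, lname, dob, sex) and the identifier system are assumptions for the example, not the project's actual schema:

```python
# Illustrative sketch: mapping a hypothetical source-EHR patient record to a
# FHIR R4 Patient resource represented as a plain JSON-style dict.
from typing import Any

def to_fhir_patient(raw: dict[str, Any]) -> dict[str, Any]:
    """Normalize one source record into a FHIR Patient resource."""
    # FHIR administrative-gender uses lowercase codes: male | female | other | unknown
    gender_map = {"M": "male", "F": "female"}
    return {
        "resourceType": "Patient",
        "identifier": [{"system": "urn:example:ehr-a", "value": str(raw["pt_id"])}],
        "name": [{"family": raw["lname"], "given": [raw["fname"]]}],
        "birthDate": raw["dob"],  # expected as ISO-8601 YYYY-MM-DD
        "gender": gender_map.get(raw.get("sex", ""), "unknown"),
    }

if __name__ == "__main__":
    record = {"pt_id": 42, "fname": "Ada", "lname": "Lovelace",
              "dob": "1815-12-10", "sex": "F"}
    print(to_fhir_patient(record))
```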
 
- Kafka Integration:
 
- Consume and process events from the Kafka stream produced by the Data Writer Module
 
- Deserialize and validate incoming data to ensure adherence to required standards
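
A minimal sketch of this consumer, assuming the confluent-kafka Python client with Schema Registry integration; the broker and registry URLs, the group id, and the topic name "ehr-events" are placeholders rather than project values:

```python
from confluent_kafka import DeserializingConsumer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroDeserializer

registry = SchemaRegistryClient({"url": "http://localhost:8081"})
consumer = DeserializingConsumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "fhir-mapper",
    "auto.offset.reset": "earliest",
    # Resolves each message's writer schema from the registry and decodes it.
    "value.deserializer": AvroDeserializer(registry),
})
consumer.subscribe(["ehr-events"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = msg.value()  # already a dict, decoded against its Avro schema
        # ... map `event` to FHIR and route it downstream ...
finally:
    consumer.close()
```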
 
- Data Segmentation:
 
- Separate data streams for warehousing and AI model training, applying specific preprocessing steps for each purpose
 
- Prepare and validate data for storage and machine learning model training
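
The segmentation step might look roughly like the following sketch, which fans one validated stream out to two purpose-specific topics; the topic names and the de-identification flag are illustrative assumptions:

```python
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def route(event: dict) -> None:
    """Every event goes to warehousing; de-identified events also feed training."""
    payload = json.dumps(event)
    producer.produce("fhir-warehouse", value=payload)
    if event.get("deidentified"):  # hypothetical flag set by an upstream step
        producer.produce("fhir-training", value=payload)
    producer.poll(0)  # serve delivery callbacks without blocking

if __name__ == "__main__":
    route({"resourceType": "Patient", "id": "42", "deidentified": True})
    producer.flush()  # drain buffered messages before exit
```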
 
- Error Handling and Logging:
 
- Implement robust error handling mechanisms to track and resolve data mapping issues
 
- Maintain detailed logs for auditing and troubleshooting purposes
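
One common shape for this is a dead-letter-queue pattern, sketched below under the assumption of a Kafka-based pipeline; the topic names and the stand-in mapping function are hypothetical:

```python
# Failed events are logged with context and parked on a DLQ topic so they can
# be audited and replayed once the mapping issue is fixed.
import json
import logging
from confluent_kafka import Producer

logger = logging.getLogger("fhir-mapper")
producer = Producer({"bootstrap.servers": "localhost:9092"})

def map_to_fhir(event: dict) -> dict:
    # Stand-in for the real mapping logic; raises KeyError on malformed input.
    return {"resourceType": "Patient", "id": str(event["pt_id"])}

def process(event: dict) -> None:
    try:
        resource = map_to_fhir(event)
        producer.produce("fhir-warehouse", value=json.dumps(resource))
    except (KeyError, ValueError) as exc:
        logger.error("mapping failed: %s; event=%r", exc, event)
        producer.produce("ehr-dlq",
                         value=json.dumps({"error": str(exc), "event": event}))
```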
 
- Data Ingestion and Processing:
 
- Use large language models (LLMs) to extract structured data from EHRs, research articles, and clinical notes
 
- Ensure semantic consistency and interoperability during data ingestion
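
In outline, such extraction can be prompt-driven, as in the sketch below; call_llm is a hypothetical stand-in for whatever model client the project actually uses, and the prompt and field list are assumptions:

```python
import json

EXTRACTION_PROMPT = """Extract these fields from the clinical note as JSON:
patient_age (int), diagnoses (list of strings), medications (list of strings).
Return JSON only, no prose.

Note:
{note}
"""

def call_llm(prompt: str) -> str:
    raise NotImplementedError("replace with the real model client")

def extract_structured(note: str) -> dict:
    raw = call_llm(EXTRACTION_PROMPT.format(note=note))
    data = json.loads(raw)  # fail fast if the model strays from JSON
    # Enforce semantic consistency at ingestion time, not in downstream consumers.
    if not isinstance(data.get("diagnoses"), list):
        raise ValueError("diagnoses must be a list")
    return data
```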
 
- Knowledge Graph Construction:
 
- Integrate extracted data into a knowledge graph, representing entities and relationships for semantic data integration
 
- Implement contextual understanding and querying of complex relationships within the knowledge graph (KG)
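
For a Neo4j-backed graph (one of the graph databases named in the requirements below), the integration step could look like this sketch via the official neo4j Python driver; node labels, the relationship type, and the credentials are assumptions, and MERGE keeps re-ingestion idempotent so re-processed events do not create duplicate nodes or edges:

```python
from neo4j import GraphDatabase

UPSERT = """
MERGE (p:Patient {id: $patient_id})
MERGE (c:Condition {code: $condition_code})
MERGE (p)-[:HAS_CONDITION {recorded: $recorded}]->(c)
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def add_condition(patient_id: str, condition_code: str, recorded: str) -> None:
    with driver.session() as session:
        session.run(UPSERT, patient_id=patient_id,
                    condition_code=condition_code, recorded=recorded)

add_condition("42", "44054006", "2024-05-01")  # 44054006 = SNOMED CT, type 2 diabetes
driver.close()
```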
 
- Advanced Predictive Modeling:
 
- Leverage KGs and LLMs to enhance data interoperability and predictive analytics
 
- Develop frameworks for contextualized insights and personalized medicine recommendations
 
- Feedback Loop:
 
- Continuously update the knowledge graph with new data using LLMs, ensuring up-to-date and relevant insights
 
- Work Closely with Cross-Functional Teams:
 
- Collaborate with data scientists, AI specialists, and software engineers to design and implement data processing solutions
 
- Communicate effectively with stakeholders to align on goals and deliverables
 
- Contribute to Engineering Culture:
 
- Foster a culture of innovation, collaboration, and continuous improvement within the engineering team
 
REQUIREMENTS:
- Deep understanding of patterns and software development practices for event-driven architectures
 
- Hands-on experience with stateful stream data processing solutions (Kafka or similar streaming platforms)
 
- Strong knowledge of data serialization/deserialization using various data formats (at minimum JSON and Avro), and integration with schema registries
 
- Proven Python software development expertise, with experience in data processing and integration (most of the software is written in Python)
 
- Practical experience building end-to-end solutions with Apache Flink or a similar platform
 
- Experience with containerization and orchestration using Kubernetes (K8s) and Helm, especially on Google Kubernetes Engine (GKE)
 
- Familiarity with Google Cloud Platform (GCP) or a similar cloud platform
 
- Hands-on experience implementing data quality solutions for schema-on-read or schema-less data
 
- Hands-on experience integrating with Apache Kafka, particularly the Confluent Platform
 
- Familiarity with AI and ML frameworks
 
- Proficiency in SQL and experience with both relational and NoSQL databases
 
- Experience with graph databases like Neo4j or RDF-based systems
 
- Experience in the healthcare domain and familiarity with healthcare standards such as FHIR and HL7 for data interoperability
 
WOULD BE A PLUS:
- Experience with web data scraping
 
SOFT SKILLS:
- Strong problem-solving skills, with the ability to design innovative solutions for complex data integration and processing challenges
 
- Excellent communication skills, with the ability to articulate complex technical concepts and work effectively with various stakeholders
 
- Commitment to improving healthcare through data-driven solutions and technology
 
- Eagerness to stay abreast of the latest technologies and industry trends while continually improving your skills and knowledge
 
- Ability to work in a collaborative environment, valuing diverse perspectives and contributing to a positive team culture