We are looking for a talented Senior Data Engineer with a strong background in developing or contributing to microservices-based applications built on a Kappa architecture. The project aims to unify data sourced from different EHR systems in the healthcare domain using the FHIR data format.
Our client is a leading analytics company operating at the intersection of technology, artificial intelligence, and big data. They support manufacturers and retailers in the fast-moving consumer goods sector, helping them better understand market dynamics, uncover consumer behavior insights, and make data-driven business decisions.
The company’s proprietary technology platform combines high-quality data, deep industry expertise, and advanced predictive algorithms built over decades of experience in the field.
RESPONSIBILITIES:
- Data Standardization and Transformation:
- Convert diverse data structures from various EHR systems into a unified format based on FHIR standards
- Map and normalize incoming data to the FHIR data model, ensuring consistency and completeness (see the mapping sketch below)
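A minimal sketch of the mapping step described above, assuming source records arrive as Python dicts; the source field names (mrn, last_name, first_name, sex, dob) and the identifier system are illustrative placeholders, not the client's actual schema:

```python
# Map one (hypothetical) source EHR patient record to a FHIR R4 Patient resource
# represented as a plain dict. Source field names are assumptions for illustration.
from datetime import date
from typing import Any, Dict


def to_fhir_patient(ehr_record: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize one source patient record into a FHIR Patient resource."""
    return {
        "resourceType": "Patient",
        "identifier": [
            {
                "system": "urn:example:ehr:mrn",  # assumed identifier system
                "value": str(ehr_record["mrn"]),
            }
        ],
        "name": [
            {
                "family": ehr_record["last_name"].strip(),
                "given": [ehr_record["first_name"].strip()],
            }
        ],
        # map source sex codes onto the FHIR administrative-gender value set
        "gender": {"M": "male", "F": "female"}.get(ehr_record.get("sex"), "unknown"),
        "birthDate": date.fromisoformat(ehr_record["dob"]).isoformat(),
    }
```

In practice a typed FHIR model library and per-source mapping tables would replace the hard-coded dict, but the shape of the transformation stays the same.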
- Kafka Integration:
- Consume and process events from the Kafka stream produced by the Data Writer Module
- Deserialize and validate incoming data to ensure adherence to required standards (see the consumer sketch below)
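A minimal consumer sketch, assuming the Confluent Platform Python client (confluent-kafka) and Avro-encoded values registered in a schema registry; the topic name, consumer group, and endpoints are placeholders:

```python
# Consume Avro-encoded events from the topic written by the Data Writer Module,
# deserializing each value against the schema registry before further processing.
from confluent_kafka import Consumer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroDeserializer
from confluent_kafka.serialization import MessageField, SerializationContext

schema_registry = SchemaRegistryClient({"url": "http://schema-registry:8081"})
deserialize_value = AvroDeserializer(schema_registry)

consumer = Consumer(
    {
        "bootstrap.servers": "kafka:9092",
        "group.id": "fhir-normalizer",  # placeholder consumer group
        "auto.offset.reset": "earliest",
    }
)
consumer.subscribe(["ehr.raw-events"])  # placeholder topic name

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = deserialize_value(
            msg.value(), SerializationContext(msg.topic(), MessageField.VALUE)
        )
        # hand the validated event to the FHIR mapping step sketched earlier
        print(event)
finally:
    consumer.close()
```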
- Data Segmentation:
- Separate data streams for warehousing and AI model training, applying specific preprocessing steps for each purpose (see the segmentation sketch below)
- Prepare and validate data for storage and machine learning model training
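A minimal sketch of the segmentation step, assuming normalized FHIR resources arrive as dicts; treating name, identifier, telecom, and address as the fields to strip before model training is an illustrative assumption, not the project's de-identification policy:

```python
# Fan one normalized resource out to two purposes: the warehouse copy keeps the
# full resource, while the training copy drops direct identifiers.
import copy
from typing import Any, Dict, Tuple

IDENTIFYING_FIELDS = ("name", "identifier", "telecom", "address")


def segment(resource: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    warehouse_record = resource  # stored as-is for analytics and auditing
    training_record = copy.deepcopy(resource)
    for field in IDENTIFYING_FIELDS:
        training_record.pop(field, None)  # strip identifiers before model training
    return warehouse_record, training_record
```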
- Error Handling and Logging:
- Implement robust error handling mechanisms to track and resolve data mapping issues (see the dead-letter sketch below)
- Maintain detailed logs for auditing and troubleshooting purposes
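One common way to implement this, sketched under the assumption of a Kafka dead-letter topic; the topic name and the map_to_fhir callable are hypothetical placeholders rather than the client's actual components:

```python
# Route failed mappings to a dead-letter topic with context, and log the failure
# so it can be audited and replayed once the mapping issue is fixed.
import json
import logging
from typing import Any, Callable, Dict, Optional

from confluent_kafka import Producer

logger = logging.getLogger("fhir-normalizer")
producer = Producer({"bootstrap.servers": "kafka:9092"})


def process_event(
    event: Dict[str, Any], map_to_fhir: Callable[[Dict[str, Any]], Dict[str, Any]]
) -> Optional[Dict[str, Any]]:
    try:
        return map_to_fhir(event)
    except (KeyError, ValueError) as exc:
        logger.exception("mapping failed for event id=%s", event.get("id"))
        producer.produce(
            "ehr.mapping-errors",  # placeholder dead-letter topic
            value=json.dumps({"event": event, "error": str(exc)}).encode("utf-8"),
        )
        return None
```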
- Data Ingestion and Processing:
- Use LLMs to extract structured data from EHRs, research articles, and clinical notes (see the extraction sketch below)
- Ensure semantic consistency and interoperability during data ingestion
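A minimal sketch of prompt-based extraction from a free-text clinical note; the complete callable stands in for whichever LLM client the project uses (a hypothetical placeholder, not a real API), and the prompt and field list are illustrative only:

```python
# Ask an LLM to return a fixed set of fields as JSON, then parse the response.
import json
from typing import Any, Callable, Dict

EXTRACTION_PROMPT = """Extract the following fields from the clinical note as JSON:
patient_age, diagnoses (list), medications (list). Return JSON only.

Note:
{note}"""


def extract_structured_data(note: str, complete: Callable[[str], str]) -> Dict[str, Any]:
    raw = complete(EXTRACTION_PROMPT.format(note=note))
    return json.loads(raw)  # schema validation and retry logic omitted in this sketch
```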
- Knowledge Graph Construction:
- Integrate extracted data into a knowledge graph, representing entities and relationships for semantic data integration (see the Neo4j sketch below)
- Implement contextual understanding and querying of complex relationships within the knowledge graph (KG)
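A minimal sketch of loading extracted entities and relationships into Neo4j (one of the graph stores named in the requirements), assuming the official neo4j Python driver; the node labels, relationship type, and connection details are illustrative assumptions:

```python
# Upsert a patient node, a condition node, and the relationship between them.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://neo4j:7687", auth=("neo4j", "password"))


def link_patient_condition(tx, patient_id: str, condition_code: str) -> None:
    # MERGE keeps the load idempotent when the same event is replayed
    tx.run(
        """
        MERGE (p:Patient {id: $patient_id})
        MERGE (c:Condition {code: $condition_code})
        MERGE (p)-[:HAS_CONDITION]->(c)
        """,
        patient_id=patient_id,
        condition_code=condition_code,
    )


with driver.session() as session:
    session.execute_write(link_patient_condition, "patient-123", "E11.9")

driver.close()
```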
- Advanced Predictive Modeling:
- Leverage KGs and LLMs to enhance data interoperability and predictive analytics
- Develop frameworks for contextualized insights and personalized medicine recommendations
- Feedback Loop:
- Continuously update the knowledge graph with new data using LLMs, ensuring up-to-date and relevant insights
- Work Closely with Cross-Functional Teams:
- Collaborate with data scientists, AI specialists, and software engineers to design and implement data processing solutions
- Communicate effectively with stakeholders to align on goals and deliverables
- Contribute to Engineering Culture:
- Foster a culture of innovation, collaboration, and continuous improvement within the engineering team
REQUIREMENTS:
- Deep understanding of patterns and software development practices for event-driven architectures
- Hands-on experience with stateful stream data processing solutions (Kafka or similar streaming platforms)
- Strong knowledge of data serialization/deserialization using various data formats (at minimum JSON and Avro), and integration with schema registries
- Proven Python software development expertise, with experience in data processing and integration (most of the software is written in Python)
- Practical experience building end-to-end solutions with Apache Flink or a similar platform
- Experience with containerization and orchestration using Kubernetes (K8s) and Helm, especially on Google Kubernetes Engine (GKE)
- Familiarity with Google Cloud Platform (GCP) or a similar cloud platform
- Hands-on experience implementing data quality solutions for schema-on-read or schema-less data
- Hands-on experience integrating with Apache Kafka, particularly the Confluent Platform
- Familiarity with AI and ML frameworks
- Proficiency in SQL and experience with both relational and NoSQL databases
- Experience with graph databases like Neo4j or RDF-based systems
- Experience in the healthcare domain and familiarity with healthcare standards such as FHIR and HL7 for data interoperability
WOULD BE A PLUS:
- Experience with web data scraping
- Strong problem-solving skills, with the ability to design innovative solutions for complex data integration and processing challenges
- Excellent communication skills, with the ability to articulate complex technical concepts and work effectively with various stakeholders
- Commitment to improving healthcare through data-driven solutions and technology
- Willingness to stay abreast of the latest technologies and industry trends while continually improving skills and knowledge
- Ability to work in a collaborative environment, valuing diverse perspectives and contributing to a positive team culture