Hello!
We are looking for a Data Engineer for our client in the healthcare sector.
Responsibilities:
- Collaborate with product managers, data scientists, data analysts, and engineers to define requirements and data specifications.
- Develop, deploy, and maintain data processing pipelines using cloud technologies such as AWS, Kubernetes, Airflow, Redshift, and EMR.
- Develop, deploy, and maintain serverless data pipelines using EventBridge, Kinesis, AWS Lambda, S3, and Glue.
- Define and manage the overall schedule and availability for a variety of data sets.
- Work closely with other engineers to enhance infrastructure and improve reliability and efficiency.
- Make smart engineering and product decisions based on data analysis and collaboration.
- Act as the in-house data expert and make recommendations regarding standards for code quality and timeliness.
- Architect cloud-based data infrastructure solutions to meet stakeholder needs.
Skills & Qualifications:
- Bachelor’s degree in analytics, statistics, engineering, math, economics, computer science, information technology, or a related discipline.
- 5+ years of professional experience in the big data space.
- 5+ years of experience engineering data pipelines using big data technologies (Spark, Flink, etc.) on large-scale data sets.
- Expert knowledge of complex SQL and ETL development, with experience processing extremely large data sets.
- Expertise in applying slowly changing dimension (SCD) types on an S3 data lake using Delta Lake or Hudi.
- Demonstrated ability to analyze large data sets to identify gaps and inconsistencies, provide data insights, and advance effective product solutions.
- Deep familiarity with AWS services (S3, EventBridge, Glue, EMR, Redshift, Lambda).
- Ability to quickly learn complex domains and new technologies.
- Innately curious and organized, with the drive to analyze data to identify deliverables, anomalies, and gaps, and to propose solutions that address these findings.
- Thrives in a fast-paced startup environment.
Good To Have:
- Experience with customer data platform tools such as Segment.
- Experience using Jira, GitHub, Docker, Codefresh, and Terraform.
- Experience contributing to full lifecycle deployments with a focus on testing and quality.
- Experience with data quality processes, including data quality checks, validations, and the definition and measurement of data quality metrics.