Implement and manage data ingestion pipelines from diverse sources such as Kafka, RDBMS (Postgres) using CDC (Change Data Capture), and file systems (CSV), following Medallion Architecture principles
Develop and optimize data transformations using PySpark and SQL, handling data volumes ranging from megabytes to gigabytes depending on the source
Conduct unit testing and integration testing to ensure the accuracy and reliability of data transformations and pipelines
Work with AWS technologies, including S3 for data storage and Docker on AWS for containerized applications
Implement and manage infrastructure using Terraform, such as creating S3 buckets, managing Databricks Service Principals, and deploying infrastructure as code
Deploy and manage solutions using CI/CD pipelines, particularly with CircleCI, to ensure seamless and automated deployment processes
Requirements:
Minimum 4-5 years of professional experience
Proficiency in SQL and Python
Strong experience with AWS cloud services
Hands-on experience with Databricks
Knowledge of ETL Processing
Effective communication skills in English (minimum B2 level)
Knowledge of system design
Understanding of Medallion Architecture
Nice to have:
Familiarity with Kedro and Airbyte
Knowledge of Machine Learning
Offer:
Private medical care
Co-financing for the sport card
Training & learning opportunities
Constant support from a dedicated consultant
Team-building events organized by DCG
Employee referral program