Implement and manage data ingestion pipelines from diverse sources such as Kafka, RDBMS (Postgres) using CDC (Change Data Capture), and file systems (CSV), following Medallion Architecture principles
Develop and optimize data transformations using PySpark and SQL to handle data volumes ranging from megabytes to gigabytes, depending on the source
Conduct unit testing and integration testing to ensure the accuracy and reliability of data transformations and pipelines
Work with AWS technologies, including S3 for data storage and Docker on AWS for containerized applications
Implement and manage infrastructure using Terraform, such as creating S3 buckets, managing Databricks Service Principals, and deploying infrastructure as code
Deploy and manage solutions using CI/CD pipelines, particularly with CircleCI, to ensure seamless and automated deployment processes
Requirements:
Minimum 4-5 years of professional experience
Proficiency in SQL and Python
Strong experience with AWS cloud services
Hands-on experience with Databricks
Knowledge of ETL Processing
Effective communication skills in English (minimum B2 level)
Knowledge of system design
Understanding of Medallion Architecture
Nice to have:
Familiarity with Kedro and Airbyte
Knowledge of Machine Learning
Offer:
Private medical care
Co-financing for the sport card
Training & learning opportunities
Ongoing support from a dedicated consultant
Team-building events organized by DCG
Employee referral program