Lead Data Engineer
Svitla Systems Inc. is looking for a Lead Data Engineer for a full-time remote position (40 hours per week) in Europe. Our client is a growth-stage SaaS company reimagining how members engage and learn within modern communities of practice. The platform fosters member engagement with both content and human connections, aiming to create vibrant community experiences that deliver greater value and support organizational growth.
You’ll lead the data infrastructure migration and build production ML/AI pipelines. This role will establish data engineering best practices for the organization while delivering on critical 2026 initiatives (50% data pipeline migration, with heavy emphasis in the early months).
The Product and Engineering teams report up through the CTO. It is an entirely virtual team spread across four time zones in the mainland US. Most of the team works Central Time business hours. They have a culture of deep collaboration across all teams. Product, Engineering, Infrastructure, and Client Services work closely every single day. The engineering team practices Test Driven Development with an emphasis on pair programming. The engineers rotate regularly between pairs, ensuring a broad understanding of the platform and its features. The team uses a CI/CD system that enables frequent deployments to production. They deploy daily but release features behind a feature flag as part of their product release cycle.
Current Data Infrastructure:
Data sources: Segment (event data), production Postgres database;
Processing: AWS Glue (ETL jobs, Data Catalog), Spark, PySpark, Pandas, Python, Jupyter Notebooks;
Compute: Lambda functions for orchestration and transformation;
Storage: Parquet files, S3;
Analytics: Athena (current), Redshift (target state);
Orchestration: EventBridge for scheduling;
Observability: CloudWatch for logging and monitoring.
ML/AI Stack:
Model training: Custom VMs with Linux-based Docker containers;
Model deployment: Docker containers deployed to AWS SageMaker endpoints for inference;
AWS Step Functions for pipeline orchestration;
CI/CD through GitHub Actions;
Integration with Ruby on Rails production applications.
Requirements:
5+ years of hands-on experience in data engineering.
Background in analytics or business intelligence environments.
Advanced understanding of SQL for complex queries, optimization, and performance tuning.
Strong knowledge of Python, with experience in Pandas and PySpark for data transformation.
Production experience with Apache Spark for large-scale data processing.
Deep understanding of AWS Glue (ETL, Crawlers, Data Catalog).
Experience designing and implementing Lambda functions in data pipelines.
Working knowledge of Parquet and columnar data formats.
Production experience with Redshift, Snowflake, or Databricks.
Familiarity with event streaming platforms (Segment or similar).
Understanding of data quality and testing frameworks.
Experience working with data scientists to productionize models.
Expertise in architecting data warehouse solutions and optimizing for cost and performance.
Experience migrating workloads between data platforms.
Experience with CI/CD pipelines, specifically GitHub Actions.
Proven track record in building production ML inference pipelines.
Understanding of how to deploy Dockerized models to production environments.
Hands-on experience with AWS SageMaker endpoints for model serving.
Experience with AWS Step Functions for workflow orchestration.
Knowledge of Docker for containerizing applications and working with containerized models.
Understanding of MLOps principles and best practices.
Knowledge of Terraform and an infrastructure-as-code mindset.
Knowledge of EventBridge, CloudWatch, and AWS monitoring tools.
Expertise in writing clean, maintainable, production-quality code.
Self-directed problem solver who can identify issues and drive to resolution.
Advanced level of English.
Nice to have:
Experience with building ML pipelines.
Experience with MLOps.
Responsibilities:
Lead migration of the production data pipeline from legacy US-West infrastructure to the new US-East-1 AWS environment.
Migrate and redesign approximately 80 ETL processes from Domo to Redshift, including a complete architectural redesign (not lift-and-shift).
Convert 40+ regularly scheduled Athena queries and 6 Jupyter Notebooks to Redshift.
Rebuild data views and optimize query performance in the new environment.
Ensure data integrity throughout migration with comprehensive testing and validation.
Own the technical execution of the 6-month pipeline modernization project (50%, with increasing emphasis through 2026).
Work independently with minimal technical oversight.
Establish data engineering patterns and best practices for the organization.
Make architectural decisions and recommend solutions.
Provide reporting to the VP of Data & Analytics.
Partner closely with the Director of Infrastructure on AWS architecture and deployment.
Regularly partner with data scientists to operationalize models, owning the data input and inference deployment sides of the pipeline (but not designing or training models).
Coordinate with DevOps engineers on infrastructure and deployment.
Contribute to integration work with the Ruby on Rails engineering team on production endpoints.
Build monitoring and observability into ML pipelines using CloudWatch.
Establish MLOps best practices using SageMaker endpoints, Step Functions, and related AWS tools.
Implement scalable, reliable data flows that feed ML models and deliver predictions to production systems.
We offer:
US and EU projects based on advanced technologies.
Competitive compensation based on skills and experience.
Regular performance appraisals to support your growth.
Flexibility in workspace, either remote or our welcoming office.
Bonuses for article writing, public talks, and other activities.
Generous time off, including vacation, national holidays, sick leaves, and family days.
Personalized learning programs tailored to your interests and skill development.
Free tech webinars and meetups organized by Svitla.
Regular corporate online activities.
Awesome team and a friendly, supportive community!

Svitla Systems
Svitla Systems is a global digital solutions company with over 20 years of industry experience, presence across 15 countries, and a team of 1,000+ skilled tech experts, creators, and visionaries. We empower businesses ac...