Senior Data Engineer
About Webellian
Webellian is a well-established Digital Transformation and IT consulting company committed to creating a positive impact for our clients. We strive to make a meaningful difference in diverse sectors such as insurance, banking, healthcare, retail, and manufacturing. Our passion for cutting-edge and disruptive technologies, together with our shared values and strong principles, is what motivates us. We are a community of engineers and senior advisors who work with our clients across industries, playing a deep and meaningful role in accelerating and realizing their vision and strategy.
About the position
As a Senior Data Engineer within Advanced Analytics, you will design, build, and operate the data infrastructure that powers AI-enabled solutions at global scale. You will own the full data pipeline lifecycle — from ingestion and transformation through to serving data reliably to ML models, AI agents, and analytical consumers. You will work at the intersection of PostgreSQL-based operational data systems and Databricks-based lakehouse pipelines, ensuring data flows are robust, well-governed, and optimised for AI workloads.
Key responsibilities:
Design and build scalable data pipelines for ingestion, transformation, and serving of structured and unstructured data — supporting both batch and real-time AI workloads.
Develop and maintain Databricks-based data processing workflows: Delta Lake table management, PySpark transformations, notebook orchestration, and Unity Catalog governance (a minimal sketch of this kind of transformation follows this list).
Architect and optimise PostgreSQL data models: schema design, indexing strategies, partitioning, query performance tuning, and integration patterns for AI service consumption.
Build and maintain data orchestration workflows using Apache Airflow, Databricks Workflows, or equivalent — ensuring reliable scheduling, dependency management, and failure recovery.
Implement data quality frameworks: validation rules, anomaly detection, data contracts, and automated alerting on pipeline health and data freshness.
Design and manage feature engineering pipelines: transforming raw data into ML-ready feature sets, integrating with feature stores, and versioning feature definitions.
Own data integration patterns between operational PostgreSQL databases and the Databricks lakehouse: CDC (Change Data Capture), event-driven ingestion via Kafka, and batch export strategies.
Implement data governance standards: lineage tracking, cataloguing, access control, PII handling, data retention policies, and audit logging.
Collaborate with ML Engineers to design and deliver data pipelines supporting model training, batch inference, and real-time feature serving.
Monitor and operate data infrastructure: pipeline observability dashboards, SLA tracking, incident response, and root-cause analysis for data issues.
Champion Claude Code as an active daily tool for pipeline development, SQL generation, data exploration scripting, and documentation.
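To give a concrete flavour of the day-to-day work, the sketch below shows a minimal bronze-to-silver Delta transformation in PySpark, with a simple quality gate and an idempotent publish. It is illustrative only: the table names, the seven-day reprocessing window, and the quality rule are assumptions made for the example, not part of any actual client codebase.

    # Illustrative sketch only: bronze -> silver Delta transformation with a
    # basic quality gate. Table names and thresholds are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Read raw ingested events from the bronze layer.
    bronze = spark.read.table("bronze.customer_events")

    # Quality gate: fail the run rather than publish bad data downstream.
    null_ids = bronze.filter(F.col("event_id").isNull()).count()
    if null_ids > 0:
        raise ValueError(f"{null_ids} bronze events arrived without an event_id")

    # Deduplicate and standardise into an analysis-ready silver table,
    # limited to the same window the publish step will replace.
    silver = (
        bronze
        .dropDuplicates(["event_id"])
        .withColumn("event_date", F.to_date("event_ts"))
        .filter(F.col("customer_id").isNotNull())
        .filter(F.col("event_date") >= F.date_sub(F.current_date(), 7))
    )

    # Idempotent publish: overwrite only the partitions being reprocessed.
    (silver.write
        .format("delta")
        .mode("overwrite")
        .option("replaceWhere", "event_date >= date_sub(current_date(), 7)")
        .partitionBy("event_date")
        .saveAsTable("silver.customer_events"))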
Required Experience & Skills
6+ years of professional data engineering experience, with a strong track record of delivering production data pipelines at scale.
Expert-level SQL and strong PostgreSQL expertise: advanced query optimisation, schema design, indexing, partitioning, and understanding of MVCC and connection management.
Strong Databricks experience: Delta Lake, PySpark, Databricks Workflows, Unity Catalog, and performance tuning of large-scale Spark jobs.
Proficiency in Python for data pipeline development: pandas, PySpark, data validation libraries (Great Expectations or equivalent), and scripting for automation.
Experience with data orchestration frameworks: Apache Airflow, Databricks Workflows, or equivalent DAG-based scheduling tools.
Solid understanding of data integration patterns: CDC with Debezium or equivalent, Kafka-based event streaming, and batch ingestion strategies (see the CDC sketch after this list).
Hands-on experience with data lakehouse architecture: medallion architecture (Bronze/Silver/Gold), Delta Lake ACID transactions, and table optimisation.
Experience implementing data quality frameworks and data contracts in production pipelines.
Familiarity with Azure data services: Azure Data Factory, Azure Event Hubs, Azure Data Lake Storage, or equivalent cloud-native data tooling.
Hands-on proficiency with Claude Code: using it daily for pipeline development, SQL authoring, data exploration, and documentation tasks.
Strong communication skills: able to collaborate with data consumers (ML Engineers, analysts, product teams) to understand requirements and deliver reliable data products.
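As a concrete illustration of the CDC integration pattern above, here is a hedged sketch of applying Debezium-style change rows, already landed from Kafka into a staging table, to a Delta target with MERGE. The table names, the staging layout, and the reliance on Debezium's op codes ('c' create, 'u' update, 'd' delete, 'r' snapshot read) are assumptions made for the example.

    # Illustrative sketch only: applying Debezium-style CDC rows (landed from
    # Kafka into a staging table) to a Delta target via MERGE. Table names and
    # the staging layout are hypothetical.
    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    spark = SparkSession.builder.getOrCreate()

    changes = spark.read.table("staging.orders_cdc")   # hypothetical staging table
    target = DeltaTable.forName(spark, "silver.orders")

    (target.alias("t")
        .merge(changes.alias("s"), "t.order_id = s.order_id")
        # Debezium marks deletes with op = 'd': remove the matching row.
        .whenMatchedDelete(condition="s.op = 'd'")
        # Updates ('u') and snapshot reads ('r') refresh the existing row.
        .whenMatchedUpdateAll(condition="s.op IN ('u', 'r')")
        # Creates ('c') and snapshot rows insert new records.
        .whenNotMatchedInsertAll(condition="s.op != 'd'")
        .execute())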
What we offer
Contract under Polish law: B2B or Umowa o Pracę (employment contract)
Benefits such as private medical care, group insurance, Multisport card
English classes available
Hybrid work (at least 1 day/week on-site) in Warsaw (Mokotów)
Opportunity to work with excellent professionals
High standards of work and a focus on code quality
New technologies in use
Continuous learning and growth
International team
Pinball, PlayStation & much more (on-site)
Join a growing team of dedicated professionals! We love to pass on knowledge to grow excellence, speak our minds without playing politics, and simply enjoy spending time together. If you share our passion, we want to meet you! So go ahead and apply ➡️