Autopay Global is the newest member of the Autopay family, expanding the reach of the group’s state-of-the-art payment integration and payment data technologies to the international market: seamless integration with local PSPs, support for multiple currencies, and compliance with local regulatory frameworks. We take a forward-looking approach to our products: we value creativity, passion, and the drive to turn the newest achievements in technology to our advantage.
To support our dynamic expansion, we are looking for a Head of Data Engineering for a full-time, hybrid role based in Warsaw or Gdańsk.
The Head of Data Engineering owns the end-to-end data architecture and execution, with hands-on depth in PySpark and Databricks and strong experience building AI-ready data foundations on Google Cloud Storage (GCS) and Google Vertex AI.
You will be responsible for delivering a secure, scalable, and low-latency data lakehouse and feature platform that enables Autopay’s AI core (agents, RAG, ranking, decisioning) and activation systems to run on reliable, high-quality, well-governed data across batch and streaming. You will also hire, lead, and mentor a team of high-performing data engineering professionals.
- define the lakehouse reference architecture on GCS with Databricks/Delta Lake,
- build and operate PySpark pipelines in Databricks for both streaming and batch workloads,
- implement streaming ingestion,
- own the Customer 360 / CDP layer: unify events, transactions, and user identifiers,
- deliver a real-time feature layer (feature store) that publishes segments, scores, and vectors,
- create and maintain embeddings and retrieval indexes to power RAG in Autopay AI Core (chunking strategies, metadata, refresh policies, and retrieval evaluation),
- establish data governance with Dataplex/Data Catalog and/or Unity Catalog,
- own data observability for pipelines: freshness, completeness, schema drift, anomaly detection, and automated remediation workflows.
- Technology: PySpark, Databricks, Google Cloud Storage, Google Vertex AI, Delta Lake,
- Nice to have: experience with identity resolution inputs; experience building near-real-time segmentation, CLV, and propensity-scoring pipelines; familiarity with vector databases and multi-cloud data movement patterns.
- 10+ years in data engineering, still hands-on and able to build a platform and a team from the ground up; 3-5+ years leading data platform teams with ownership of production data SLAs,
- deep hands-on expertise with PySpark and Spark performance tuning (shuffle optimization, partitioning, checkpointing, incremental loads),
- strong experience with Databricks (jobs/workflows, Delta Lake, governance) and building lakehouse architectures on GCS,
- proven delivery of streaming + batch data platforms that power real-time product experiences (not just analytics),
- experience building feature stores and ML-ready datasets with point-in-time correctness and strong governance,
- strong grasp of privacy and compliance in data systems: PII handling, consent, and auditability,
- Google Vertex AI experience: building data pipelines that feed training, evaluation, and inference workflows; understanding of dataset/version management,
- hands-on experience supporting RAG systems: document ingestion, chunking, embedding generation, retrieval evaluation, and index refresh strategies,
- experience with retrieval-aware training approaches (e.g., retrieval augmented fine-tuning / RAFT) and producing high-quality supervised datasets with provenance,
- ability to collaborate with AI Engineers on MCP-based tools and agent workflows (tool schemas, rate limits, caching, and audit logs).
- a leadership role in a fast-growing, global fintech company,
- the opportunity to work with cutting-edge tools and technologies,
- independence in decision-making,
- friendly working environment, team support, no dress code.