Autopay Global is the newest member of the Autopay family, aiming to expand the reach of the group’s state-of-the-art payment integration and payment data technologies to the international market by providing seamless integration with local PSPs, support for multiple currencies, and compliance with local regulatory frameworks. We take a forward-looking approach to our products, and we value creativity, passion, and the drive to leverage the newest achievements in technology to our advantage.
To support our dynamic expansion, we are looking for a Data Engineer for a full-time, hybrid role.
The Data Engineer is the backbone of Autopay’s data platform, responsible for building and maintaining the ingestion, transformation, and serving pipelines that power personalization, payments, and AI workloads. Working hands-on with PySpark, Databricks, and GCP, you will ensure data flows reliably from source systems through the lakehouse to downstream consumers, both human analysts and AI agents.
This is a hands-on position, with the following key responsibilities:
- build and maintain PySpark pipelines in Databricks for streaming and batch workloads (see the illustrative sketch below, after the technology list),
- implement and manage data ingestion from diverse sources using APIs, change data capture, event streaming, and webhooks,
- design and enforce the Bronze/Silver/Gold lakehouse layers on GCS with Delta Lake,
- build and maintain Customer 360 / CDP datasets,
- develop and operate feature store pipelines,
- implement schema evolution with schema registry, data contracts, and automated compatibility checks across streaming and batch ingestion,
- own pipeline observability: set up monitoring for freshness, completeness, schema drift, and anomaly detection,
- collaborate with AI Engineers on embedding and feature data pipelines,
- contribute to data governance: lineage tracking, PII tagging, access reviews, and quality SLA enforcement,
- participate in on-call rotations and incident response for production data systems.
- Technology: PySpark, Databricks, Google Cloud Storage, BigQuery, Kafka, Pub/Sub
- Nice to have: experience with feature store systems for ML/AI, Delta Live Tables, Databricks Asset Bundles, multi-cloud data movement
If any technical terms are unfamiliar to you, don’t worry. We’ll be happy to help and teach you everything you need.
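To give you a flavour of the day-to-day work, here is a minimal sketch of the kind of pipeline this role owns: streaming Kafka events into a Bronze Delta table on GCS, then conforming them into a deduplicated Silver table. This is illustrative only; the broker, bucket, topic, and column names are hypothetical placeholders, not our actual schema.

```python
# Illustrative sketch only: broker, bucket, topic, and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("payments-lakehouse-sketch").getOrCreate()

# Bronze: land raw Kafka events as-is, preserving the original payload.
bronze = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "payments.events")             # placeholder topic
    .load()
    .select(
        F.col("key").cast("string").alias("event_key"),
        F.col("value").cast("string").alias("raw_payload"),
        F.col("timestamp").alias("ingested_at"),
    )
)

(
    bronze.writeStream.format("delta")
    .option("checkpointLocation", "gs://example-bucket/_chk/payments_bronze")
    .outputMode("append")
    .start("gs://example-bucket/bronze/payments")
)

# Silver: parse, deduplicate, and conform the raw events incrementally.
event_schema = "payment_id STRING, amount DOUBLE, currency STRING, event_time TIMESTAMP"
silver = (
    spark.readStream.format("delta")
    .load("gs://example-bucket/bronze/payments")
    .select(F.from_json("raw_payload", event_schema).alias("e"), "ingested_at")
    .select("e.*", "ingested_at")
    .withWatermark("event_time", "1 hour")           # bound dedup state
    .dropDuplicates(["payment_id", "event_time"])    # tolerate re-delivery
)

(
    silver.writeStream.format("delta")
    .option("checkpointLocation", "gs://example-bucket/_chk/payments_silver")
    .outputMode("append")
    .start("gs://example-bucket/silver/payments")
)

spark.streams.awaitAnyTermination()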
What we expect from you:
- 5+ years in data engineering with production pipeline ownership; strong software engineering discipline,
- hands-on proficiency with PySpark and Spark performance tuning (shuffle optimization, partitioning, checkpointing, incremental loads; see the sketch after this list),
- experience with Databricks (jobs/workflows, Delta Lake, governance) or equivalent lakehouse platforms,
- experience building both streaming and batch pipelines that power real-time product features (not only analytics/BI),
- understanding of data modeling (star schema, SCD, Bronze/Silver/Gold) and schema evolution strategies,
- experience with at least one CDC tool (Datastream, Debezium, DMS) and event streaming (Kafka or Pub/Sub),
- familiarity with data governance concepts: lineage, data contracts, PII handling, and consent enforcement,
- growth mindset and ability to apply new technologies, patterns, or solutions to the current workstream (e.g. GenAI-assisted coding),
- communication and interpersonal skills.
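As an example of the tuning patterns we have in mind, the sketch below shows incremental, partition-aware loading: reading only records past a high-water mark instead of rescanning the table, then repartitioning by the output partition column before the write so each partition lands as a few well-sized files. Paths, dates, and column names are again hypothetical placeholders.

```python
# Illustrative sketch only: paths, dates, and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("incremental-load-sketch").getOrCreate()

# Incremental load: read only records newer than the stored high-water mark.
last_loaded = "2024-01-01 00:00:00"  # in practice, read from pipeline state
updates = (
    spark.read.format("delta")
    .load("gs://example-bucket/silver/payments")
    .where(F.col("event_time") > F.lit(last_loaded).cast("timestamp"))
)

# Aggregate the new records into daily totals per currency.
daily = (
    updates.groupBy("currency", F.to_date("event_time").alias("dt"))
    .agg(F.sum("amount").alias("daily_total"))
)

# Repartition by the output partition column before writing, so each date
# partition is written as a small number of well-sized files (avoids the
# classic small-files problem) and downstream reads can prune by "dt".
(
    daily.repartition("dt")
    .write.format("delta")
    .mode("append")
    .partitionBy("dt")
    .save("gs://example-bucket/gold/daily_totals")
)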
What we offer:
- being part of a fast-growing, global fintech company,
- the opportunity to work with cutting-edge tools and technologies,
- independence in decision-making,
- friendly working environment, team support, no dress code.