Senior Data Engineer
Warsaw, Poland (Remote)
Autoplay
Autoplay.ai is a fast-growing US startup that automatically turns user session replays and analytics events into clear, AI-powered insights (user intent, friction points, summaries, highlights). We’re currently scaling the team and actively looking for a Senior Data Engineer (Poland, remote).
💰 Salary: 27 000-36 000 PLN/month
🌎 Fully remote
🕦 Full-time position
☑️ B2B contract
What you’ll be working on:
Data Pipeline Architecture & Management
Design, orchestrate, and maintain our multi-source data pipeline (RRWeb events, analytics events, video frames, metadata).
Manage and optimize Airflow DAGs (scheduling, retries, dependency management, error handling, backfilling).
Integrate and scale Airbyte connectors to pull data from tools like PostHog, Mixpanel, Pendo, and custom APIs.
Build high-reliability pipelines that can handle large, bursty session replay data.
Pipeline Reliability & Observability
Implement end-to-end monitoring: logs, metrics, alerts, data quality checks, schema validations.
Reduce pipeline failures and rate-limit issues (e.g., PostHog ingestion constraints).
Introduce automatic retries, dead-letter queues, and backpressure strategies.
Backend Engineering
Build and optimize backend services (Python/FastAPI, Node.js, etc.) that consume and expose pipeline outputs.
Improve the performance of data storage (Postgres/Neon, vector DBs, GCS).
Implement caching layers for metadata, summaries, and user-level insights.
Scalability & Performance
Architect systems that can scale across:
High-volume session replays
Large embeddings
JSON augmentation workloads
Batch and real-time computation
Identify bottlenecks and implement optimizations across the pipeline (I/O, compute, caching, parallelization).
Ownership of the Full Augmentation Flow
Directly manage all backend systems that produce:
Augmented interactions
Markdown summaries
Session highlights
User intent & frictions
Session tags
One-liner summaries
Product sections
User flows
GCS output storage
What you need to have:
5+ years of experience in data engineering or backend engineering
Deep experience with Apache Airflow, DAG design, and orchestration at scale
Strong familiarity with Airbyte, ETL/ELT patterns, and connector configuration
Strong Python engineering background (FastAPI / Django / async patterns)
Experience processing large JSON datasets or high-volume event streams
Proven track record of building scalable, cost-efficient, well-monitored data systems
Familiarity with GCP (GCS, Cloud Run, Pub/Sub), AWS, or similar cloud environments
It’s a big plus if you have:
Experience with RRWeb or session replay data
Background in AI/ML data pipelines
Experience with vector databases, embeddings, or semantic search
Understanding of clickstream analytics
DevOps exposure (Docker, Terraform, CI/CD)
What we offer:
Fully remote (EU time zones)
A friendly work atmosphere
Growing the company together - your opinion counts
Building modern solutions that have a real, measurable impact on our clients’ businesses
Additional benefits tailored to your needs
The recruitment process is fully remote and consists of several stages: a conversation with the recruiter, a technical interview that includes a live-coding part, and meetings with team members and managers. These stages allow us to get to know you better and also give you the opportunity to get to know the company, so that both sides can be sure we’re making the right choice.