Data Pipeline / ML Engineer
Remote — Europe-friendly hours | Full-time | Senior IC
Why this role matters
Vequity is building the world’s most robust, contextualized buyer intelligence network for investment banks, private equity firms, and strategic acquirers — a platform with over 2.1 million buyer profiles, each containing ~100 structured and inferred data fields. Our proprietary AI agents continuously enrich, infer, and structure buyer intelligence at scale.
We are looking for one exceptional senior engineer to own the foundation everything else is built on: the data pipeline, the entity resolution layer, the AI agent orchestration, and the quality systems that turn messy, multi-source inputs into a clean, queryable graph. You will not be one of fifty engineers on a data platform team — you will be the person who designs and owns it.
If you have strong opinions about data contracts, have actually lost sleep over entity resolution, and have built production LLM pipelines that do more than summarize PDFs, we want to talk.
What you’ll own
• Multi-source data architecture. Systems handling multiple write paths — external providers, LLM hygiene agents, and customer-claimed edits — with versioning, lineage, and observability across pipelines.
• Data quality and operations. Data contracts, pipeline unit tests, integration testing, confidence scoring, human-in-the-loop validation, anomaly detection, monitoring, alerting, and runbooks. Cost and performance optimization across cloud resources.
• Machine learning and matching systems. Embeddings infrastructure, vector generation, retrieval optimization, semantic search pipelines, reranking, and evaluation frameworks that measure model performance against human judgment.
• Entity resolution and master data management. Deterministic blocking (fuzzy matching, location) combined with LLM-based evaluation for match decisions (first sketch after this list). Confidence scoring models. Handling the lifecycle complexity of mergers, acquisitions, spin-offs, rebranding, and temporal relationship changes.
• Entity relationships and graph modeling. Parent/subsidiary hierarchies and PE firm → portfolio company chains. Evaluating and implementing graph query capabilities (Apache AGE, Neo4j, or optimized Postgres patterns) for traversal that semantic search cannot address (second sketch below).
• AI agent integration. LLM-based agents that clean, enrich, and infer attributes for buyer profiles. Robust prompting architectures, JSON schema validation, structured AI → JSON → database pipelines with error recovery, and feedback loops (third sketch below).
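To make the entity-resolution work concrete, here is a minimal sketch of the blocking-plus-LLM-evaluator pattern. The record fields, the similarity threshold, and the llm_judge stub are illustrative assumptions, not our production schema; the actual model call is left unwired.

```python
"""Entity-resolution sketch: deterministic blocking + LLM match evaluation.

Assumptions: records carry `name` and `country` (not our real schema);
`llm_judge` is a hypothetical hook where a real LLM call would go.
"""
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations

from pydantic import BaseModel, Field


@dataclass(frozen=True)
class CompanyRecord:
    record_id: str
    name: str
    country: str


class MatchVerdict(BaseModel):
    """Structured verdict we would require the LLM evaluator to return."""
    is_match: bool
    confidence: float = Field(ge=0.0, le=1.0)
    rationale: str


def block_candidates(records, name_threshold=0.85):
    """Deterministic blocking: only same-country pairs with fuzzily similar
    names survive to the (expensive) LLM evaluation stage."""
    for a, b in combinations(records, 2):
        if a.country != b.country:
            continue
        similarity = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
        if similarity >= name_threshold:
            yield a, b


def llm_judge(a: CompanyRecord, b: CompanyRecord) -> MatchVerdict:
    """Hypothetical hook: prompt a model with both records and validate its
    JSON reply into MatchVerdict (the third sketch shows the retry loop)."""
    raise NotImplementedError("wire this to your LLM provider")
```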
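For the graph bullet, here is a self-contained sketch of the recursive-CTE option. It uses SQLite purely so it runs anywhere without a server; the ownership table and sample rows are invented, and the CTE body itself is portable to Postgres.

```python
"""Recursive-CTE sketch for parent -> subsidiary traversal.

The `ownership` table and rows are invented for illustration; the
WITH RECURSIVE query body also runs on Postgres.
"""
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ownership (parent TEXT, child TEXT);
    INSERT INTO ownership VALUES
        ('Apex Capital', 'Apex Fund II'),
        ('Apex Fund II', 'Acme Industrial'),
        ('Acme Industrial', 'Acme Industrial GmbH');
    """
)

# Walk the full chain under a root entity, tracking depth as we go.
rows = conn.execute(
    """
    WITH RECURSIVE descendants(entity, depth) AS (
        SELECT child, 1 FROM ownership WHERE parent = ?
        UNION ALL
        SELECT o.child, d.depth + 1
        FROM ownership o
        JOIN descendants d ON o.parent = d.entity
    )
    SELECT entity, depth FROM descendants ORDER BY depth;
    """,
    ("Apex Capital",),
).fetchall()

for entity, depth in rows:
    print("  " * depth + entity)
```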
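And for the agent-integration bullet, a sketch of the AI → JSON → database step with error recovery: Pydantic validates the model's JSON, and validation failures are fed back so the model can repair its own output. The EnrichedProfile fields and the call_llm hook are hypothetical, not our production contract.

```python
"""Sketch of a structured AI -> JSON step with error recovery (Pydantic v2).

`call_llm` is a hypothetical hook; the retry-with-feedback loop and the
schema validation are the point.
"""
from pydantic import BaseModel, Field, ValidationError


class EnrichedProfile(BaseModel):
    company_name: str
    employee_count: int | None = None
    sector: str = Field(min_length=1)


def call_llm(prompt: str) -> str:
    """Placeholder for the actual model call; must return raw JSON text."""
    raise NotImplementedError("wire this to your LLM provider")


def enrich(raw_text: str, max_attempts: int = 3) -> EnrichedProfile:
    """Ask the model for JSON matching EnrichedProfile; on validation failure,
    append the error to the prompt and retry with a corrected reply."""
    prompt = (
        "Extract a JSON object matching this schema:\n"
        f"{EnrichedProfile.model_json_schema()}\n\nSource:\n{raw_text}"
    )
    last_error = None
    for _ in range(max_attempts):
        reply = call_llm(prompt)
        try:
            return EnrichedProfile.model_validate_json(reply)
        except ValidationError as exc:
            last_error = exc
            prompt += (
                f"\n\nYour last reply failed validation:\n{exc}\n"
                "Return corrected JSON only."
            )
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts") from last_error
```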
What we’re looking for
Core requirements
• 5+ years in data engineering with strong Python (Pydantic a bonus), SQL, and cloud data stacks (GCP experience preferred).
• Experience with orchestration frameworks (Airflow, Dagster, Prefect) and/or data platforms (Databricks).
• Production experience designing or integrating AI/LLM agents for data enrichment — structured AI → JSON → database pipelines with error recovery and monitoring.
• Working knowledge of prompt engineering, MCP servers, function calling, and embedding-based retrieval.
• Comfort with unstructured data (web pages, PDFs, filings) and NLP-driven structuring pipelines.
• Excellent written communication — this is a remote, async-heavy role.
Nice to have
• Prior experience with entity resolution or master data management — you understand why matching company records is fundamentally hard.
• Familiarity with graph databases or graph query patterns (Neo4j, Apache AGE, recursive CTEs).
• Event sourcing or append-only architectures for audit trails and data replay.
• Background in investment data, market intelligence, or deal sourcing platforms.
• Familiarity with data quality frameworks (dbt, Great Expectations).
• Experience as an early or first data hire at a startup.
What success looks like in year one
• 99%+ structured data consistency across all AI-enriched buyer profiles.
• 99%+ data hygiene accuracy.
• Fully automated ingestion and inference pipelines with human validation loops.
• >80% reduction in manual cleanup and error handling.
• Continuous AI-driven enrichment that expands data coverage, accuracy, and relevance monthly.
• You can’t stop thinking about new ideas for improvement — and you ship them.
Compensation and benefits
We pay competitively for the European market and we’re transparent about it.
• Rate: €6,500 – €8,000/month (B2B). EOR employment is available as an alternative.
• Equity: Stock options available — details shared at offer stage.
• Time off: Manage your own schedule. We trust you.
• Health: €150/month health and wellness stipend.
• Engagement: B2B contract (preferred) or full-time via Deel EOR — your choice. 30-day mutual notice.
• Equipment: €2,500 one-time allowance — your choice of setup.
How we work
• Fully remote. We are based in Denver, Colorado (Mountain Time, UTC-7 in winter, UTC-6 in summer). You can work from Poland, Romania, Czechia, Bulgaria, Portugal, Spain, or almost anywhere else in Europe.
• Hours overlap. We ask for 3–4 hours of daily overlap with Denver — practically, that means working until roughly 18:00–19:00 your local time, minimum four days per week. The rest of your day is yours to structure.
• Async-first. We write things down. Docs, Loom videos, and thoughtful PR descriptions are the norm. Meetings happen when they’re the fastest path to clarity, not by default.
• Small team, direct access. You will work directly with the founder and a tight core team. No middle management. Your work ships fast.
Our interview process (target: 7–10 business days)
• Step 1 — Intro call (30 min). Logistics, comp expectations, and timeline.
• Step 2 — Technical deep-dive with our Head of Engineering (60 min). Architecture discussion about a real problem we’re working on. Bring your experience with data pipelines, entity resolution, or LLM agents — we want to see how you think through trade-offs.
• Step 3 — Paid take-home (4–6 hours, compensated at €400). You design an entity resolution pipeline for a small dataset of messy company records — deterministic blocking plus an LLM evaluator, with structured output and confidence scoring. You keep the code.
• Step 4 — Take-home review with Head of Engineering (60 min). We walk through what you built, discuss your design decisions, and stretch into a follow-on system design problem.
• Step 5 — Final conversation with CEO (30 min). Vision, culture, and mutual expectations. Offer within 48 hours.