Senior Data Engineer (AI Consumer Intelligence Platform)
Our client is hiring Data Engineers
Join a rapidly growing AI consumer intelligence platform delivering insights for the world's biggest brands.
Hiring Company Background
We're an AI-powered consumer intelligence platform that processes 50+ billion data points monthly - Google searches, social conversations, product reviews, and videos - to deliver actionable consumer insights for Fortune 500 brands in days instead of months. Our clients include global leaders in beverages, personal care, and consumer packaged goods.
The Role
As a Senior Data Engineer, you'll architect and scale production data pipelines that power our NLP and ML systems processing billions of multilingual data points daily.
Reporting to our newly appointed CTO, you'll own the complete data lifecycle—from ingestion and transformation through deployment and observability—while defining infrastructure standards for a growing engineering team.
This is a high-ownership role at an early stage: no legacy code politics and no entrenched hierarchies. You'll convert MVPs into scalable products, establish DataOps/DevOps standards, and design governance mechanisms that prevent technical debt. Your architectural decisions will directly impact how Fortune 500 companies access real-time consumer intelligence.
Tech Stack
- Core: Python, PySpark, SQL
- Cloud & Infrastructure: Azure ecosystem, Databricks
- Deployment: Kubernetes, containerization, observability tooling
- NLP/ML: Large Language Models, LLM APIs, spaCy/NLTK/CoreNLP/TextBlob
- Data: Robust pipelines for multi-language text at scale
What You'll Do
- Design, build, and maintain production data pipelines processing 10M+ text records daily across multiple languages
- Architect scalable NLP data infrastructure using PySpark, Databricks, and Azure services
- Integrate Large Language Model APIs into production pipelines for text analysis and enrichment
- Establish DataOps standards including CI/CD, testing frameworks, and deployment automation
- Implement observability and alerting for pipeline health, data quality, and system performance
- Collaborate with data scientists to productionize ML models and NLP systems
- Define data governance frameworks and quality SLAs for enterprise client delivery
- Mentor team members and contribute to technical hiring as the team scales
Required Qualifications
Experience
- 5+ years building and maintaining production ETL/ELT data pipelines
- 2+ years working with text/NLP data (tokenization, embeddings, multilingual processing)
- 3+ years shipping data products to production that serve active business users
Technical Skills
- Python: 4+ years in production environments (required)
- PySpark: 2+ years (or 1+ years of Spark combined with 4+ years of strong Python)
- SQL: 3+ years including complex queries and performance optimization
- Databricks: 1+ years production use (notebooks, Delta Lake, job scheduling)
- Cloud Platform: 2+ years with Azure (preferred) or equivalent AWS/GCP experience
- Containers/Kubernetes: Experience deploying containerized applications to Kubernetes
Education
Bachelor's degree in Computer Science, Data Science, Engineering, or related quantitative field.
Preferred Qualifications
- 3+ years Databricks experience, including Delta Lake architecture and Unity Catalog (strongly preferred)
- Azure ecosystem depth: Data Factory, Databricks, Blob Storage, DevOps
- LLM integration experience: OpenAI, Anthropic, or Azure OpenAI API integration in production
- LLM fine-tuning experience
- Experience with observability tools: DataDog, Grafana, or Azure Monitor
- Processing experience at scale: 1B+ records
- Multilingual text processing: 3+ non-English languages with Unicode and tokenization handling
- NLP libraries: familiarity with spaCy, NLTK, CoreNLP, or TextBlob
- Consumer insights, CPG/FMCG, or advertising technology experience
- Experience at early-stage, high-growth companies
What We Offer
- Competitive compensation (with performance bonus and equity opportunities)
- Fully remote (4+ hour overlap with U.S. Eastern Time desired)
- Modern stack: work with cutting-edge NLP/ML technologies at scale
- High ownership and impact: directly shape architecture decisions for a platform serving Fortune 500 clients
- Growth trajectory: join as a foundational team member with a clear path to technical leadership