Senior Data Engineer (AI Consumer Intelligence Platform)

Data

Remote, New York

Kratos Growth

Full-time
B2B
Senior
Remote

Job description

Our client is hiring Data Engineers

Join a rapidly growing AI consumer intelligence platform delivering insights for the world's biggest brands.


Hiring Company Background


We're an AI-powered consumer intelligence platform that processes 50+ billion data points monthly (Google searches, social conversations, product reviews, and videos) to deliver actionable consumer insights for Fortune 500 brands in days instead of months. Our clients include global leaders in beverages, personal care, and consumer packaged goods.


The Role


As a Senior Data Engineer, you'll architect and scale production data pipelines that power our NLP and ML systems processing billions of multilingual data points daily.


Reporting to our newly appointed CTO, you'll own the complete data lifecycle—from ingestion and transformation through deployment and observability—while defining infrastructure standards for a growing engineering team.


This is a high-ownership role at an early stage: no legacy code politics and no entrenched hierarchies. You'll convert MVPs into scalable products, establish DataOps/DevOps standards, and design governance mechanisms that prevent technical debt. Your architectural decisions will directly impact how Fortune 500 companies access real-time consumer intelligence.


Tech Stack


- Core: Python, PySpark, SQL

- Cloud & Infrastructure: Azure ecosystem, Databricks

- Deployment: Kubernetes, containerization, observability tooling

- NLP/ML: Large Language Models, LLM APIs, spaCy/NLTK/CoreNLP/TextBlob

- Data: Robust pipelines for multi-language text at scale


What You'll Do


- Design, build, and maintain production data pipelines processing 10M+ text records daily across multiple languages

- Architect scalable NLP data infrastructure using PySpark, Databricks, and Azure services

- Integrate Large Language Model APIs into production pipelines for text analysis and enrichment

- Establish DataOps standards including CI/CD, testing frameworks, and deployment automation

- Implement observability and alerting for pipeline health, data quality, and system performance

- Collaborate with data scientists to productionize ML models and NLP systems

- Define data governance frameworks and quality SLAs for enterprise client delivery

- Mentor team members and contribute to technical hiring as the team scales


Required Qualifications


Experience


- 5+ years building and maintaining production ETL/ELT data pipelines

- 2+ years working with text/NLP data (tokenization, embeddings, multilingual processing)

- 3+ years of data products shipped to production that serve active business users


Technical Skills


- Python: 4+ years in production environments (required)

- PySpark: 2+ years (or 1+ years of Spark combined with 4+ years of strong Python)

- SQL: 3+ years including complex queries and performance optimization

- Databricks: 1+ years production use (notebooks, Delta Lake, job scheduling)

- Cloud Platform: 2+ years with Azure (preferred) or equivalent AWS/GCP experience

- Containers/Kubernetes: Experience deploying containerized applications to Kubernetes


Education


Bachelor's degree in Computer Science, Data Science, Engineering, or related quantitative field.


Preferred Qualifications


- 3+ years of Databricks experience, including Delta Lake architecture and Unity Catalog (strongly preferred)

- Azure ecosystem depth: Data Factory, Databricks, Blob Storage, DevOps

- LLM integration experience: OpenAI, Anthropic, or Azure OpenAI API integration in production

- LLM fine-tuning experience

- Experience with observability tools: DataDog, Grafana, or Azure Monitor

- Processing experience at scale: 1B+ records

- Multilingual text processing: handling Unicode and tokenization across 3+ non-English languages

- NLP libraries: familiarity with spaCy, NLTK, CoreNLP, TextBlob

- Consumer insights, CPG/FMCG, or advertising technology experience

- Experience at early-stage, high-growth companies


What We Offer


- Competitive compensation (with performance bonus and equity opportunities)

- Fully remote (4+ hour overlap with U.S. Eastern Time Zone desired)

- Modern Stack: Work with cutting-edge NLP/ML technologies at scale

- High ownership & impact: Directly shape architecture decisions for a platform serving Fortune 500 clients

- Growth Trajectory: Join as a foundational team member with clear path to technical leadership



Tech stack

- PySpark: advanced

- Azure: advanced

- Kubernetes: advanced

- Databricks: advanced

- SQL: advanced

- Python: advanced

- NLP: regular


Published: 13.02.2026