Senior Data Engineer (AI Consumer Intelligence Platform)

Data

Remote, New York

Kratos Growth

Full-time
B2B
Senior
Remote

Job description

We're hiring remote Data Engineers

Join a rapidly growing AI consumer intelligence platform delivering insights for the world’s leading brands.

We're an AI-powered consumer intelligence platform that processes 50+ billion data points monthly to deliver actionable consumer insights for global brands such as Coca-Cola, Unilever, and Bayer. We deliver in days what traditional research takes months to produce.

The Role

As a Senior Data Engineer, you'll architect and scale production data pipelines that power our NLP and ML systems processing billions of multilingual data points daily.

Reporting to our CTO, you'll own the complete data lifecycle, from ingestion and transformation through deployment and observability, while defining infrastructure standards for a growing engineering team.

This is a high-ownership role at an early stage: no legacy code politics and no entrenched hierarchies. You'll convert MVPs into scalable products, establish DataOps/DevOps standards, and design governance mechanisms that prevent technical debt. Your architectural decisions will directly impact how Fortune 500 companies access real-time consumer intelligence.

Tech Stack

- Core: Python, PySpark, SQL

- Cloud & Infrastructure: Azure ecosystem, Databricks

- Deployment: Kubernetes, containerization, observability tooling

- NLP/ML: Large Language Models, LLM APIs, spaCy/NLTK/CoreNLP/TextBlob

- Data: Robust pipelines for multi-language text at scale

What You'll Do

• Design, build, and maintain production data pipelines processing 10M+ text records daily across multiple languages

• Architect scalable NLP data infrastructure using PySpark, Databricks, and Azure services

• Integrate Large Language Model APIs into production pipelines for text analysis and enrichment

• Establish DataOps standards including CI/CD, testing frameworks, and deployment automation

• Implement observability and alerting for pipeline health, data quality, and system performance

• Collaborate with data scientists to productionize ML models and NLP systems

• Define data governance frameworks and quality SLAs for enterprise client delivery

• Mentor team members and contribute to technical hiring as the team scales

Qualifications

Experience

- 5+ years building and maintaining production ETL/ELT data pipelines

- 2+ years working with text/NLP data (tokenization, embeddings, multilingual processing)

- 3+ years shipping data products to production that serve active business users

Technical Skills

- Python: 4+ years in production environments (required)

- PySpark: 2+ years (or 1+ years of Spark combined with 4+ years of strong Python)

- SQL: 3+ years including complex queries and performance optimization

- Databricks: 1+ years of production use (notebooks, Delta Lake, job scheduling)

- Cloud Platform: 2+ years with Azure (preferred) or equivalent AWS/GCP experience

- Containers/Kubernetes: Experience deploying containerized applications to Kubernetes

Education

Bachelor's degree in Computer Science, Data Science, Engineering, or a related quantitative field.

Preferred Qualifications

- Databricks experience, including Delta Lake architecture and Unity Catalog (strongly preferred)

- Azure ecosystem depth: Data Factory, Databricks, Blob Storage, DevOps

- LLM integration experience: OpenAI, Anthropic, or Azure OpenAI API integration in production

- LLM fine-tuning experience

- Experience with observability tools: Datadog, Grafana, or Azure Monitor

- Processing experience at scale: 1B+ records

- Multilingual text processing: 3+ non-English languages with Unicode and tokenization handling

- NLP libraries: familiarity with spaCy, NLTK, CoreNLP, TextBlob

- Consumer insights, CPG/FMCG, or advertising technology experience

- Experience at early-stage, high-growth companies

What We Offer

- Long-term B2B opportunity with performance bonuses

- Fully remote (4+ hour overlap with U.S. Eastern Time Zone desired)

- Modern stack: Work with cutting-edge NLP/ML technologies at scale

- High ownership & impact: Directly shape architecture decisions for a platform serving Fortune 500 clients

- Growth trajectory: Join as a foundational team member with a clear path to technical leadership

Tech stack

- PySpark: advanced

- Azure: advanced

- Kubernetes: advanced

- Databricks: advanced

- SQL: advanced

- Python: advanced

- NLP: regular
