Data Engineer (NLP) – Production Data Systems (Remote)

Data

Remote, New York

Kratos Growth

Full-time

Any

Senior

Remote

Job description

Kratos Growth's client is hiring Data Engineers (NLP) – Production Data Systems

Join a rapidly growing AI Consumer Intelligence Platform Delivering Insights for the World’s Biggest Brands

Hiring Company Background

Led by industry veterans from Unilever and Coca-Cola, our platform synthesizes massive-scale data (billions of Google searches, social conversations, product reviews, and videos) to deliver actionable consumer insights for Fortune 500 clients in days instead of months.

Our clients include global leaders in beverages, personal care, and consumer packaged goods.

With a newly appointed CTO building our engineering team, this is your opportunity to shape data engineering standards, define infrastructure architecture, and establish pipeline best practices at a high-growth company. You can work remotely from anywhere in the world as we scale in 2026 and beyond.

No legacy code politics, no entrenched hierarchies, no technical debt from someone else's decisions that you're powerless to change.

You'll build production data systems, and your impact will be immediate and visible.

Your Mission

As a Senior Data Engineer, you'll architect and scale production data pipelines that process hundreds of billions of data points for NLP and ML systems.

You'll own the complete data lifecycle, from ingestion and transformation through deployment and observability, while shaping our infrastructure strategy to optimize delivery speed, cost efficiency, and data quality.

This isn't just implementing prototypes. You'll convert MVPs into scalable products, establish DataOps/DevOps standards, and design governance mechanisms to eliminate technical debt across the stack. Your decisions will directly impact how Fortune 500 companies access billions of data points in real time.

What You'll Work With

• Core Stack: Python, PySpark, SQL

• Cloud & Infrastructure: Azure ecosystem, Databricks (deep expertise required)

• Production Deployment: Kubernetes, containerization, observability tools

• NLP/ML: Large Language Models, LLM API integration, modern NLP libraries (spaCy, NLTK, CoreNLP, TextBlob)

• Data Engineering: Building robust, scalable pipelines for multi-language text processing

What We're Looking For

• 5+ years of data engineering experience, with a strong NLP focus

• Strong experience solving real business problems through data engineering and data science

• Python expertise (PySpark proficiency preferred, or solid Python + Spark fundamentals)

• Production deployment experience in enterprise environments (containers, Kubernetes)

• Track record of shipping data products that customers actually use