AI Prompt Engineer
Join a Rapidly Growing Consumer Intelligence Platform Delivering Insights for Leading Global Brands
Company Background
We're an AI-powered consumer intelligence platform that transforms billions of data points (Google searches, social conversations, product reviews, and videos) into strategic insights for leading global brands such as Coca-Cola, Bayer, and Unilever. We deliver in days what traditional research takes months to produce.
The Role
You will define prompt engineering practice across the company: design standards, reusable templates, evaluation, and release governance. You’ll take prompt ownership off the data science team (freeing them to focus on modeling) and enable client-facing teams to generate consistent analysis and narratives from structured metrics and verbatims.
This role partners daily with Data Science, Data Analysts, Client Success, and Platform Engineering.
What You’ll Do
Audit and refactor existing NLP/GenAI prompt libraries (sentiment, emotion, classification, driver extraction, summarization, etc.) and bring them to acceptance-test quality (schema compliance, accuracy targets, safety checks).
Build a modular prompt template system (parameterized by market/segment/language/output format) used across data science and client-facing reporting workflows.
Optimize prompts across at least 2 LLM providers (OpenAI / Claude / Gemini), document provider-specific behaviors, and maintain output consistency requirements.
Define prompt reliability behaviors for invalid inputs, missing data, and formatting failures (validation rules, refusal rules, confidence flags, and fallback requirements for calling services).
Implement a prompt QA program: regression test sets, automated checks (schema/consistency/safety), and a human review rubric with scorecards tracked over time.
Create analytics + storytelling prompts that convert structured metrics/tables (e.g., brand equity, benchmarks, trends) into repeatable, executive-ready narratives with citations/evidence selection.
Deliver enablement: publish playbooks, run prompt review sessions, and train analysts/client success on using templates and producing consistent outputs.
Own the prompt repository and lifecycle (GitHub PR workflow, versioning, tagging, approvals, release notes, monitoring, and retirement).
Required Qualifications
Experience
3+ years building prompts for production LLM workflows (prompt engineering / LLM applications / applied NLP with LLMs), including ownership of prompt changes and outcomes.
Multi-LLM: Hands-on with at least 2 providers (OpenAI GPT series, Anthropic Claude, Google Gemini); ability to explain and handle provider behavior differences.
Prompt Engineering (Proficient): Structured outputs (JSON/tables), reusable templates, few-shot/zero-shot patterns, and multi-step prompting/tool use where applicable.
Evaluation (Working+): Automated prompt checks + regression approach + human rubric; examples of metrics/scorecards.
Multilingual (Working+): English-authored prompts that reliably process non-English text; can describe languages supported and evaluation method.
Python (Proficient): 2+ years professional Python; ability to build tooling for prompt testing/evaluation.
Orchestration (Working+): LangChain, LlamaIndex, or equivalent orchestration framework experience.
Version Control (Working+): Git/GitHub PR workflow experience for shared libraries.
Enablement: Evidence of documentation and training delivered to non-technical stakeholders (playbooks, workshops, recurring review cadence).
Education: Bachelor’s in a related quantitative field or equivalent experience (4+ additional years relevant work in lieu of degree).
Preferred Qualifications
Prompt security (prompt injection resistance, jailbreak mitigation, red teaming practices).
CI/CD for prompts (GitHub Actions or equivalent) running regression suites on PRs and nightly.
Azure + Databricks/Delta Lake/Unity Catalog familiarity.
SQL skills for inspecting and validating the data behind prompts.
Consumer insights / market research / CPG analytics background.
Fine-tuning knowledge (tradeoffs vs prompt-only approaches).
NLP tooling: Hugging Face, spaCy, NLTK; PySpark for data inspection.
Location & Work Arrangement
Timezone overlap: Candidates should overlap at least 4 hours with US Eastern and/or European business hours (team-dependent).
Travel: None required beyond occasional team offsites (≤2x/year).
What We Offer
Competitive salary
Remote-first role
Clear deliverables and review cadence with outcomes-based evaluation
Opportunity to own a company-wide prompt platform: templates, evaluation, governance, and enablement.