Data Scientist – RAG & Document Intelligence
Craftware is a technology company of over 500 experts, empowering large organizations to solve complex business challenges with modern IT solutions – from sales systems and automation to data platforms and AI. We operate where technology must be reliable, secure, and scalable. We deliver end-to-end projects: from analysis and architecture through implementation to development and maintenance. We are a trusted partner of industry leaders such as Salesforce, Veeva, UiPath, and Databricks.
Work model: remote
Employment type: full-time
Project
You will be at the heart of one of the most exciting AI initiatives at an international Consumer Health company: building a platform that lets business users retrieve, synthesize, and act on information locked inside complex enterprise documents using plain language.
This is a greenfield, research-meets-engineering role where you will design, experiment with, and implement a context-aware, multi-agent AI system – and the decisions you make will shape how the entire organization interacts with its knowledge assets.
Responsibilities:
Design, experiment with, and continuously optimize RAG pipelines – chunking strategies, embedding models, hybrid search, re-ranking, and context assembly
Benchmark and evaluate LLMs, embedding models, and document parsing solutions across accuracy, latency, reliability, and cost
Build evaluation datasets and define RAG-specific quality metrics using frameworks like RAGAS, DeepEval, and Langfuse
Design and iterate on multi-agent system architectures (LangGraph, LangChain, Pydantic AI)
Handle diverse, noisy document formats – PDFs, Word docs, presentations, scanned files, tables, charts, and mixed-format corpora
Track token consumption, latency profiles, and retrieval quality – making principled trade-off decisions between capability and operational cost
Shape a system that will serve business users across commercial, marketing, R&D, and product supply globally
Requirements:
Strong Python skills with hands-on experience in ML and generative AI workflows, including production-grade code
Solid NLP background – text representation, semantic search, embeddings, language model behaviour
Deep hands-on experience designing and optimizing RAG pipelines – chunking, hybrid search, re-ranking (LlamaIndex, LangChain, LightRAG)
Experience with document parsing across diverse and noisy formats, including tables, charts, and figures
Experience with LLM evaluation frameworks and RAG-specific quality metrics (RAGAS, DeepEval, Langfuse)
Familiarity with multi-agent AI frameworks – LangGraph, LangChain, or Pydantic AI
Experience with vector databases (pgvector, Pinecone, Qdrant, Weaviate)
Experience with Azure cloud services (Apps, Containers, Storage, AI Search, AI Foundry) and/or AWS equivalents
Experience with Databricks (Delta Lake, Unity Catalog, MLflow)
Structured, experiment-driven approach to problem solving
Fluent English – written and spoken
Nice-to-Have
Databricks GenAI products: Vector Search, Agent Framework, Knowledge Assistant, Genie, Agent Bricks
API development and integration (FastAPI)
Knowledge of Model Context Protocol (MCP) for tool integration
Understanding of knowledge management, taxonomy design, and metadata enrichment for enterprise document repositories
Employment conditions:
B2B contract,
Daily support from team leaders,
Dedicated certification budget,
Help in defining and pursuing your development path,
Benefits package,
Integration trips/events.