Data Scientist – RAG & Document Intelligence

Data

Al. Jerozolimskie 134, Warszawa

Craftware

Full-time

B2B

Senior

Remote

44 - 52 USD net per hour - B2B

Job description

Craftware is a technology company of over 500 experts, empowering large organizations to solve complex business challenges with modern IT solutions - from sales systems and automation to data platforms and AI. We operate where technology must be reliable, secure, and scalable. We deliver end-to-end projects: from analysis and architecture through implementation to development and maintenance. We are a trusted partner of industry leaders such as Salesforce, Veeva, UiPath, and Databricks.

Model: remote

Employment type: full-time

Project

You will be at the heart of one of the most exciting AI initiatives at an international Consumer Health company: building a platform that lets business users retrieve, synthesize, and act on information locked inside complex enterprise documents using plain language.

This is a greenfield, research-meets-engineering role where you will design, experiment with, and implement a context-aware, multi-agent AI system – and the decisions you make will shape how the entire organization interacts with its knowledge assets.

Responsibilities:

Design, experiment with, and continuously optimize RAG pipelines – chunking strategies, embedding models, hybrid search, re-ranking, and context assembly
Benchmark and evaluate LLMs, embedding models, and document parsing solutions across accuracy, latency, reliability, and cost
Build evaluation datasets and define RAG-specific quality metrics using frameworks like RAGAS, DeepEval, and Langfuse
Design and iterate on multi-agent system architectures (LangGraph, LangChain, Pydantic AI)
Handle diverse, noisy document formats – PDFs, Word docs, presentations, scanned files, tables, charts, and mixed-format corpora
Track token consumption, latency profiles, and retrieval quality – making principled trade-off decisions between capability and operational cost
Shape a system that will serve business users across commercial, marketing, R&D, and product supply globally

Requirements:

Strong Python skills with hands-on experience in ML and generative AI workflows, including production-grade code
Solid NLP background – text representation, semantic search, embeddings, language model behaviour
Deep hands-on experience designing and optimizing RAG pipelines – chunking, hybrid search, re-ranking (LlamaIndex, LangChain, LightRAG)
Experience with document parsing across diverse and noisy formats, including tables, charts, and figures
Experience with LLM evaluation frameworks and RAG-specific quality metrics (RAGAS, DeepEval, Langfuse)
Familiarity with multi-agent AI frameworks – LangGraph, LangChain, or Pydantic AI
Experience with vector databases (pgvector, Pinecone, Qdrant, Weaviate)
Experience with Azure cloud services (Apps, Containers, Storage, AI Search, AI Foundry) and/or AWS equivalents
Experience with Databricks (Delta Lake, Unity Catalog, MLflow)
Structured, experiment-driven approach to problem solving
Fluent English – written and spoken

Nice-to-have:

Databricks GenAI products: Vector Search, Agent Framework, Knowledge Assistant, Genie, Agent Bricks
API development and integration (FastAPI)
Knowledge of Model Context Protocol (MCP) for tool integration
Understanding of knowledge management, taxonomy design, and metadata enrichment for enterprise document repositories

Employment conditions:

B2B contract
Daily support from team leaders
Dedicated certification budget
Support in defining and following your development path
Benefits package
Integration trips and events

Tech stack

English
Python: master
RAG: advanced
LLM: advanced
