QA Engineer/Tester (AI/LLM)
Iłżecka 26e, Warszawa
DPDgroup IT Solutions
About the Role
We’re looking for a QA Tester with strong AI literacy and data validation skills to test how AI agents behave and ensure that their outputs are stored accurately and reliably in backend systems. This is a manual, exploratory testing role — ideal for someone who can combine curiosity about AI behavior with practical knowledge of data flows, relational databases, and result traceability.
You won’t be writing automation scripts, but you will need to understand how AI agents operate, how their outputs are used, and how to verify correctness across both the UI and backend.
What You’ll Do
Manually test AI-driven workflows that generate content, complete tasks, or make decisions
Assess AI behavior by checking for:
Consistency and repeatability
Hallucinations, inaccuracies, or bias
Relevance and task alignment
Evaluate data integrity:
Trace AI-generated data from the interface to the backend
Use SQL to validate how outputs are stored, structured, or logged (an example query follows this list)
Compare AI intent/output to the resulting records in the database
Reproduce and report subtle, fuzzy, or probabilistic issues with structured documentation
Collaborate with engineers, AI designers, and product owners to define quality criteria across system layers
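To give a flavor of the backend validation involved, here is a minimal sketch of the kind of query a tester might run. The schema is hypothetical: the tables agent_runs and agent_outputs and the columns run_id, task_type, status, output_text are illustrative only, and the exact SQL dialect will vary by project. The idea is to join each completed agent run to its stored output and flag runs whose results were never persisted or were saved empty.

    -- Hypothetical schema: agent_runs logs each AI task, agent_outputs stores the generated result.
    -- Flag completed runs with no stored output, or with an output saved as empty text.
    SELECT r.run_id,
           r.task_type,
           r.completed_at,
           o.output_id,
           LENGTH(o.output_text) AS output_length
    FROM agent_runs AS r
    LEFT JOIN agent_outputs AS o
           ON o.run_id = r.run_id
    WHERE r.status = 'completed'
      AND (o.output_id IS NULL OR TRIM(o.output_text) = '')
    ORDER BY r.completed_at DESC;

The LEFT JOIN is deliberate: runs with no matching output row still appear in the result, which is exactly the discrepancy this kind of check is meant to surface.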
Must-Have Skills
2+ years of manual QA experience, ideally in exploratory or context-driven testing environments
Practical understanding of LLMs and AI tools (ChatGPT, Claude, etc.)
Basic to intermediate knowledge of SQL (joins, filters, aggregations, subqueries)
Experience validating data pipelines, audit logs, or relational integrity
Ability to detect both UI anomalies and backend data discrepancies
Clear written and verbal communication for reporting behavior-based bugs
Familiarity with testing non-deterministic or AI-powered systems
Nice-to-Have
Understanding of prompt engineering and how LLM behavior can shift with input changes
Familiarity with AI agent architectures (e.g., LangChain, ReAct, RAG systems)
Experience working with BI tools (e.g., Metabase, Redash) or data validation frameworks
Background in content moderation, safety testing, or AI/UX evaluation