Work is scheduled according to the US Eastern or Central time zone
Responsibilities:
• Design and implement scalable QA strategies for validating LLM-generated outputs, including meeting summaries, task extraction, document search results, and contextual content
• Evaluate the effectiveness and reliability of GenAI features across variable formatting preferences (e.g., Robert’s Rules, anonymous vs. named notes, bullet vs. narrative)
• Create prompt-scoring and output confidence models to evaluate changes in behavior or regressions from previous prompt iterations
• Partner with product teams to align GenAI output validation to real-world use cases and customer expectations
• Collaborate within a modern Microsoft-based environment, including .NET Core (C#), Azure, microservices, and Cypress
• Leverage Azure OpenAI, Azure AI Search, Recall.AI, Windsurf IDE, and RAG (Retrieval-Augmented Generation) concepts to identify edge cases and test data relevancy
• Where appropriate, use or configure AI-based tooling to assist in regression testing, test generation, or PR review workflows
• Document QA methodologies for AI testing and build playbooks for repeatability and internal enablement
• Contribute to evolving the definition of quality in GenAI — shifting from traditional pass/fail to output value, usability, and trust
• Support broader QA initiatives (when needed), including Cypress test maintenance, smoke test maturity, and test coverage improvements
• Help reduce cycle time and deployment friction by improving overall test reliability and structure
Requirements:
Minimum Qualifications:
• 2+ years in QA, test engineering, or quality automation within a software product environment
• Proven experience validating generative AI/LLM outputs (e.g., OpenAI, Anthropic Claude, Cohere)
• Deep understanding of prompt engineering, tuning, and the challenges of hallucinations and inconsistent LLM behavior
• Familiarity with techniques like prompt scoring, fuzzy matching, domain validation, and output consistency testing
• Experience with both manual and automated test strategy design in dynamic, prompt-based systems
• Ability to work independently in remote settings, delivering structure within ambiguity
• Excellent communication skills, with the ability to document results, process, and rationale clearly
Preferred Qualifications:
• Experience testing GenAI features in a B2B SaaS, enterprise, or regulated environment (e.g., education, healthcare, financial services)
• Working knowledge of:
• Experience using GenAI for QA acceleration (e.g., writing test cases, automating regression checks)
• Familiarity with Agile/DevOps environments, including CI/CD pipelines and shift-left QA practices
• Experience working with or supporting offshore/nearshore QA teams
Our offer:
• 100% remote work
• MultiSport Plus
• Group insurance
• Medicover Premium
• e-learning platform
Net per hour - B2B