Test & Improve AI Systems Jobs
Evaluate AI outputs for quality, accuracy, usefulness, tone, safety, hallucination, policy compliance, grounding, and reliability.
counts when
[x] The role tests AI outputs for quality, accuracy, safety, and usefulness.
[x] AI is part of the work method or operating model.
[x] The role rewards judgment about AI outputs, tools, workflows, or adoption.
reject when
[-] AI appears only in company boilerplate.
[-] The core job is traditional software engineering or ML infrastructure.
[-] The posting lacks evidence for this use case.
source signals
# AI Evaluation / QA
# LLM Evaluator
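
The "counts when" and "reject when" rules above can be read as a small decision procedure. The following is a minimal sketch, not a definitive implementation. The `Posting` fields and the `classify` function are hypothetical names introduced for illustration. It assumes that every "counts when" item must hold and that any "reject when" item overrides them; the card itself does not state how the rules combine.

```python
from dataclasses import dataclass


@dataclass
class Posting:
    # Hypothetical flags a reviewer would fill in for each job posting.
    tests_ai_outputs: bool          # role tests AI outputs for quality, accuracy, safety, usefulness
    ai_in_operating_model: bool     # AI is part of the work method or operating model
    rewards_ai_judgment: bool       # role rewards judgment about AI outputs, tools, workflows, adoption
    ai_only_in_boilerplate: bool = False   # AI appears only in company boilerplate
    core_is_swe_or_ml_infra: bool = False  # core job is traditional software engineering or ML infrastructure


def classify(p: Posting) -> str:
    """Return "count" or "reject" under the assumed rule combination."""
    # Assumption: any "reject when" condition takes precedence.
    if p.ai_only_in_boilerplate or p.core_is_swe_or_ml_infra:
        return "reject"
    # Assumption: all "counts when" conditions are required; a missing one
    # is treated as the posting lacking evidence for this use case.
    if p.tests_ai_outputs and p.ai_in_operating_model and p.rewards_ai_judgment:
        return "count"
    return "reject"


if __name__ == "__main__":
    print(classify(Posting(True, True, True)))
    print(classify(Posting(True, True, True, core_is_swe_or_ml_infra=True)))
```

Under a looser reading, where any single "counts when" item is enough, the `and` in the second check would become `or`.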