~/AINative.careers · man ai-evaluation-qa-jobs
ROLE MANUAL · AI_EVALUATION_QA

AI Evaluation and QA Jobs

Quality operators who design evals, inspect AI outputs, find failure patterns, and define what good behavior means for agents or AI workflows.

counts when

[x] builds rubrics

[x] finds hallucinations

[x] tests agent behavior

reject when

[-] generic QA

[-] data labeling only

[-] model training only

source signals

# groundedness

# benchmark scenarios

# failure analysis

editorial filter

This work asks whether an AI system is good enough for real use. The job should involve eval rubrics, golden datasets, groundedness checks, hallucination review, workflow QA, or failure analysis.

Generic QA is not enough. Data labeling alone is not enough. A match should ask for judgment about model output, user usefulness, business risk, and where plausible answers still fail.

live matches · 13
AI_EVALUATION_/_QA · AI Data Expert · LILT · Remote; Indiana (Remote); New York, NY (Remote); Pennsylvania (Remote); San Francisco, CA; Seattle, WA; Washington (Remote)

Contract eval/annotation role for AI output quality — you judge whether AI-generated content meets accuracy, cultural, and brand standards, with flexible hours and no coding required.

comp not disclosed · 1d ago

AI_EVALUATION_/_QA · AI Operations Specialist | Housing (New Grads 2025-2026) · EliseAI · New York City

Entry-level AI ops seat watching dashboards and logs to catch AI failures before they escalate — you flag and coordinate, engineers do the fixing.

comp not disclosed · 1d ago

AI_EVALUATION_/_QA · AI Conversation Designer · Pearl · Remote, US

Conversation design role writing prompts, optimizing chatbot flows, and QA-testing LLM experiences for customer journeys.

comp not disclosed · 4d ago

AI_EVALUATION_/_QA · Applied AI Evaluation Scientist · Jump · Remote (U.S.)

Evaluation role owning RAG and agent quality frameworks, with research-grade Python instead of production infrastructure.

$180–270k · 4d ago

AI_EVALUATION_/_QA · AI Quality Operator · Neon Health · San Francisco, CA (USA)

Healthcare QA role reviewing AI agent calls, catching errors, labeling issues, and improving real workflows.

$594k · 4d ago

AI_EVALUATION_/_QA · Operations Specialist, AI Enablement · Bumble Inc. · Austin, TX / London / Remote

QA role reviewing AI support conversations for accuracy, policy fit, tone, and recurring failure patterns.

comp not disclosed · 4d ago

AI_EVALUATION_/_QA · Adversarial Prompt Expert · Reinforce Labs · Remote

Red-team role finding jailbreaks, ranking model failures, and documenting attack paths so safety teams can patch them.

comp not disclosed · 4d ago

AI_EVALUATION_/_QA · Prompt Engineer · Cantina · Los Angeles; San Francisco

Expert prompt role owning AI character behavior, personality systems, and evaluation frameworks for social generative AI experiences.

$150–180k · 6d ago

AI_EVALUATION_/_QA · Prompt Engineer - AI Innovation Team - US · SitusAMC · US - Remote

Prompt-focused AI role owning use-case translation, agent behavior oversight, and quality testing for commercial real estate workflows.

$50k · 6d ago

AI_EVALUATION_/_QA · AI Content Reviewer (Video) · Crossing Hurdles · Remote

High-signal role for evaluating the next generation of AI video models.

$25–34k · 1w ago

AI_EVALUATION_/_QA · Senior AI Evaluation Specialist — IP Guardrails and Agentic Workflows · Adobe · New York, NY

Recently posted Adobe role for a senior AI evaluation specialist covering IP guardrails and agentic workflows.

$155–281k · 1w ago

AI_EVALUATION_/_QA · AI Agent Architect, Customer Experience · Airtable · Remote - US

Strong fit at Airtable, with a focus on workflow design.

$196–278k · 1w ago

AI_EVALUATION_/_QA · AI Operations Specialist · Bretton AI · San Francisco, CA

Operational role for someone interested in the evaluation and quality side of AI agent deployments.

$90–105k · 1w ago