Data Scientist 5 - AI Evals

Netflix

Location

Remote

Type

Full-time

Posted

6/7/2026

Compensation

$372,000 - $600,000 per year

Undergraduate with 5+ Years of Experience
Approval 98.4% · Filings 128 · New hires 57 · 💎 Strong Sponsor · FY 2025

Job description

The Senior Data Scientist specializing in AI Evals at Netflix will play a crucial role in architecting systems that measure and optimize generative AI in production. This position focuses on enhancing player experiences in games through rigorous evaluation of AI-powered storytelling and interactions. The role involves collaboration with world-class scientists, engineers, and designers to ensure high-quality AI integration in gaming. The ideal candidate will bridge the gap between technical capabilities and user experience, ensuring engaging and safe gameplay.

Requirements

  • Ph.D. in Data Science, Computer Science, Statistics, Cognitive Science, or a related quantitative field.
  • 4+ years of industry experience in Data Science, ML, or AI with a strong foundation in experimental design, causal inference, A/B testing, and uncertainty quantification.
  • Experience with modern AI Evals and observability frameworks.
  • Proven track record of evaluating LLM and agentic systems.
  • Deep understanding of prompt engineering, RAG Evals, and agentic Evals.
  • Understanding of agent architectures and evaluation of long-horizon reasoning and complex tool-use.
  • Ability to collaborate effectively with game teams to translate creative objectives into data specifications.
  • Passion for developing AI that enhances joy, entertainment, and storytelling.

Responsibilities

  • Partner with the GenAI research team to ensure product graduation from R&D into production at scale.
  • Build and operate robust evaluation pipelines for production-stage GenAI experiences.
  • Curate high-quality datasets and test suites to establish ground-truth performance across generative tasks.
  • Design experiments to understand trade-offs between technical attributes and user experience quality.
  • Measure the coherence, fluency, relevance, and joy value of AI-powered game features.
  • Design protocols to detect and mitigate toxicity and out-of-character behavior in gaming environments.
  • Guide evaluations for internal agentic tools used by technical, business, and creative teams.

Benefits

  • Employees at Netflix are often offered flexible, people-first benefits—unlimited time away, generous parental leave, global family-forming support, mental-health programs (mindfulness, free counseling/coaching), and health coverage tailored by country. Financially, Netflix pays at personal top-of-market and lets employees choose their mix of cash vs. fully-vested 10-year stock options, alongside donation and volunteer matching. Convenience perks can include trust-based travel/expense policies, relocation support, and “Work, Not Drive” rideshare flexibility.
