ML Engineer - Automated Evaluation and Adversarial Design

Apple

Location

Culver City, CA; Cupertino, CA; San Diego, CA; Seattle, WA

Type

Full-time

Posted

5/8/2026

Compensation

Not listed

Undergraduate with 2+ Years of Experience

Approval 98.9% · Filings 5,543 · New hires 2,691 · Elite Sponsor · FY 2025

Job description

The Productivity and Machine Learning Evaluation team focuses on ensuring the quality of AI-powered features across various productivity and creative applications. This role involves building and scaling automated evaluation systems and designing methodologies for adversarial and stress testing of AI features. The work requires a deep understanding of AI system failures and rigorous quality measurement. The position offers an opportunity to shape the evaluation infrastructure that impacts hundreds of millions of users.

Requirements

Bachelor's degree in Computer Science, Machine Learning, Statistics, or a related field
4+ years of experience building or significantly extending ML evaluation systems
Experience independently defining evaluation architecture and methodology for AI or ML systems
Experience designing adversarial or red-teaming test methodologies for ML models or AI-powered features
Experience with Python and ML frameworks such as PyTorch or TensorFlow
Track record of owning technical direction for evaluation efforts across multiple features or product areas
Experience evaluating user-facing AI features in consumer applications
Familiarity with productivity software or creative tools
Experience ensuring alignment between automated and human evaluation methods
Track record of designing evaluation systems that scale across multiple features or product areas
Experience evaluating different types of AI systems, including API-based and custom-trained models
Demonstrated ability to communicate evaluation findings and readiness assessments
Experience leveraging automation to scale evaluation data generation and analysis
Experience building evaluation pipelines for conversational AI or dialogue systems
Familiarity with agent orchestration frameworks and observability tooling
Experience designing adversarial tests for tool-use reliability or agent planning quality
Graduate degree in a relevant field

Responsibilities

Design, build, and maintain automated evaluation systems for assessing AI feature quality at scale.
Create adversarial test suites to probe model weaknesses.
Run stress tests to ensure features perform under demanding conditions.
Develop evaluation frameworks and rubrics for quality assessment.
Produce quality assessment reports and recommendations on model readiness.
Design multi-turn stress-test pipelines for evaluating conversation flows.
Ensure alignment between automated and human evaluation methods.
Communicate evaluation findings to cross-functional partners.

Benefits

Employees at Apple are often offered comprehensive benefits that support physical and mental well-being: flexible medical plans, confidential counseling, onsite wellness centers at major campuses, and resources for fitness and daily life. Families typically receive fertility support, paid parental leave with a gradual return, caregiving leave, and dependent-care guidance. Financial perks commonly include stock grants (with purchase discounts), 401(k) matching, and income-protection coverage. Employees also receive generous time off, Apple University learning and tuition reimbursement, donation matching and paid volunteer hours, and substantial product and partner discounts.