ML Engineer - Automated Evaluation and Adversarial Design
Apple
Location
Culver City, CA, Cupertino, CA, San Diego, CA, Seattle, WA
Type
Full-time
Posted
5/8/2026
Compensation
Not listed
Undergraduate with 2+ Years of Experience
Approval 98.9% · Filings 5,543 · New hires 2,691 · 👑 Elite Sponsor · FY 2025
Job description
The Productivity and Machine Learning Evaluation team focuses on ensuring the quality of AI-powered features across various productivity and creative applications. This role involves building and scaling automated evaluation systems and designing methodologies for adversarial and stress testing of AI features. The work requires a deep understanding of AI system failures and rigorous quality measurement. The position offers an opportunity to shape the evaluation infrastructure that impacts hundreds of millions of users.
Requirements
- Bachelor's degree in Computer Science, Machine Learning, Statistics, or a related field
- 4+ years of experience building or significantly extending ML evaluation systems
- Experience independently defining evaluation architecture and methodology for AI or ML systems
- Experience designing adversarial or red-teaming test methodologies for ML models or AI-powered features
- Experience with Python and ML frameworks such as PyTorch or TensorFlow
- Track record of owning technical direction for evaluation efforts across multiple features or product areas
- Experience evaluating user-facing AI features in consumer applications
- Familiarity with productivity software or creative tools
- Experience ensuring alignment between automated and human evaluation methods
- Track record of designing evaluation systems that scale across multiple features or product areas
- Experience evaluating different types of AI systems, including API-based and custom-trained models
- Demonstrated ability to communicate evaluation findings and readiness assessments
- Experience leveraging automation to scale evaluation data generation and analysis
- Experience building evaluation pipelines for conversational AI or dialogue systems
- Familiarity with agent orchestration frameworks and observability tooling
- Experience designing adversarial tests for tool-use reliability or agent planning quality
- Graduate degree in a relevant field
Responsibilities
- Design, build, and maintain automated evaluation systems for assessing AI feature quality at scale.
- Create adversarial test suites to probe model weaknesses.
- Run stress tests to ensure features perform under demanding conditions.
- Develop evaluation frameworks and rubrics for quality assessment.
- Produce quality assessment reports and recommendations on model readiness.
- Design multi-turn stress-test pipelines for evaluating conversation flows.
- Ensure alignment between automated and human evaluation methods.
- Communicate evaluation findings to cross-functional partners.
Benefits
- Employees at Apple are often offered comprehensive benefits that support physical and mental well-being: flexible medical plans, confidential counseling, onsite wellness centers at major campuses, and resources for fitness and daily life. Families typically receive fertility support, paid parental leave with a gradual return, caregiving leave, and dependent-care guidance. Financial perks commonly include stock grants (with purchase discounts), 401(k) matching, and income-protection coverage. Employees also receive robust time off, Apple University learning and tuition reimbursement, donation matching and paid volunteer hours, and deep product and partner discounts.
