AIML - Staff ML Infrastructure Engineer, ML Platform & Technology - Pre-training Infrastructure
Company
Apple
Location
San Francisco Bay Area, CA
Type
Full-time
Posted
5/15/2026
Compensation
Not listed
Experience
Undergraduate with 5+ years of experience
Approval 98.9% · Filings 5,543 · New hires 2,691 · 👑 Elite Sponsor · FY 2025
Job description
As an engineer on the ML Compute team at Apple, you will focus on driving large-scale pre-training initiatives for cutting-edge foundation models. Your work will emphasize resiliency, efficiency, scalability, and resource optimization. You will collaborate with cross-functional engineers to tackle large-scale ML training challenges and mentor fellow engineers to foster skill growth. This role requires a strong background in distributed systems and cloud computing to enhance system performance and maintainability.
Requirements
- Bachelor's degree in Computer Science, engineering, or a related field
- 6+ years of hands-on experience in building scalable backend systems for training and evaluation of machine learning models
- Proficiency in relevant programming languages, such as Python or Go
- Strong expertise in distributed systems, reliability and scalability, containerization, and cloud platforms
- Proficiency in cloud computing infrastructure and tools, such as Kubernetes, Ray, and PySpark
- Advanced degree in Computer Science, engineering, or a related field
- Proficiency in working with and debugging accelerators, such as GPUs, TPUs, and AWS Trainium
- Proficiency in ML training and deployment frameworks, such as JAX, TensorFlow, PyTorch, TensorRT, and vLLM
Responsibilities
- Drive large-scale pre-training initiatives to support cutting-edge foundation models.
- Enhance distributed training techniques for foundation models.
- Research and implement new patterns and technologies to improve system performance, maintainability, and design.
- Optimize the execution and performance of workloads built with JAX, PyTorch, XLA, and CUDA on large distributed systems.
- Leverage high-performance communication technologies, such as NCCL for GPU collectives and TPU interconnects, for large-scale distributed training.
- Architect a robust MLOps platform to streamline and automate pre-training operations.
- Operationalize large-scale ML workloads on Kubernetes, ensuring distributed training jobs are robust, efficient, and fault-tolerant.
- Lead complex technical projects, defining requirements and tracking progress with team members.
- Collaborate with cross-functional engineers to solve large-scale ML training challenges.
- Mentor engineers in areas of your expertise, fostering skill growth and knowledge sharing.
- Cultivate a team centered on collaboration, technical excellence, and innovation.
Benefits
- Employees at Apple are often offered comprehensive benefits that support physical and mental well-being: flexible medical plans, confidential counseling, onsite wellness centers at major campuses, and resources for fitness and daily life. Families typically receive fertility support, paid parental leave with a gradual return to work, caregiving leave, and dependent-care guidance, while financial perks commonly include stock grants and discounted stock purchases, 401(k) matching, and income-protection coverage. Employees also typically receive generous time off, learning opportunities through Apple University, tuition reimbursement, donation matching, paid volunteer hours, and substantial product and partner discounts.
