Senior Software Engineer, ML Inference

Apple Inc.

3/14/2025

Cupertino, CA

Full-time

Salary: $175,800 - $312,200 per year

Job Description

Apple Maps is seeking a Senior Software Engineer, ML Inference, to optimize and scale machine learning models, with a focus on large language models (LLMs) for high-performance, production-scale inference.

Requirements

Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience)
5+ years in software engineering focused on ML inference, GPU acceleration, and large-scale systems
Expertise in deploying and optimizing LLMs for high-performance, production-scale inference
Proficiency in Python, Java, or C++
Experience with deep learning frameworks like PyTorch, TensorFlow, and Hugging Face Transformers
Experience with model serving tools (e.g., NVIDIA Triton, TensorFlow Serving, vLLM)
Experience with optimization techniques like attention fusion, quantization, and speculative decoding
Skilled in GPU optimization (e.g., CUDA, TensorRT-LLM, cuDNN) to accelerate inference tasks
Familiarity with container and cloud technologies like Docker, Kubernetes, and AWS EKS for scalable deployment

Responsibilities

Optimize LLMs for Inference: Implement and enhance large language models for real-time and batch inference, balancing performance and resource efficiency
Advanced Inference Optimization: Apply techniques such as quantization and speculative decoding to reduce model size and accelerate inference without sacrificing accuracy
Cross-Functional Collaboration: Partner with data scientists, ML researchers, and infrastructure engineering teams to understand model requirements, provide feedback, and ensure smooth deployment of models into production
Monitoring & Resource Management: Implement monitoring tools to profile and track the performance of models running on GPUs, including real-time monitoring of GPU utilization, memory usage, and inference throughput
Continuous Improvement & R&D: Stay current with the latest research in LLM inference techniques, GPU optimizations, and distributed systems, and apply it to improve the overall system