Senior Software Engineer, ML Inference

Apple Inc.

3/14/2025

Cupertino, CA

Full-time

Salary: $175,800 - $312,200 per year


Job Description

Apple Maps is seeking a Senior Software Engineer, ML Inference to optimize and scale machine learning models, focusing on large language models for high-performance, production-scale inference.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience)
  • 5+ years of software engineering experience focused on ML inference, GPU acceleration, and large-scale systems
  • Expertise in deploying and optimizing LLMs for high-performance, production-scale inference
  • Proficiency in Python, Java, or C++
  • Experience with deep learning frameworks like PyTorch, TensorFlow, and Hugging Face Transformers
  • Experience with model serving tools (e.g., NVIDIA Triton, TensorFlow Serving, vLLM)
  • Experience with optimization techniques like attention fusion, quantization, and speculative decoding
  • Skilled in GPU optimization (e.g., CUDA, TensorRT-LLM, cuDNN) to accelerate inference tasks
  • Familiarity with container and cloud technologies like Docker, Kubernetes, and AWS EKS for scalable deployment

Responsibilities

  • Optimize LLMs for Inference: Implement and enhance large language models for real-time and batch inference, balancing performance and resource efficiency
  • Advanced Inference Optimization: Apply techniques such as quantization and speculative decoding to reduce model size and accelerate inference without sacrificing accuracy
  • Cross-Functional Collaboration: Partner with data scientists, ML researchers, and infrastructure engineering teams to understand model requirements, provide feedback, and ensure smooth deployment of models into production
  • Monitoring & Resource Management: Implement monitoring tools to profile and track the performance of models running on GPUs, including real-time monitoring of GPU utilization, memory usage, and inference throughput
  • Continuous Improvement & R&D: Stay on top of the latest research in LLM inference techniques, GPU optimizations, and distributed systems to bring innovative improvements to the overall system

© 2024 H1BConnect. All rights reserved.