Software Engineer I - AI/ML, AWS Neuron Distributed Training
Amazon
Location
Cupertino, CA
Type
Full-time
Posted
6/3/2026
Compensation
$127,100 - $185,000 per year
Undergraduate Entry-Level
Approval 98.6% · Filings 19,451 · New hires 10,113 · 👑 Elite Sponsor · FY 2025
Job description
The Software Engineer I will join the ML Distributed Training team at Annapurna Labs, focusing on the development and optimization of large-scale machine learning model training. This role involves working with diverse model families, including LLMs and multimodal models, while collaborating with various engineering teams to deliver efficient solutions on AWS Trainium systems. The team is part of AWS, which integrates silicon and software to tackle complex technical challenges. The position emphasizes performance optimization and the use of advanced training techniques.
Requirements
- Bachelor's degree or above in computer science, computer engineering, or a related field.
- 1+ years of programming experience with at least one software programming language.
- Experience with software development practices including code reviews, source control, testing, and build processes.
- Experience with machine learning concepts and at least one ML framework such as PyTorch, JAX, or TensorFlow.
- Experience with large-scale distributed training or LLM workloads.
- Experience with computer architecture or hardware-software co-optimization.
- Experience with distributed systems, libraries, or frameworks.
- Familiarity with end-to-end model training pipelines.
- Previous internship or research experience in ML infrastructure or systems software.
Responsibilities
- Contribute to the design and implementation of distributed training solutions for large-scale ML models running on Trainium instances.
- Extend and optimize popular distributed training frameworks including FSDP, torchtitan, and Hugging Face libraries for the Neuron ecosystem.
- Develop and optimize mixed-precision and low-precision training techniques using BF16, FP8, and emerging numerical formats.
- Implement precision-aware training strategies, loss scaling techniques, and careful gradient management to ensure training stability.
- Profile, analyze, and tune end-to-end training pipelines to achieve optimal performance on Trainium hardware.
- Collaborate with hardware, compiler, and runtime teams to understand system constraints and unlock new capabilities.
- Support the deployment and optimization of training workloads at scale in collaboration with AWS solution architects and customers.
Benefits
- Employees at Amazon are often offered comprehensive health benefits—including multiple medical plan options (no pre-existing condition exclusions, 100% covered in-network preventive care), dental and vision plans, a 24/7 medical advice line from day one, expert second-opinion services, and broad mental-health support with several free counseling sessions (including pediatric). Financial wellness typically includes a 401(k) with company match (up to 2%), Restricted Stock Units (equity), FSAs, an emergency savings program, product and partner discounts, and even college-savings and home-purchase programs. Overall, the package is designed to support employees and their families’ health, finances, and day-to-day life.
