Principal / Senior GPU Software Performance Engineer — Post‑Training
AMD • San Jose, CA • Full-time • 3/17/2026 • $226.4k - $339.6k per year
Master's • Entry-Level
Approval 98.6% • Total filings 728 • New hires 184 • ✓ Established Sponsor • FY 2025
Job Description
The Principal/Senior GPU Software Performance Engineer at AMD will focus on improving the performance of post-training workloads (such as SFT, LoRA, and RL-based finetuning) on AMD Instinct GPUs. The role involves optimizing training-pipeline components, collaborating with framework, compiler, and model teams, and making deep learning pipelines reproducible and efficient.
Requirements
- Proven GPU performance engineering for deep learning (ROCm/HIP, Triton, or similar)
- Hands-on with SFT, LoRA, and RL-based training at scale
- Strong PyTorch experience (torch.distributed, FSDP/ZeRO or equivalent)
- Proficient in Python and C++; comfortable reading/writing kernels when needed
- Experience with distributed systems and collective communication libraries
- Track record of turning profiles into fixes, upstreaming changes, and documenting results
Responsibilities
- Lead performance work for finetuning and RL training solutions on AMD GPUs
- Improve throughput, memory efficiency, and stability across data, model, and optimizer steps
- Optimize multi-GPU/multi-node training and communication patterns
- Contribute efficient kernels/ops and targeted graph-level optimizations
- Profile, diagnose, and resolve bottlenecks using standard tooling; prevent regressions in CI
- Ship reproducible pipelines and documentation adopted by internal teams and external developers
- Collaborate with framework, compiler, and model teams to land durable improvements
Benefits
- AMD provides a competitive 'Total Rewards' package that focuses on financial growth, health, and work-life balance.
