Principal GenAI Inference Optimization Engineer
AMD
Location
San Jose, CA
Type
Full-time
Posted
5/5/2026
Compensation
USD $210,000.00/Yr. – USD $300,000.00/Yr.
Undergraduate with 5+ Years of Experience
Approval 98.6% · Filings 728 · New hires 184 · ✓ Established Sponsor · FY 2025
Job description
The Principal GenAI Inference Optimization Engineer will join the Models and Applications team at AMD, focusing on enhancing the performance, efficiency, and scalability of generative AI inference workloads on AMD GPU platforms. This role involves optimizing latency, throughput, and cost efficiency for large-scale model deployment while working across the software-hardware stack. The ideal candidate will have strong technical expertise in GenAI inference optimization and GPU performance. Collaboration with cross-functional teams is essential to drive optimization efforts.
Requirements
- Strong understanding of GPU architecture and performance fundamentals.
- Experience with GenAI inference optimization techniques such as quantization and batching.
- Hands-on experience with inference and serving frameworks like vLLM, SGLang, or Triton.
- Experience working on LLM or multimodal inference workloads.
- Familiarity with distributed systems and serving architectures.
- Experience with ML frameworks such as PyTorch, JAX, or TensorFlow.
- Proficiency in Python and at least one systems language like C++, CUDA, or HIP.
- Experience with profiling, debugging, and performance tuning tools.
Responsibilities
- Optimize performance of GenAI inference workloads on AMD GPU platforms across single-node and distributed environments.
- Improve latency, throughput, and cost efficiency for LLM and multimodal model serving in production.
- Analyze and resolve bottlenecks across compute, memory, and communication.
- Contribute to cross-stack optimizations spanning kernels, runtimes, communication libraries, and inference frameworks.
- Implement and evaluate inference optimization techniques such as batching strategies and quantization.
- Support development and optimization of scalable serving systems, including request scheduling and resource utilization.
- Develop and use profiling, benchmarking, and performance analysis tools for inference workloads.
- Collaborate with hardware, compiler, and framework teams to improve overall system performance.
- Contribute to internal tools and open-source projects for inference optimization on AMD platforms.
- Document best practices and contribute to performance guidelines for GenAI deployment.
Benefits
- AMD provides a competitive 'Total Rewards' package that focuses on financial growth, health, and work-life balance.
