Principal GenAI Inference Optimization Engineer

AMD

Location

San Jose, CA

Type

Full-time

Posted

5/5/2026

Compensation

USD $210,000 – $300,000 per year

Undergraduate with 5+ Years of Experience

✓ Established Sponsor (FY 2025): approval rate 98.6%, 728 filings, 184 new hires

Job description

The Principal GenAI Inference Optimization Engineer will join the Models and Applications team at AMD, focusing on enhancing the performance, efficiency, and scalability of generative AI inference workloads on AMD GPU platforms. This role involves optimizing latency, throughput, and cost efficiency for large-scale model deployment while working across the software-hardware stack. The ideal candidate will have strong technical expertise in GenAI inference optimization and GPU performance. Collaboration with cross-functional teams is essential to drive optimization efforts.

Requirements

Strong understanding of GPU architecture and performance fundamentals.
Experience with GenAI inference optimization techniques such as quantization and batching.
Hands-on experience with inference and serving frameworks like vLLM, SGLang, or Triton.
Experience working on LLM or multimodal inference workloads.
Familiarity with distributed systems and serving architectures.
Experience with ML frameworks such as PyTorch, JAX, or TensorFlow.
Proficiency in Python and at least one systems language like C++, CUDA, or HIP.
Experience with profiling, debugging, and performance tuning tools.

Responsibilities

Optimize performance of GenAI inference workloads on AMD GPU platforms across single-node and distributed environments.
Improve latency, throughput, and cost efficiency for LLM and multimodal model serving in production.
Analyze and resolve bottlenecks across compute, memory, and communication.
Contribute to cross-stack optimizations spanning kernels, runtimes, communication libraries, and inference frameworks.
Implement and evaluate inference optimization techniques such as batching strategies and quantization.
Support development and optimization of scalable serving systems, including request scheduling and resource utilization.
Develop and use profiling, benchmarking, and performance analysis tools for inference workloads.
Collaborate with hardware, compiler, and framework teams to improve overall system performance.
Contribute to internal tools and open-source projects for inference optimization on AMD platforms.
Document best practices and contribute to performance guidelines for GenAI deployment.

Benefits

AMD provides a competitive 'Total Rewards' package that focuses on financial growth, health, and work-life balance.

Is this posting expired or inaccurate?