Principal High-Performance LLM Training Engineer
Job description
NVIDIA is looking for a Principal Engineer to enhance the performance of large-scale AI training and post-training workloads across its hardware and software stack. This role involves optimizing frontier-scale LLM workloads on thousands of GPUs and influencing future GPU and software roadmaps. The ideal candidate will have a strong technical background and the ability to operate across various abstraction layers. Success in this position requires both direct performance improvements and setting technical direction within the organization.
Requirements
- An MS or PhD in Computer Science, Electrical Engineering, Computer Engineering, or a related field, with 12+ years of relevant work or research experience.
- Demonstrated principal-level technical impact in areas such as large-scale AI training systems, GPU performance optimization, or distributed systems.
- Deep hands-on experience analyzing and optimizing performance of large-scale deep learning workloads, especially transformer-based models.
- Strong understanding of GPU and AI accelerator architecture from individual accelerators to datacenter-scale systems.
- Experience with distributed training techniques such as data parallelism and mixed precision training.
- A strong track record of using profiling, tracing, benchmarking, and performance modeling tools.
Responsibilities
- Lead end-to-end performance analysis and optimization of innovative LLM pre-training and post-training workloads.
- Drive workloads closer to speed-of-light performance by identifying and removing bottlenecks.
- Develop production-quality software, tools, models, benchmarks, and analysis infrastructure.
- Build and refine performance models, workload characterizations, and simulation methodologies.
- Serve as a technical authority for AI training performance, partnering with various teams.
- Translate workload insights into hardware and software recommendations.
- Mentor and provide technical leadership to engineers across the organization.
Benefits
- Employees at NVIDIA are often offered comprehensive, day-one benefits, including medical, dental, and vision coverage with HSA support, life and disability insurance, an Employee Assistance Program, and a 401(k) with auto-enrollment. Many roles also have generous time off and holidays, donation matching (up to $10,000), and a wide menu of extras such as FSAs, commuter benefits, legal and identity-theft protection, pet insurance, and wellness discounts. Optional programs can include student-loan and home-purchase support, plus family care resources and expert medical services.
