

Senior Research Engineer, Foundation Model Training Infrastructure

NVIDIA Corporation

2/28/2025

US, CA, Santa Clara

Full-time

Salary: $224,000 - $356,500 per year


Job Description

NVIDIA is seeking a senior or principal engineer to build cutting-edge infrastructure for large-scale foundation model training in the Generalist Embodied Agent Research (GEAR) group. The role is part of Project GR00T, NVIDIA's moonshot initiative to build foundation models and full-stack technology for humanoid robots.

Requirements

  • Bachelor's degree in Computer Science, Robotics, Engineering, or a related field
  • 10+ years of full-time industry experience in large-scale MLOps and AI infrastructure
  • Proven experience designing and optimizing distributed training systems with frameworks like PyTorch, JAX, or TensorFlow
  • Deep understanding of GPU acceleration, CUDA programming, and cluster management tools like Kubernetes
  • Strong programming skills in Python and a high-performance language such as C++ for efficient system development
  • Strong experience with large-scale GPU clusters, HPC environments, and job scheduling/orchestration tools (e.g., SLURM, Kubernetes)

Responsibilities

  • Design and maintain large-scale distributed training systems to support multimodal foundation models for robotics
  • Optimize GPU and cluster utilization for efficient model training and fine-tuning on massive datasets
  • Implement scalable data loaders and preprocessors tailored to multimodal datasets, including video, text, and sensor data
  • Develop robust monitoring and debugging tools to ensure the reliability and performance of training workflows on large GPU clusters
  • Collaborate with researchers to integrate cutting-edge model architectures into scalable training pipelines

Benefits

  • Multiple relocation packages
  • Two weeklong shutdowns (mid-summer and year-end) in the US (in addition to PTO)
  • 8-week parental leave
  • 9 Employee Resource Groups
  • Annual bonus offering
  • Flexible work arrangements
  • Up to 6% 401(k) matching

© 2024 H1BConnect. All rights reserved.
