Distinguished Engineer, AI Resiliency

NVIDIA Corporation

7/13/2025

US, CA, Santa Clara

Full-time

Salary: $308,000 - $471,500 per year


Job Description

NVIDIA is seeking a Distinguished Engineer for AI Resiliency to architect, design, and develop world-class software resiliency features for training groundbreaking AI models on AI superclusters.

Requirements

  • Master’s or Ph.D. in Computer Science, Electrical Engineering, or Computer Engineering
  • 15+ years of experience in software architecture or related fields with a deep understanding of AI-optimized systems
  • 5+ years of hands-on experience in software development on high-complexity projects involving HPC or AI
  • Proven experience with large-scale AI supercomputing applications
  • Experience implementing HPC software development best practices in large-scale systems

Responsibilities

  • Define scalable software architecture for single-job resilient training on hundreds of thousands of GPUs with minimal downtime
  • Design and deliver modular, resilient software features to support large-scale AI training
  • Innovate and evolve resilient architecture designs to achieve stringent uptime requirements
  • Collaborate closely with internal partners and communicate progress updates to senior leadership

Benefits

  • Multiple relocation packages
  • Two week-long company shutdowns in the US (mid-summer and year-end), in addition to PTO
  • 8-week parental leave
  • 9 Employee Resource Groups
  • Annual bonus offering
  • Flexible work arrangements
  • 401(k) matching up to 6%

© 2024 H1BConnect. All rights reserved.
