Senior Software Engineer, AI Resiliency

NVIDIA Corporation

7/21/2025

US, CA, Santa Clara

Full-time

Salary: $184,000 - $287,500 per year


Job Description

NVIDIA is seeking a Senior Software Engineer for AI Resiliency to lead the development of critical resiliency features for AI supercomputers.

Requirements

  • Bachelor’s, Master’s, or PhD in Computer Science, Electrical Engineering, or a related field
  • Proficiency in C++ and Python
  • 6+ years of relevant experience
  • Strong understanding of distributed systems concepts, parallel programming, and fault tolerance
  • Experience with AI frameworks such as PyTorch, JAX/XLA, or TensorFlow
  • Experience with debugging and profiling tools
  • Excellent problem-solving skills

Responsibilities

  • Develop and optimize AI software resiliency features
  • Contribute to large-scale distributed systems with high-quality code
  • Work on AI system error handling and fault tolerance
  • Collaborate with other teams to integrate resiliency features into AI frameworks
  • Develop tests and automation for robustness and efficiency
  • Support production deployments and performance tuning

Benefits

  • Multiple relocation packages
  • Two weeklong shutdowns (mid-summer and year-end) in the US (in addition to PTO)
  • 8-week parental leave
  • 9 Employee Resource Groups
  • Annual bonus offering
  • Flexible work arrangements
  • Up to 6% 401(k) matching
