Datacenter Resiliency Architect - New College Grad 2025

NVIDIA Corporation

7/13/2025

US, CA, Santa Clara

Full-time

Salary: $120,000 - $235,750 per year


Job Description

NVIDIA is seeking a Resiliency Architect to support the development and validation of GPU hardware and software resiliency features in the field of artificial intelligence and high-performance computing.

Requirements

  • Pursuing or recently completed a Master’s or PhD in Computer Engineering, Electrical Engineering, or a closely related field, or equivalent experience.
  • Familiarity with GPU and networking architectures, computer architecture basics, and machine learning/deep learning concepts.
  • Proficiency in Reliability, Availability, Serviceability (RAS) concepts and in developing architecture models.
  • Scripting and automation with Python or similar.
  • Proficiency in C/C++.
  • Excellent interpersonal skills and ability to collaborate with on-site and remote teams.
  • Strong debugging and analytical skills.
  • Self-driven and results-oriented.

Responsibilities

  • Architect hardware and software Resiliency features to improve system Reliability, Availability, Serviceability (RAS), and performance in the Datacenter.
  • Model and analyze RAS metrics such as Failures in Time (FIT) for permanent and transient errors, and availability from GPU to rack to datacenter.
  • Collaborate with architects, unit designers and software engineers to ensure alignment of verification requirements.
  • Develop and implement comprehensive architecture verification test plans for resiliency features.
  • Support test debug on RTL, emulation, and silicon.
  • Develop CUDA software diagnostic kernels to run on clusters of NVIDIA GPUs and identify potential hardware issues.
  • Develop and automate fault models to simulate various fault types in gate-level netlist, RTL, architectural model, silicon and other environments.

Benefits

  • Multiple relocation packages
  • Two weeklong shutdowns (mid-summer and year-end) in the US (in addition to PTO)
  • 8-week parental leave
  • 9 Employee Resource Groups
  • Annual bonus offering
  • Flexible work arrangements
  • Up to 6% 401K matching