System Software Engineer – Data Center GPU Compute Diagnostics

NVIDIA

Location

Durham, NC

Type

Full-time

Posted

5/21/2026

Compensation

$152,000 - $241,500 per year

Undergraduate with 2+ Years of Experience
Approval 99.2% · Filings 1,781 · New hires 873 · 👑 Elite Sponsor · FY 2025

Job description

We are looking for a system software engineer to develop next-generation Data Center GPU diagnostics for AI supercomputer systems. The role involves creating applications that stress GPU compute engines and memory systems, while collaborating closely with hardware architecture and validation teams. Candidates will work on CUDA kernel diagnostics and contribute to the validation of new hardware features. This position offers the opportunity to grow knowledge in operating systems, computer architecture, and modern AI development tools.

Requirements

  • BS or MS degree in Electrical Engineering, Computer Engineering, Computer Science, or equivalent experience.
  • 5+ years of system software, GPU software, embedded software, or hardware validation experience.
  • Experience writing low-level diagnostics and interacting with device firmware and hardware-level debuggers.
  • Strong C/C++ and Python programming skills.
  • Exposure to GPU architecture, CUDA kernels, and GPU compute workloads is strongly preferred.
  • Working knowledge of memory systems, ECC behavior, and DMA engines.
  • Familiarity with GEMM-style workloads.
  • Awareness of voltage/frequency characterization, thermal testing, and power stress concepts.

Responsibilities

  • Work closely with hardware architecture, driver, manufacturing, and field teams throughout the product development lifecycle.
  • Implement and maintain CUDA/C++ diagnostic workloads and software infrastructure for chip development and validation.
  • Write and tune GPU compute tests that stress Tensor Cores, SMs, L2/cache hierarchy, and HBM memory.
  • Implement and tune GEMM-style diagnostic workloads, including tests combined with additional load in NVLink, PCIe, or CPU subsystems.
  • Contribute to higher-level AI workload tests, including PyTorch-based large model workloads.
  • Bring up and validate new hardware features with pre-beta GPU drivers and low-level diagnostic software.
  • Triage and debug failures involving ECC, HBM behavior, thermal limits, and PCIe/NVLink errors.

Benefits

  • Employees at NVIDIA are often offered comprehensive, day-one benefits, including medical, dental, and vision coverage with HSA support, life and disability insurance, an Employee Assistance Program, and a 401(k) with auto-enrollment. Many roles also have generous time off and holidays, donation matching (up to $10,000), and a wide menu of extras like FSAs, commuter benefits, legal and identity-theft protection, pet insurance, and wellness discounts. Optional programs can include student-loan and home-purchase support, plus family care resources and expert medical services.
