Senior Software Engineer, Distributed Systems Engineer - DGX Cloud

NVIDIA
Austin, TX | Full-time | 11/26/2025 | $168k - $333.5k per year
Undergraduate with 5+ Years of Experience

Job Description

NVIDIA is seeking experienced software engineers with Kubernetes expertise to enhance its AI infrastructure, focusing on cluster operations, operator development, and GPU resource scheduling.

Requirements

  • 5+ years in a software engineering role, with demonstrated impact in a technical organization
  • Experience with Kubernetes APIs and frameworks, not just operating a cluster
  • Technical knowledge of systems programming languages (Go, Python)
  • Solid understanding of data structures and algorithms
  • BS in Computer Science, Engineering, Physics, Mathematics, or equivalent experience
  • Strong communication skills and ability to work with cross-functional teams

Responsibilities

  • Working as part of the DGX Cloud team responsible for production systems that enable large, scalable GPU clusters for AI workloads
  • Implementing monitoring and health management for reliability, availability, and scalability of GPU assets
  • Evaluating system failures and improving services based on incident management processes
  • Collaborating with teams across NVIDIA to ensure optimal performance of production AI clusters