Austin, TX | Full-time | 11/26/2025 | $168k - $333.5k per year
Undergraduate degree with 5+ years of experience
Job Description
NVIDIA is seeking experienced software engineers with Kubernetes expertise to enhance its AI infrastructure, focusing on cluster operations, operator development, and GPU resource scheduling.
Requirements
5+ years in a software engineering role, with demonstrated impact in a technical organization
Experience with Kubernetes APIs and frameworks, not just operating a cluster
Proficiency in programming languages such as Go and Python
Solid understanding of data structures and algorithms
BS in Computer Science, Engineering, Physics, Mathematics or equivalent experience
Strong communication skills and the ability to work with cross-functional teams
Responsibilities
Working as part of the DGX Cloud team, which is responsible for the production systems that enable large, scalable GPU clusters for AI workloads
Implementing monitoring and health management for reliability, availability, and scalability of GPU assets
Evaluating system failures and improving services based on incident management processes
Collaborating with teams across NVIDIA to ensure optimal performance of production AI clusters