Senior Systems Software Engineer, Kubernetes Scale - DGX Cloud

NVIDIA

Location

Santa Clara, CA; Seattle, WA

Type

Full-time

Posted

6/11/2026

Compensation

$184,000 - $356,500 per year

Undergraduate with 5+ Years of Experience

Approval 99.2% · Filings 1,781 · New hires 873 · Elite Sponsor · FY 2025

Job description

The Senior Systems Software Engineer at NVIDIA will work within the DGX Cloud organization, focusing on optimizing AI infrastructure and enhancing performance for distributed systems. This role requires deep expertise in Kubernetes, open-source technologies, and systems performance. The engineer will collaborate with AI researchers and developers to tackle real-world challenges while driving advancements in accelerated computing. The position emphasizes end-to-end performance characterization and the development of innovative testing frameworks.

Requirements

8+ years of experience in Computer Architecture, Networking, Storage systems, and Accelerators.
Bachelor's or Master's degree in Engineering, preferably in Electrical Engineering, Computer Engineering, or Computer Science.
Expertise in Kubernetes and familiarity with related CNCF projects.
Background working with large-scale parallel and distributed accelerator-based systems.
Expertise in optimizing performance and AI workloads on large-scale systems.
Experience with performance modeling and benchmarking at scale.
Proficiency in Golang or Python.
Background with the NVIDIA software ecosystem in both training and inference domains.
Expertise with the infrastructure of at least one public cloud service provider (CSP), such as GCP, AWS, Azure, or OCI.

Responsibilities

Drive end-to-end performance and scale characterization for the NVIDIA DGX Cloud software stack.
Collaborate with AI researchers, developers, and customers to develop innovative automated tests.
Deep dive into performance and scale issues in complex distributed systems to identify and resolve root causes.
Design and develop monitoring, reporting, and analysis tools for performance and scale testing.
Triage, debug, and root-cause issues related to operating Kubernetes clusters at ultra-large scale.
Build and maintain a high-velocity framework for continuous performance and scale testing.
Document research, methodologies, and results clearly and concisely.
Engage efficiently with upstream communities to validate performance and scalability of AI workloads.

Benefits

Employees at NVIDIA are often offered comprehensive, day-one benefits, including medical, dental, and vision coverage with HSA support, life and disability insurance, an Employee Assistance Program, and a 401(k) with auto-enrollment. Many roles also have generous time off and holidays, donation matching (up to $10,000), and a wide menu of extras such as FSAs, commuter benefits, legal and identity-theft protection, pet insurance, and wellness discounts. Optional programs can include student-loan and home-purchase support, plus family care resources and expert medical services.
