Principal Software Engineer, DGX Cloud Production Engineering

NVIDIA

Location

Remote, Santa Clara, CA

Type

Full-time

Posted

5/19/2026

Compensation

$272,000 - $431,250 per year

Undergraduate with 5+ Years of Experience
Approval 99.2% · Filings 1,781 · New hires 873 · 👑 Elite Sponsor · FY 2025

Job description

NVIDIA is seeking Principal Software Engineers to lead the technical direction for DGX Cloud's GPU infrastructure across partner and on-prem environments. The role focuses on defining architecture, building automation, and ensuring reliability for large-scale GPU clusters. Principal engineers mentor other engineers and influence multiple teams while tackling complex infrastructure challenges. The position requires a blend of technical expertise and leadership skills to drive innovation in production engineering.

Requirements

  • 15+ years of experience building and operating large-scale distributed systems or cloud infrastructure.
  • Deep experience with Kubernetes, Linux, infrastructure automation, and production operations.
  • Strong programming experience in Go, Python, or similar languages.
  • Proven ability to lead complex cross-organizational technical initiatives.
  • Experience designing reliable systems with clear SLOs, observability, incident response, and automation.
  • BS/MS in Computer Science or equivalent experience.

Responsibilities

  • Define and execute the technical strategy for DGX Cloud cluster operations.
  • Lead design and implementation of systems for cluster lifecycle, validation, repair, upgrades, observability, and readiness.
  • Establish patterns for Kubernetes-based GPU cluster operations across partner and on-prem environments.
  • Identify and eliminate operational toil through software, APIs, automation, and agent-assisted workflows.
  • Set technical standards for production readiness, SLOs, incident response, handoff gates, and operational acceptance.
  • Mentor engineers and influence platform, infrastructure, storage, networking, security, and workload teams.

Benefits

  • Employees at NVIDIA are often offered comprehensive, day-one benefits—including medical, dental, and vision coverage with HSA support, life and disability insurance, an Employee Assistance Program, and a 401(k) with auto-enrollment. Many roles also have generous time off and holidays, donation matching (up to $10,000), and a wide menu of extras like FSAs, commuter benefits, legal and identity-theft protection, pet insurance, and wellness discounts. Optional programs can include student-loan and home-purchase support, plus family care resources and expert medical services.
