Principal Software Engineer, DGX Cloud Production Engineering
NVIDIA
Location
Remote; Santa Clara, CA
Type
Full-time
Posted
5/19/2026
Compensation
$272,000 - $431,250 per year
Undergraduate with 5+ Years of Experience
Sponsorship (FY 2025)
Approval 99.2% · Filings 1,781 · New hires 873 · Elite Sponsor
Job description
NVIDIA is seeking Principal Software Engineers to set the technical direction for DGX Cloud's GPU infrastructure across partner and on-prem environments. The role centers on defining architecture, building automation, and ensuring the reliability of large-scale GPU clusters. Candidates will mentor engineers and influence multiple teams while tackling complex infrastructure challenges, so the position calls for both deep technical expertise and the leadership to drive innovation in production engineering.
Requirements
- 15+ years of experience building and operating large-scale distributed systems or cloud infrastructure.
- Deep experience with Kubernetes, Linux, infrastructure automation, and production operations.
- Strong programming experience in Go, Python, or similar languages.
- Proven ability to lead complex cross-organizational technical initiatives.
- Experience designing reliable systems with clear SLOs, observability, incident response, and automation.
- BS/MS in Computer Science or equivalent experience.
Responsibilities
- Define and execute the technical strategy for DGX Cloud cluster operations.
- Lead design and implementation of systems for cluster lifecycle, validation, repair, upgrades, observability, and readiness.
- Establish patterns for Kubernetes-based GPU cluster operations across partner and on-prem environments.
- Identify and eliminate operational toil through software, APIs, automation, and agent-assisted workflows.
- Set technical standards for production readiness, SLOs, incident response, handoff gates, and operational acceptance.
- Mentor engineers and influence platform, infrastructure, storage, networking, security, and workload teams.
Benefits
- NVIDIA employees are often offered comprehensive day-one benefits, including medical, dental, and vision coverage with HSA support, life and disability insurance, an Employee Assistance Program, and a 401(k) with auto-enrollment. Many roles also include generous time off and holidays, donation matching (up to $10,000), and extras such as FSAs, commuter benefits, legal and identity-theft protection, pet insurance, and wellness discounts. Optional programs can include student-loan and home-purchase support, family care resources, and expert medical services.
