Director, Engineering Operations and Site Reliability Engineering - Datacenter Server Systems
NVIDIA
Location
Santa Clara, CA
Type
Full-time
Posted
6/27/2026
Compensation
$292,000 - $442,750 per year
Undergraduate with 5+ Years of Experience
Approval 99.2% · Filings 1,781 · New hires 873 · 👑 Elite Sponsor · FY 2025
Job description
NVIDIA is seeking a strong technology leader for Engineering Operations and Site Reliability Engineering focused on next-generation datacenter server systems. This role emphasizes execution, reliability, automation, and large-scale system operations to maintain the health and availability of NVIDIA's internal infrastructure. The ideal candidate will lead teams to ensure operational excellence and drive improvements in product quality and serviceability. This position requires a blend of technical leadership and team development in a dynamic environment.
Requirements
- BS or MS in Computer Science, Electrical Engineering, Computer Engineering, or related field.
- 12+ years of experience in infrastructure, systems engineering, reliability, datacenter operations, or distributed systems.
- 7+ years of people management experience.
- Strong understanding of server systems, Linux, cluster operations, and high-speed networking.
- Experience operating complex systems with high availability expectations.
- Proven track record of driving execution across multiple teams and technical domains.
- Clear written and verbal communication skills, including executive-level reporting.
Responsibilities
- Lead teams to ensure NVIDIA's internal rack-scale server systems and clusters remain available and reliable.
- Drive execution across fleet operations, incident response, and change management.
- Build automation, telemetry, alerting, and dashboards to improve visibility and issue resolution.
- Partner with various teams to deploy, sustain, and debug complex systems.
- Create feedback loops to improve product quality and development velocity.
- Grow and mentor a high-performing technical team with a culture of ownership and learning.
Benefits
- Employees at NVIDIA are often offered comprehensive, day-one benefits—including medical, dental, and vision coverage with HSA support, life and disability insurance, an Employee Assistance Program, and a 401(k) with auto-enrollment. Many roles also have generous time off and holidays, donation matching (up to $10,000), and a wide menu of extras like FSAs, commuter benefits, legal and identity-theft protection, pet insurance, and wellness discounts. Optional programs can include student-loan and home-purchase support, plus family care resources and expert medical services.
