
Senior Site Reliability Engineer

NVIDIA

Location

Santa Clara, CA

Type

Full-time

Posted

6/10/2026

Compensation

$148,000 - $276,000 per year

Undergraduate with 5+ Years of Experience
Sponsorship data (FY 2025): approval rate 99.2%, 1,781 filings, 873 new hires (Elite Sponsor)

Job description

NVIDIA is seeking a seasoned Site Reliability Engineer (SRE) to join its Infrastructure, Planning and Processes organization. The role involves developing and maintaining NVIDIA's internal Jenkins-based CI/CD product for GPUs and Tegra systems. The SRE will collaborate with various teams to ensure infrastructure reliability and performance while managing an on-premises engineering cloud that spans multiple data centers. This position requires keen attention to detail and strong problem-solving abilities.

Requirements

  • 5+ years of demonstrable experience maintaining cloud infrastructure and highly available production environments.
  • Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent experience.
  • Experience handling and maintaining systems installed in on-premises data centers.
  • Strong hands-on proficiency using BMC interfaces (Redfish), KVM, and IPMI tools for hardware provisioning.
  • Solid understanding of networking principles and protocols, including TCP/IP, DNS, DHCP, and VLANs.
  • Practical experience working with data analytics and visualization tools such as Kibana, Grafana, or Splunk.
  • Strong, demonstrable experience with automation tools such as Jenkins and/or Temporal, along with configuration management tools such as Ansible.
  • Proficiency with Kubernetes, Docker, and virtualization technologies.

Responsibilities

  • Manage NVIDIA's on-prem infrastructure and maintain uptime, reliability, and readiness.
  • Guard service level agreements (SLAs) for critical engineering services.
  • Deploy, configure, and manage applications and services on Kubernetes clusters.
  • Contribute to capacity planning, optimization, and efforts to improve resource utilization.
  • Respond to user-reported issues and monitor alerts, taking action as needed.
  • Drive monitoring automation to gain deeper insight into application and system health.
  • Perform root cause analysis and post-mortems for incidents, including any threshold breaches.

Benefits

  • Employees at NVIDIA are often offered comprehensive day-one benefits, including medical, dental, and vision coverage with HSA support, life and disability insurance, an Employee Assistance Program, and a 401(k) with auto-enrollment. Many roles also include generous time off and holidays, donation matching (up to $10,000), and a wide menu of extras such as FSAs, commuter benefits, legal and identity-theft protection, pet insurance, and wellness discounts. Optional programs can include student-loan and home-purchase support, plus family care resources and expert medical services.
