Senior Site Reliability Engineer
Job description
NVIDIA is seeking a seasoned Site Reliability Engineer (SRE) to join its Infrastructure, Planning and Processes organization. The role involves developing and maintaining NVIDIA's internal Jenkins-based CI/CD product for GPUs and Tegra systems. The SRE will collaborate with various teams to ensure infrastructure reliability and performance while managing the on-premises engineering cloud across multiple data centers. This position requires keen attention to detail and strong problem-solving abilities.
Requirements
- 5+ years of demonstrable experience maintaining cloud infrastructure and highly available production environments.
- Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent experience.
- Experience handling and maintaining systems installed in on-premises data centers.
- Strong hands-on proficiency using BMC interfaces (Redfish), KVM, and IPMI tools for hardware provisioning.
- Solid understanding of networking principles and protocols, including TCP/IP, DNS, DHCP, and VLANs.
- Practical experience working with data analytics and visualization tools such as Kibana, Grafana, or Splunk.
- Strong demonstrable experience with automation tools like Jenkins and/or Temporal, along with configuration tools like Ansible.
- Proficiency with Kubernetes, Docker, and virtualization technologies.
Responsibilities
- Manage NVIDIA's on-prem infrastructure and maintain uptime, reliability, and readiness.
- Guard service level agreements (SLAs) for critical engineering services.
- Deploy, configure, and manage applications and services on Kubernetes clusters.
- Assist with capacity planning, optimization, and resource utilization efforts.
- Support user-reported issues and monitor alerts to take necessary action.
- Drive automation of monitoring to gain more insight into applications and system health.
- Perform root cause analysis and post-mortems for incidents involving threshold breaches.
Benefits
- Employees at NVIDIA are often offered comprehensive, day-one benefits—including medical, dental, and vision coverage with HSA support, life and disability insurance, an Employee Assistance Program, and a 401(k) with auto-enrollment. Many roles also have generous time off and holidays, donation matching (up to $10,000), and a wide menu of extras like FSAs, commuter benefits, legal and identity-theft protection, pet insurance, and wellness discounts. Optional programs can include student-loan and home-purchase support, plus family care resources and expert medical services.
