Senior Site Reliability Engineer - Observability and Telemetry Platform

NVIDIA Corporation

8/2/2025

US, CA, Santa Clara

Full-time

Salary: $144,000 - $230,000 per year


Job Description

NVIDIA is seeking a Site Reliability Engineer (SRE) to design, build, and maintain large-scale production systems with high efficiency and availability, using software and systems engineering practices.

Requirements

  • BS degree in Computer Science or related technical field involving coding
  • 5+ years of experience with infrastructure automation and distributed systems design
  • In-depth knowledge of Linux, networking, and containers
  • Experience in Python, Go, Perl, or Ruby
  • Experience with large-scale private or public cloud systems based on Kubernetes, OpenStack, and Docker

Responsibilities

  • Design, implement, and support the operational aspects of a large-scale observability and telemetry collection platform
  • Engage in the whole lifecycle of services from design to refinement
  • Support services before and after deployment by measuring and monitoring system health
  • Scale systems sustainably through automation and advocate for changes that improve reliability and velocity
  • Practice sustainable incident response and conduct blameless postmortems
  • Participate in an on-call rotation to support production systems

Benefits

  • Multiple relocation packages
  • Two weeklong shutdowns (mid-summer and year-end) in the US (in addition to PTO)
  • 8-week parental leave
  • 9 Employee Resource Groups
  • Annual bonus offering
  • Flexible work arrangements
  • Up to 6% 401(k) matching