Senior Site Reliability Engineer

NVIDIA Corporation

4/20/2025

US, CA, Santa Clara

Full-time

Salary: $168,000 - $322,000


Job Description

Join our team as a Senior Site Reliability Engineer at NVIDIA in Santa Clara, CA, USA. Be part of shaping the future of computing and ensuring the smooth operation of brand-new technologies.

Requirements

  • B.S. degree in Computer Science or a related technical field (or equivalent experience), with over 10 years of experience building and supporting critical services
  • Proficiency in Kubernetes administration, modern CI/CD techniques and Infrastructure as Code (IaC)
  • Deep understanding of Linux operating systems and TCP/IP fundamentals
  • Expertise with at least one major cloud service provider (AWS, GCP, or Azure)
  • Demonstrated proficiency with end-to-end SRE capabilities and observability
  • Proficient in monitoring, metrics gathering, APM, container management, and log collection tools
  • 5+ years of coding/scripting experience in at least two high-level programming languages such as Python, Go, Ruby, or Groovy
  • Creative problem solver with excellent debugging skills and great communication and documentation abilities

Responsibilities

  • Own the solutions you build, collaborating with cross-functional teams to successfully implement them
  • Collaborate with various teams in a fast-paced environment to ensure seamless project completion
  • Continuously improve solution provisioning and management through automation
  • Identify areas to improve service resiliency using industry-standard practices
  • Detect performance issues and recommend solutions to maintain world-class service quality
  • Conduct capacity management and planning to meet ongoing operational needs
  • Participate in incident reviews, assist in root cause identification, and write RCA reports
  • Deliver SRE solutions in a globally distributed, multi-cloud hybrid environment (AWS, GCP, and on-prem)
  • Ensure the highest level of uptime and Quality of Service (QoS) for internal customers through operational excellence
  • Participate in the team's on-call rotation

Benefits

  • Multiple relocation packages
  • Two weeklong shutdowns (mid-summer and year-end) in the US (in addition to PTO)
  • 8-week parental leave
  • 9 Employee Resource Groups
  • Annual bonus offering
  • Flexible work arrangements
  • Up to 6% 401(k) matching

© 2024 H1BConnect. All rights reserved.
