
Senior Platform and EngOps Engineer - Cluster Operations

NVIDIA

Location

Santa Clara, CA

Type

Full-time

Posted

5/10/2026

Compensation

$176,000 - $333,500 per year

Undergraduate with 5+ Years of Experience
Sponsorship record (FY 2025): 99.2% approval rate, 1,781 filings, 873 new hires

Job description

NVIDIA is seeking EngOps and Platform Engineers to enhance execution efficiency while managing large GPU clusters interconnected via NVLink and InfiniBand. The team focuses on developing automated tools for deploying and maintaining these clusters, ensuring optimal performance and availability. Candidates will collaborate with engineering and product teams across multiple time zones to align operations with project requirements. This role emphasizes troubleshooting and maintaining seamless operations in high-performance computing environments.

Requirements

  • BS or MS in Computer Science, Computer Engineering, Electrical Engineering, or a related field, or equivalent experience.
  • 8+ years of hands-on experience deploying and administering clusters, servers, switches, and related infrastructure.
  • Expertise in automation using Ansible, Python, and shell scripting.
  • Deep understanding of operating systems, computer networks, and high-performance applications.
  • Proven ability to work effectively with developers and test engineers across different teams and time zones.
  • Strong proficiency in Linux fundamentals.

Responsibilities

  • Develop automated tools to efficiently deploy, provision, and maintain extensive GPU clusters interconnected via NVLink and InfiniBand.
  • Implement modern DevOps tools to automate software updates, perform maintenance tasks, and monitor cluster availability.
  • Take ownership of daily cluster failures and issues, troubleshooting them promptly to maintain optimal cluster availability and performance.
  • Manage the rollout and rollback of cluster software and firmware updates, ensuring smooth transitions and minimal disruptions.
  • Collaborate effectively with engineering and product teams across multiple time zones to align cluster operations with evolving project requirements.

Benefits

  • NVIDIA employees are typically offered comprehensive day-one benefits, including medical, dental, and vision coverage with HSA support, life and disability insurance, an Employee Assistance Program, and a 401(k) with auto-enrollment. Many roles also include generous time off and holidays, donation matching (up to $10,000), and extras such as FSAs, commuter benefits, legal and identity-theft protection, pet insurance, and wellness discounts. Optional programs can include student-loan and home-purchase support, family care resources, and expert medical services.
