Senior Observability Architect, AI and HPC

NVIDIA Corporation

2/28/2025

US, CA, Santa Clara

Full-time

Salary: $224,000 - $425,500 per year


Job Description

NVIDIA is seeking a Senior or Principal Data and Observability Architect to define a vision and roadmap for distributed observability systems for large-scale AI and HPC clusters.

Requirements

  • Experience designing and building large-scale distributed observability systems
  • Ability to collaborate with data scientists, researchers, and engineering teams
  • Experience with turning raw data into actionable reports
  • Technical-lead-level Python programming experience, including working with APIs
  • Passion for improving the productivity of others
  • MS (preferred) or BS in Computer Science, Electrical Engineering, or related field
  • 12+ years of relevant experience

Responsibilities

  • Collaborate with AI, HW, and SW engineering and research teams to define a vision and roadmap for AI/HPC cluster observability
  • Architect and lead teams to develop, test, and deploy data collectors, pipelines, and visualization and retrieval services
  • Define data collection and retention policies to balance network bandwidth, system load, and storage capacity costs with data analysis requirements
  • Provide operational and strategic data to empower engineers and researchers
  • Continuously improve quality, workloads, and processes through better observability

Benefits

  • Multiple relocation packages
  • Two weeklong shutdowns (mid-summer and year-end) in the US (in addition to PTO)
  • 8-week parental leave
  • 9 Employee Resource Groups
  • Annual bonus offering
  • Flexible work arrangements
  • Up to 6% 401(k) matching