Senior AI Infrastructure Software Engineer - DGX Cloud

NVIDIA

Location

USA (Multiple Locations)

Type

Full-time

Posted

5/13/2026

Compensation

$184,000 - $356,500 per year

Undergraduate with 5+ Years of Experience
Approval 99.2% · Filings 1,781 · New hires 873 · 👑 Elite Sponsor · FY 2025

Job description

As a senior AI infrastructure software engineer at NVIDIA, you will join the DGX Cloud Lepton Team, contributing to the development of a leading AI/ML platform that enhances productivity and optimizes AI workloads. The role involves designing, building, and maintaining AI platforms for large-scale training and inferencing. You will work in a dynamic environment that values learning, growth, and innovation. This position offers the opportunity to impact the future of AI while collaborating with a supportive team.

Requirements

  • 8+ years of experience developing software infrastructure for large-scale AI systems.
  • Bachelor's degree or higher in Computer Science or a related technical field.
  • Strong debugging skills and experience in analyzing and triaging AI applications from the application level to the hardware level.
  • Proven track record in building and scaling large-scale distributed systems.
  • Experience with AI training and inferencing and data infrastructure services.
  • Familiarity with Kubernetes and operating large-scale observability platforms for monitoring and logging.
  • Proficiency in programming languages such as Python, C/C++, and scripting languages.
  • Excellent communication and collaboration skills.

Responsibilities

  • Develop platforms and tools for large-scale AI, LLM, and GenAI infrastructure.
  • Develop and optimize tools to improve AI/ML workload efficiency and resiliency.
  • Root-cause, analyze, and triage failures from the application level to the hardware level.
  • Enhance infrastructure and products underpinning NVIDIA's AI platforms.
  • Co-design and implement APIs for integration with NVIDIA's resiliency stacks on the platform.
  • Define meaningful and actionable reliability metrics to track and improve system and service reliability.

Benefits

  • Employees at NVIDIA are often offered comprehensive, day-one benefits—including medical, dental, and vision coverage with HSA support, life and disability insurance, an Employee Assistance Program, and a 401(k) with auto-enrollment. Many roles also have generous time off and holidays, donation matching (up to $10,000), and a wide menu of extras like FSAs, commuter benefits, legal and identity-theft protection, pet insurance, and wellness discounts. Optional programs can include student-loan and home-purchase support, plus family care resources and expert medical services.
