Senior Software Engineer, AI Inference Systems

NVIDIA
Santa Clara, CA | Full-time | 11/11/2025 | $184,000 - $287,500 a year
Undergraduate with 5+ Years of Experience

Job Description

We are seeking highly skilled and motivated software engineers to build AI inference systems that serve large-scale models with extreme efficiency. In this role you will optimize GPU kernels and collaborate across teams to push the frontier of accelerated computing for AI.

Requirements

  • Bachelor’s degree in Computer Science, Computer Engineering, or Software Engineering with 7+ years of experience, or Master’s degree with 5+ years of experience, or PhD with thesis and top-tier publications in ML Systems, GPU architecture, or high-performance computing.
  • Strong programming skills in Python and C/C++; experience with Go or Rust is a plus.
  • Solid CS fundamentals: algorithms and data structures, operating systems, computer architecture, parallel programming, distributed systems, and deep learning theory.
  • Knowledge of performance engineering in ML frameworks and inference engines.
  • Familiarity with GPU programming and performance: CUDA, memory hierarchy, streams, NCCL.
  • Proficiency with profiling/debug tools (e.g., Nsight Systems/Compute).
  • Experience with containers and orchestration (Docker, Kubernetes, Slurm).
  • Excellent debugging, problem-solving, and communication skills.

Responsibilities

  • Contribute features to vLLM that take advantage of the latest NVIDIA GPU hardware capabilities; profile and optimize the vLLM inference framework.
  • Develop, optimize, and benchmark GPU kernels using techniques such as fusion, autotuning, and memory/layout optimization.
  • Define and build inference benchmarking methodologies and tools; contribute to the MLPerf Inference benchmarking suite.
  • Architect scheduling and orchestration of containerized large-scale inference deployments on GPU clusters across clouds.
  • Conduct and publish original research that pushes the Pareto frontier for ML systems.