Senior Software Engineer, AI Inference Systems at NVIDIA | H-1B Sponsorship Available

Job Description

We are seeking highly skilled and motivated software engineers to build AI inference systems that serve large-scale models with extreme efficiency. You will optimize GPU kernels and collaborate across teams to push the frontier of accelerated computing for AI.

Requirements

Bachelor’s degree in Computer Science, Computer Engineering, or Software Engineering with 7+ years of experience; or a Master’s degree with 5+ years of experience; or a PhD with a thesis and top-tier publications in ML systems, GPU architecture, or high-performance computing.
Strong programming skills in Python and C/C++; experience with Go or Rust is a plus.
Solid CS fundamentals: algorithms and data structures, operating systems, computer architecture, parallel programming, distributed systems, and deep learning theory.
Knowledge of performance engineering in ML frameworks and inference engines.
Familiarity with GPU programming and performance: CUDA, memory hierarchy, streams, NCCL.
Proficiency with profiling and debugging tools (e.g., Nsight Systems, Nsight Compute).
Experience with containers and orchestration (Docker, Kubernetes, Slurm).
Excellent debugging, problem-solving, and communication skills.

Responsibilities

Contribute features to vLLM that support the latest NVIDIA GPU hardware capabilities; profile and optimize the vLLM inference framework.
Develop, optimize, and benchmark GPU kernels using techniques such as fusion, autotuning, and memory/layout optimization.
Define and build inference benchmarking methodologies and tools; contribute to the MLPerf Inference benchmark suite.
Architect the scheduling and orchestration of large-scale containerized inference deployments on GPU clusters across multiple clouds.
Conduct and publish original research that pushes the Pareto frontier of ML systems.