
Staff HPC Engineer

KLA

Location

Milpitas, CA

Type

Full-time

Posted

5/28/2026

Compensation

$162,700 - $284,700 per year

PhD with 5+ Years of Experience
Approval 97.8% · Filings 803 · New hires 321 · 💎 Strong Sponsor · FY 2025

Job description

The Staff HPC Engineer at KLA is responsible for designing, building, optimizing, and supporting large-scale compute environments for scientific computing and AI/ML workloads. This role requires a blend of systems engineering, performance tuning, and hands-on troubleshooting. The engineer will work closely with researchers, developers, and IT teams to ensure reliable and high-performance compute infrastructure. The position is part of the Information Technology group, which focuses on enhancing technology to empower employees and drive business growth.

Requirements

  • Extensive experience with Linux systems engineering in large-scale compute environments.
  • Solid understanding of distributed systems and cloud infrastructure.
  • Deep knowledge of HPC schedulers, MPI stacks, and parallel computing models.
  • Strong understanding of high-speed interconnects and distributed storage systems.
  • Proficiency in scripting languages and automation frameworks.
  • Experience with GPUs and accelerator-based computing.
  • Familiarity with containerization in HPC contexts.
  • Strong troubleshooting skills across hardware, OS, and application layers.
  • Understanding of networking fundamentals.
  • Background in high-availability and distributed systems at scale.
  • Doctorate degree with 8 years of related work experience, or a Master's degree with 12 years, or a Bachelor's degree with 15 years.

Responsibilities

  • Design and implement HPC clusters, including compute, storage, networking, and job-scheduling components.
  • Evaluate and integrate new technologies such as GPUs and accelerators.
  • Develop automation for cluster provisioning, configuration, and lifecycle management.
  • Architect solutions for large-scale parallel workloads and data-intensive applications.
  • Profile and tune applications for performance across various metrics.
  • Optimize parallel programming frameworks like MPI and OpenMP.
  • Benchmark hardware and software stacks to guide procurement decisions.
  • Maintain and monitor HPC clusters and job schedulers.
  • Troubleshoot complex system issues across compute, storage, and network layers.
  • Implement security best practices and ensure high availability.
  • Build and maintain CI/CD pipelines for HPC-related software.
  • Develop monitoring and observability solutions.
  • Provide technical leadership and mentorship to junior engineers.
  • Document architectures, procedures, and best practices.
  • Participate in capacity planning and long-term HPC strategy.

Benefits

  • Employees at KLA are often offered competitive pay with bonuses, a 401(k) match, an employee stock purchase program, and financial perks like student-debt assistance, planning support, and group insurance discounts. Health and lifestyle benefits typically include medical/dental/vision, life and other voluntary coverages, paid time off and holidays, family leave, backup care, wellness rewards, gym discounts, and community-volunteering opportunities. Employees also get strong growth support through tuition reimbursement, KLA’s corporate learning center, education awards, and engineering certification programs.
