Sr. System Development Engineer, AI/ML/Storage server team

Amazon

Location

Cupertino, CA

Type

Full-time

Posted

5/22/2026

Compensation

$151,200 - $235,200 per year

Undergraduate with 5+ Years of Experience
Approval 98.6% · Filings 19,451 · New hires 10,113 · 👑 Elite Sponsor · FY 2025

Job description

The Senior Systems Development Engineer will lead the development of automation software and diagnostic tooling for server platforms, focusing on maintaining fleet health and achieving zero-touch operations. This role involves collaborating with various teams to build scalable and reliable systems that enhance the performance of storage and AI/ML compute fleets. The engineer will tackle complex architectural problems and proactively identify deficiencies in systems. The position requires a blend of software development, systems design, and operational expertise to ensure high-quality server solutions.

Requirements

  • 6+ years of non-internship professional software development experience
  • 6+ years of systems design, software development, operations, automation, and process improvement experience
  • 6+ years of experience designing or architecting new and existing systems
  • 5+ years of programming with at least one modern language such as C++, C#, Java, Python, Golang, PowerShell, or Ruby
  • Experience with Linux/Unix
  • Experience leading the design, build, and deployment of complex software solutions in production
  • Experience with Linux kernel driver development
  • Experience with storage, compute, and GPU/accelerator platforms, including driver integration and diagnostics
  • Familiarity with server hardware architecture and hardware diagnostics
  • Experience working with ODMs or hardware design partners through the product development lifecycle

Responsibilities

  • Build and own the automation infrastructure responsible for the health of the server fleet across storage and AI/ML compute platforms
  • Design and implement predictive failure detection systems using telemetry and log correlation
  • Drive toward zero-touch operations by building automation that detects and remediates faults without human intervention
  • Develop monitoring tools and dashboards to provide real-time visibility into fleet health
  • Debug and resolve complex system-level issues across various environments
  • Build diagnostic tooling that automates root cause identification
  • Lead the definition and development of software and automation tools for server hardware programs
  • Design and build scalable system-level software with a focus on durability and availability
  • Build and manage CI/CD pipelines for rapid deployment of code changes
  • Collaborate with internal teams to ensure new server hardware meets functionality requirements

Benefits

  • Employees at Amazon are often offered comprehensive health benefits—including multiple medical plan options (no pre-existing condition exclusions, 100% covered in-network preventive care), dental and vision plans, a 24/7 medical advice line from day one, expert second-opinion services, and broad mental-health support with several free counseling sessions (including pediatric). Financial wellness typically includes a 401(k) with company match (up to 2%), Restricted Stock Units (equity), FSAs, an emergency savings program, product and partner discounts, and even college-savings and home-purchase programs. Overall, the package is designed to support employees and their families’ health, finances, and day-to-day life.
