Sr. System Development Engineer, Edge & High Performance Accelerator Servers for AI/ML

Amazon

Location

Austin, TX

Type

Full-time

Posted

6/12/2026

Compensation

$151,200 - $235,200 per year

Undergraduate with 5+ Years of Experience
Approval 98.6% · Filings 19,451 · New hires 10,113 · 👑 Elite Sponsor · FY 2025

Job description

The Senior Systems Development Engineer will lead the development of automation software and diagnostic tooling for server platforms, focusing on maintaining the health of edge and AI/ML compute fleets. This role involves collaborating with various teams to build reliable systems that aim for zero-touch operations. The engineer will tackle complex architectural problems and proactively identify deficiencies in systems. The position requires a blend of software development, systems design, and operational knowledge to enhance server reliability and performance.

Requirements

  • 6+ years of non-internship professional software development experience
  • 6+ years of experience in systems design, software development, operations, automation, and process improvement
  • 6+ years of experience designing or architecting new and existing systems
  • 5+ years of programming experience with at least one modern language such as C++, C#, Java, Python, Golang, PowerShell, or Ruby
  • Experience with Linux/Unix
  • Experience leading the design, build, and deployment of complex software solutions in production
  • Experience building predictive failure detection or proactive remediation systems at fleet scale
  • Experience with Linux kernel driver development
  • Familiarity with server hardware architecture, BMC/IPMI, firmware, PCIe topology, and hardware diagnostics
  • Experience working with ODMs or hardware design partners through the product development lifecycle

Responsibilities

  • Build and own the automation infrastructure responsible for the health of the server fleet across edge and accelerator compute platforms
  • Design and implement predictive failure detection systems using telemetry and sensor data
  • Drive toward zero-touch operations by building automation that detects and resolves hardware and software faults
  • Develop monitoring tools and dashboards to provide real-time visibility into fleet health
  • Debug and resolve complex system-level issues across storage, compute, GPU, and networking in production environments
  • Build diagnostic tooling that automates root cause identification and reduces reliance on manual triage
  • Lead the definition and development of software and automation tools for server hardware programs
  • Design and build scalable system-level software with a focus on durability and availability
  • Build, manage, and deploy CI/CD pipelines for rapid deployment of code changes
  • Engage with ODMs and design partners on testability and automation requirements during hardware design

Benefits

  • Employees at Amazon are often offered comprehensive health benefits, including multiple medical plan options (no pre-existing condition exclusions, 100% covered in-network preventive care), dental and vision plans, a 24/7 medical advice line from day one, expert second-opinion services, and broad mental-health support with several free counseling sessions (including pediatric). Financial wellness typically includes a 401(k) with company match (up to 2%), Restricted Stock Units (equity), FSAs, an emergency savings program, product and partner discounts, and college-savings and home-purchase programs. Overall, the package is designed to support the health, finances, and day-to-day life of employees and their families.
