Principal Site Reliability Engineer

Microsoft

Location

USA

Type

Full-time

Posted

5/4/2026

Compensation

$139,900 - $304,200 per year

Undergraduate with 5+ Years of Experience

Master's with 5+ Years of Experience

PhD with 5+ Years of Experience

Approval 98.4% · Filings 6,363 · New hires 3,142 · 👑 Elite Sponsor · FY 2025

Job description

The Principal Site Reliability Engineer will lead critical initiatives within the team responsible for managing high-severity incidents across Microsoft M365 Substrate Core services. This role focuses on ensuring consistent and effective incident handling while minimizing customer impact and promoting organizational learning. The engineer will collaborate closely with Incident Managers, Service Owners, and executive stakeholders to enhance incident response practices. The role also involves coaching and developing a team of Site Reliability Engineers to foster a culture of accountability and continuous improvement.

Requirements

Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience OR equivalent experience.
Ability to meet Microsoft, customer and/or government security screening requirements.
7+ years technical experience working with large-scale cloud or distributed systems.
Experience building or scaling incident response programs at organizational or enterprise scope.
Proven experience leading teams through high-severity production incidents in large, distributed systems.

Responsibilities

Own execution quality for Substrate high-severity incidents, ensuring clear command and decisive leadership.
Act as the senior incident leader or sponsor for long-running, high-stakes, or cross-service incidents.
Partner closely with Incident Managers, Subject Matter Experts, and service leaders to ensure effective diagnosis and escalation.
Ensure high-quality post-incident reviews and drive accountability for repair items.
Coach and help develop a team of Site Reliability Engineers serving as incident responders.
Build a culture of calm execution, accountability, psychological safety, and continuous learning during and after incidents.
Serve as a trusted advisor to engineering leaders and executives on live site risk and incident response maturity.

Benefits

Employees at Microsoft are often offered comprehensive, “world-class” benefits, including health and mental-wellness programs, competitive pay with bonuses and stock awards, and retirement/savings options. Time off and flexibility are common, with generous vacation and holidays, parental and caregiver leave, and flexible work schedules, alongside learning support, employee resource groups, product discounts, and matching-gifts/volunteering programs. Specific benefits can vary by region.
