Site Reliability Engineer - CTJ - Secret

Microsoft

Location

United States

Type

Full-time

Posted

6/9/2026

Compensation

$102,100 - $219,200 per year

Undergraduate with 2+ Years of Experience
Approval 98.4% · Filings 6,363 · New hires 3,142 · 👑 Elite Sponsor · FY 2025

Job description

The Site Reliability Engineer II role at Microsoft Substrate focuses on ensuring the reliability and operational health of critical cloud services. The engineer will work independently to diagnose and resolve production issues while designing automation to enhance service stability. This position requires strong technical judgment and collaboration with software engineering teams to embed reliability into service design. The role is essential for maintaining high availability and security in highly regulated environments.

Requirements

  • Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience.
  • 4+ years technical experience in software engineering, network engineering, or systems administration.
  • Ability to meet Microsoft, customer and/or government security screening requirements.

Responsibilities

  • Own reliability and operational health for one or more Substrate components or services in highly regulated environments.
  • Serve as an actively engaged on-call engineer, participating in an on-call rotation and independently responding to incidents for owned services.
  • Respond to, diagnose, and resolve production incidents with minimal supervision.
  • Design and implement automation to reduce operational toil and improve service stability.
  • Develop and maintain monitoring, alerting, and telemetry to support SLOs and operational metrics.
  • Lead post-incident reviews for owned incidents, focusing on root cause analysis and durable fixes.
  • Collaborate with software engineering teams to embed reliability and operability into service design.
  • Write and maintain production-quality code and automation that improves reliability, scalability, and operational efficiency.

Benefits

  • Microsoft employees are often offered comprehensive benefits, including health and mental-wellness programs, competitive pay with bonuses and stock awards, and retirement and savings options. Time off and flexibility are common: generous vacation and holidays, parental and caregiver leave, and flexible work schedules. Other benefits include learning support, employee resource groups, product discounts, and matching-gift and volunteering programs. Specific benefits can vary by region.
