Asset & Wealth Management - Site Reliability Engineer - Vice President - Richardson

Goldman Sachs

Location

Richardson, TX

Type

Full-time

Posted

5/21/2026

Compensation

Not listed

Master's with 5+ Years of Experience

Approval 98.3% · Filings 1,746 · New hires 747 · 💎 Strong Sponsor · FY 2025

Job description

The Site Reliability Engineer (SRE) role at Goldman Sachs focuses on ensuring the availability, reliability, and scalability of critical platform applications and services. The SRE team combines software and systems engineering to build and maintain robust, fault-tolerant systems across on-premises and cloud environments. Engineers in this role will lead technical projects, mentor other engineers, and drive the adoption of advanced SRE principles. The position emphasizes collaboration with internal teams and stakeholders to enhance operational efficiency and system resilience.

Requirements

At least 6 years of hands-on experience in Site Reliability Engineering.
Exceptional programming skills in one or more major languages such as Java, Python, or Go.
Extensive hands-on experience with cloud platforms like AWS or GCP.
Deep expertise in containerization and orchestration technologies such as Docker and Kubernetes.
Mastery of Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
Advanced proficiency in Prompt Engineering and Retrieval-Augmented Generation (RAG) architectures.
Profound understanding of Linux internals, networking, and distributed systems.
Expertise in designing and implementing monitoring, alerting, logging, and tracing solutions.
Deep experience with CI/CD tools and practices such as Jenkins or GitLab.
Strong foundation in databases and distributed systems.

Responsibilities

Drive the strategic direction for availability, scalability, and performance of mission-critical applications.
Lead the design, build, and implementation of highly available and resilient infrastructure.
Architect and develop platforms, tools, and automation solutions to optimize operational workflows.
Lead critical incident response and conduct in-depth root cause analysis for systemic issues.
Partner with development teams to embed reliability into application design from inception.
Define and implement advanced monitoring and logging strategies for actionable insights.
Provide technical vision and mentorship to senior and staff-level engineers.
Evaluate and integrate cutting-edge tools and frameworks to improve operational efficiency.
Participate in and lead on-call rotations for critical system incidents.

Benefits

Employees at Goldman Sachs are often offered comprehensive benefits, including medical, dental, life and disability coverage, generous vacation and holidays, and robust wellness resources such as EAP counseling, medical advocacy, on-site/virtual health services, and fitness support. Financial perks typically include retirement savings programs, live financial education, education support, and wealth-creation opportunities through equity awards and select investment programs. Many locations also provide family benefits (childcare resources, parental and family leaves, adoption/surrogacy support) and flexible work options like part-time schedules, job sharing, telecommuting, and alternate hours.
