Principal Software Engineer
Job description
The Principal Supercomputing Software Engineer will be part of the Microsoft Azure High Performance Computing & AI Engineering team, focusing on managing AI High Performance Computing products. This role involves designing and developing high-volume, low-latency telemetry pipelines that provide insights into customer-facing issues across the infrastructure stack. The engineer will engage with strategic customers and drive engineering improvements within the Azure ecosystem. This position offers hands-on experience with large-scale supercomputers and contributes to innovation in AI and HPC in the cloud.
Requirements
- Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.
- 5+ years of hands-on experience designing and developing high-volume, low-latency pipelines using products such as AzPubSub, Event Hubs, Azure Stream Analytics, Kafka, Grafana, or Prometheus.
- 3+ years of experience with AI/HPC system management, High-Speed Networks, HPC Storage, or managing Cloud Infrastructure.
Responsibilities
- Architect, design, and develop high-volume, low-latency end-to-end event pipelines that provide insights into events that cause job interruptions and affect reliability.
- Conduct analysis of existing event pipelines to evaluate fidelity, granularity, and latency of critical events.
- Contribute to improving key metrics such as Job Mean Time to Interrupt and Mean Time to Resolve on flagship supercomputers.
- Partner with cross-organizational teams to evaluate available telemetry and drive the architecture, design, development, and deployment of solutions.
- Drive engineering and operational excellence based on issues and learnings from strategic customers.
- Lead the resolution of complex incidents and champion initiatives to minimize future customer impact.
Benefits
- Employees at Microsoft are often offered comprehensive, “world-class” benefits, including health and mental-wellness programs, competitive pay with bonuses and stock awards, and retirement/savings options. Time off and flexibility are common, with generous vacation and holidays, parental and caregiver leave, and flexible work schedules, alongside learning support, employee resource groups, product discounts, and matching-gifts/volunteering programs. Specific benefits can vary by region.
