Microsoft Corporation CO+I Incident Manager in Redmond, Washington
Microsoft is on a mission to empower every person and every organization on the planet to achieve more. Our culture is centered on embracing a growth mindset, a theme of inspiring excellence, and encouraging teams and leaders to bring their best each day. In doing so, we create life-changing innovations that impact billions of lives around the world. You can help us achieve our mission.
Cloud Operations + Innovation (CO+I) is the engine that powers Microsoft’s core cloud platforms and services that millions of people use every day. With more than 95% of Fortune 500 businesses on Azure, 180 million using Office 365, and millions using other services – all running on Microsoft's cloud infrastructure – CO+I builds and operates the foundation upon which Microsoft’s mission to empower every person and organization comes to life.
As the CO+I Lead Incident Manager, you are central to our efforts to ensure our customers have the best possible service experience. In this role, you will orchestrate building a “best-in-class” global incident management program in close partnership with the Incident Management team and the team’s respective stakeholders. The primary goal of the program is to minimize downtime for our customers. You will achieve this through the delivery of improvements that speed up the time to mitigate events while using postmortems to drive future incident prevention. You will reduce operational inefficiencies in the incident management process to clear the fastest path to the directly responsible individual (DRI) through automation and continuous improvement.
You are also responsible for serving as an escalation point to triage complex issues ranging from physical data center issues to server and network failures. You will drive the efforts of multiple Microsoft teams to bring about safe and rapid mitigation of incidents that impact customers on a global scale, managing the communication of updates vertically and horizontally as the blast radius is understood. This requires composure under pressure, broad technical, analytical, and problem-solving expertise, the ability to collaborate confidently with varied partners, and great written and spoken communication. Your work will have you interacting across the globe with Microsoft Engineers from all disciplines including Electrical, Mechanical, Network, Software and Hardware.
In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.
“Customer Obsessed” – Have a mindset that is focused first on the customer and how to use technology to make their experience safe, reliable, performant, scalable, etc.
Identify opportunities and take ownership for automation and/or continuous improvement of the Incident Management process and best practices
Provide feedback on and drive improvements to current tools and processes, routing initiatives to the appropriate group for proactive design changes and implementation, or for business risk assessment of incident causal factors.
Build out and manage the Incident Management program backlog with mechanisms to distribute and track work amongst the team members and partner teams
Lead the Incident Review forums to ensure ownership is closed-loop and delivers the outcomes expected to prevent future incidents
Develop mechanisms across Incident Management functions (Azure) to share and adopt best practices that lead to better execution during incidents
Improve KPIs for measuring Incident Management phases, removing manual data entry points
Proactively communicate with Microsoft executive leadership, managers, engineering groups, and key stakeholders on active major incidents or crises.
Perform incident triage when escalated, including determining scope, urgency, and potential impact, identifying the specific vulnerability, and making recommendations that enable swift remediation.
Drive deep-dive post incident analysis of customer impacting issues with the Senior Incident Managers and respective teams focusing on reducing the likelihood of future events.
Participate in recovery implementation & testing exercises using scenario-based use cases to drive potential impact awareness and remediation.
Capture and record all incident timelines, data, and restoration efforts for handoff to the Problem Management and Forensic Engineering teams.
Identify, explore, and then ultimately drive cross team efforts to proactively resolve issues that could cause impact to our customers
3-5 years of experience in data center operations
2 years of experience with incident, outage, or crisis management
2 years of experience managing programs and cross-team projects
Must be able to participate in an on-call rotation and serve as escalation point
BS/BA in Electrical or Mechanical engineering, Computer Science, telecommunications, or equivalent education or five (5) years equivalent work experience.
Ability to maintain calm during stressful situations; demonstrated leadership skills in fast-paced, highly dynamic situations.
Strong collaboration skills: working across teams and organizations is necessary to be successful.
Excellent written and verbal communication skills.
Working knowledge and understanding of data center systems such as power, cooling, and networking is critical.
Demonstrated ability to balance strategic and tactical initiatives using sound business judgment.
Demonstrated quantitative skills to resolve ambiguous problems and drive to root cause.
Direct experience with business continuity.
Demonstrated ability to set priorities, pursue multiple threads at the same time, accurately reflect the current state and drive towards desired state.
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check every two years thereafter.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form (https://careers.microsoft.com/us/en/accommodationrequest) .