
Job Information

Microsoft Corporation Senior Software Engineer in Redmond, Washington

Microsoft is a company where passionate innovators come to collaborate, envision what can be, and take their careers further. This is a world of more possibilities, more innovation, more openness, and sky-is-the-limit thinking in a cloud-enabled world.

The AI Platform organization at Microsoft builds the end-to-end Azure AI stack/PaaS (Platform as a Service) and is core to Azure’s innovation and differentiation, as well as the AI-related capabilities of all of Microsoft’s flagship products, from Office to Teams, to Xbox. We are the team building Azure OpenAI, Azure ML, AI Studio, Cognitive Services, and the global Azure AI infrastructure for running the largest AI workloads on the planet.

We do not just value differences or different perspectives. We seek them out and invite them so we can tap into the collective power of everyone in the company. As a result, our customers are better served.

Within AI Platform, the Azure ML Services team enables data scientists and developers to quickly and easily build, train, deploy, manage, and consume machine learning models.

As part of Azure ML, the AI Infra team is looking for a Senior Software Engineer, with an initial focus on the Scheduler subsystem. The scheduler is the “brains” of the AI Infra control plane. It governs access to the GPU and NPU capacity of the platform according to a complex system of placement constraints, preference rules, and dynamically interacting policies that aim to maximize hardware utilization and fulfill the greatly varying needs of users and the AI Platform partner services in terms of workload types, prioritization, and capacity targeting flexibility. The scheduler manages quota, capacity reservations, SLA tiers, preemption, auto-scaling, and a wide range of configurable policies. Global scheduling is a distinctive feature that overcomes the regional segmentation of the Azure compute fleet by treating the GPU capacity as a single global virtual pool, which greatly increases capacity availability and utilization for major classes of ML workload. Our system can manage GPU capacity even outside the Azure datacenters.

To manage the complexity of all scheduling policies and placement constraints and meet the expectations of high service reliability, availability, and throughput, we emphasize rigorous engineering, utmost precision and quality, and ownership, from feature design to livesite. A quality mindset, unit-testing proficiency, attention to detail, and development process rigor are key to success in our mission-critical control plane space.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

This is a U.S.-based position. Remote work is possible, but relocation is not provided for this role.

Responsibilities

  • Work on the architecture, design, and development of the core AI Infrastructure services that support large scale AI training and inferencing.

  • Develop, test, and maintain control plane and scheduler services written in C#, hosted on Kubernetes or Service Fabric clusters and Docker containers.

  • Enhance systems and applications to ensure high stability, efficiency, and maintainability, low latency, and tight cloud security.

  • Provide operational support and DRI (on-call) responsibilities for the product.

  • Develop and foster a deep understanding of the machine learning systems and concepts and their usage by our customers.

  • Collaborate closely with engineers and data scientists within the team, internal Microsoft Research teams, and external enterprises to build better solutions together.

  • Provide vision, expertise, and technical leadership to other team members. 

  • Help to grow talent in these areas.

  • Embody our culture (https://careers.microsoft.com/v2/global/en/culture) and values (https://www.microsoft.com/en-us/about/corporate-values)

Qualifications

Required/Minimum Qualifications

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python

  • OR equivalent experience.

Other Requirements

Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check:

  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Preferred/Additional Qualifications

  • Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python

  • OR Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python

  • OR equivalent experience.

  • 3+ years of experience with large scale control plane services and distributed systems, including concurrency management, multi-threaded systems, persistent state management.

  • OOP proficiency and familiarity with common code design patterns are essential.

  • Knowledge of and first-hand experience with building large-scale global services with high SLA.

  • Experience in distributed systems and architecture.

  • Experience working in a geo-distributed team.

  • AI infrastructure and workload knowledge is a plus. 

Software Engineering IC4 - The typical base pay range for this role across the U.S. is USD $117,200 - $229,200 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $153,600 - $250,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft will accept applications for the role until September 30, 2024.

Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations (https://careers.microsoft.com/v2/global/en/accessibility.html).
