T-Mobile USA, Inc. Principal Engineers, Systems Reliability in Kennard Corner, Washington

Career Band: L07

Be unstoppable with us! T-Mobile is synonymous with innovation, and you could be part of the team that disrupted an entire industry! We reinvented customer service, brought real 5G to the nation, and now we're shaping the future of technology in wireless and beyond. Our work is as exciting as it is rewarding, so consider the career opportunity below as your invitation to grow with us, make big things happen with us, and, above all, #BEYOU with us. Together, we won't stop!

Position summary

T-Mobile is America's supercharged Un-carrier, delivering an advanced and transformative nationwide 5G network that will offer reliable connectivity for all. Principal Engineers, Systems Reliability, located in Bothell, Washington, will improve and protect the software and systems behind all of T-Mobile's IT services, including management of scalability, availability, latency, performance, security, and capacity.

Position duties and responsibilities include, but are not limited to: Design and maintain CI/CD pipelines to build the next generation of T-Mobile applications on cloud-native platforms. Assist in creating new designs, architectures, standards, repeatable processes, and methods for delivering software faster, better, and cheaper and for managing operations better, improving the customer experience through continuous improvement of application operations. Demonstrate fluency in emerging DevOps-centric automation tools and technologies for CI/CD, configuration management, and related areas in non-production environments. Perform environment management and automated server (VM) provisioning. Deliver software to improve the availability, scalability, latency, and efficiency of T-Mobile's services. Create, manage, and use dashboards for continuous monitoring and health checks of applications and the underlying infrastructure. Contribute to future improvement of software delivery processes and operations (e.g., cloud enablement).
Lead and instruct a team of Software Reliability Engineers. Manage improvement work and proofs of concept (POCs) as projects. Apply experience with Continuous Integration/Continuous Delivery tools such as Jenkins and CloudBees, as well as other automation tools. Apply experience with DevOps tools such as Ansible, Chef, and Puppet. Apply experience with APM tools like AppDynamics and logging tools like Splunk. Work in a cloud environment (public/private). Conduct end-to-end incident management to reduce MTTD, MTTE, and MTTR. Improve communication to stakeholders on impacts, mitigation, and corrective action. Conduct blameless postmortems and provide feedback to developers and architects. Provide oversight of release engineering processes and contribute to the evolution of a self-service model for scrum teams. Evaluate new technologies and the infrastructure enhancements required by the T-Mobile enterprise (foundational infrastructure aspects). Partner with scrum teams on feature definition. Research end-to-end tracing and observability, and ensure adherence to observability standards and maturity to eliminate gaps during feature delivery. Lead and support quarterly SOX/audit control processes and provide proof and supporting documentation for attestation.

Telecommuting is permitted, but applicants must live within a reasonable commuting distance.

Skill requirements:
(1) Leveraging knowledge of DevOps (CI/CT/CD) and applying CI/CT/CD in an Agile environment to create and modify CI/CD pipelines that scrum teams use to develop software and release code to the applicable environments.
(2) Using platform technologies and components, including security, performance, optimization, and API integration, to continuously optimize the application.
(3) Conducting end-to-end incident management, including reducing the MTTD, MTTE and MTTR, improving communication to