Twitter Sr. Site Reliability Engineer - Blobstore in Seattle, Washington
The Blobstore team stores and serves petabytes of blob data, including the media uploaded by our users. This work is mission-critical for Twitter's success, and an opportunity to directly improve the experience of every Twitter user. As a Site Reliability Engineer embedded on the Blobstore team, you’ll bring the SRE discipline and perspective to the priorities and challenges we face.
What you’ll be doing:
Help us rearchitect the current Blobstore system to make better use of modern SSD-based hardware for on-prem storage.
Build tooling to automate operations and reduce toil. This includes automatic failure remediation, application and systems deployment, capacity planning, and fleet management.
Troubleshoot complex distributed systems handling millions of queries per second and petabytes of data.
Bring the SRE mindset for Availability, Reliability, Scalability, Disaster Recovery, Problem/Incident Management, and Performance of production services.
Help bring our service to more data centers and cloud environments faster with reliable automation, Docker + Kubernetes, and other ideas you’ve got!
Identify and contribute to solutions for reducing service outages, reducing alert noise, improving monitoring, and helping our services reach Service Level Objectives (SLOs).
Participate in the team’s Agile practices and on-call rotation.
Work with highly distributed and diverse hardware, software, and networking teams throughout the company.
Requirements:
5+ years of experience developing or managing services in a distributed, internet-scale, production environment.
Practical knowledge of at least one programming language (Python, Go, Java, Ruby, C++, Scala).
Demonstrable knowledge of Linux operating system internals, TCP/IP, filesystems, and disk/storage technologies.
Experience with state configuration tools (Puppet, Chef, etc.).
Experience setting up capacity plans for physical and/or virtual infrastructure.
Ability to prioritize tasks and work independently. A self-starter.
Good written and oral communication skills, to help create clarity when working across multiple services and stakeholders.
Bonus: Hands-on experience with blob and/or block storage systems, including cloud-based (S3, Azure Blob, etc.) or proprietary systems.
All of your information will be kept confidential according to EEO guidelines.