Work in Washington Veterans Jobs

Job Information

Duolingo Staff Site Reliability Engineer in Seattle, Washington


Staff Site Reliability Engineer

at Duolingo


Seattle, WA

Duolingo is the most popular language learning application in the world, with over 300 million users. We are passionate about education, fact-based decision making, and elegant solutions to cross-functional problems. If that sounds like you, then come join us as we build the next-generation learning company!

As a Staff Site Reliability Engineer, you will work closely with cross-functional engineering teams to ensure Duolingo’s complex distributed systems and products are built and maintained with world-class quality, and operated in measurable and scalable ways.

You will...

  • Collaborate with internal teams to identify sources of instability in distributed systems and drive operational excellence

  • Own core infrastructure (i.e., manage, diagnose, and debug large-scale distributed systems in production)

  • Provide system design consulting, develop software platforms/frameworks, and conduct launch reviews and root cause analysis

  • Maintain and document sustainable postmortem/incident response practices

  • Understand and resolve potential threats to performance or security

  • Once systems are live, monitor and measure latency, availability, and overall system health

  • Advocate for and implement changes that improve reliability, scalability, and velocity

  • Monitor and stress test systems to collect metrics for tuning and capacity planning

  • Reduce toil by iteratively developing tooling and automation

  • Collaborate with engineering teams to release new features and become an authority on our services

  • Participate in on-call rotation

You have...

  • Bachelor’s Degree in Computer Science

  • 5+ years of experience in site reliability engineering/DevOps for a product with millions of users

  • Experience analyzing and troubleshooting large-scale distributed systems

  • Proven knowledge of C, C++, Java, Kotlin, Python or Go

  • Fluency in networking protocols, such as TCP/IP, HTTP, SSL, DNS, etc.

  • An understanding of containerization toolsets and container orchestration technologies (Docker, Mesos, Kubernetes, Nomad, etc.)

  • Effective communication skills and an understanding of best practices for infrastructure, automation, and capacity planning tools and methodologies

  • Ability to be on-call for critical incident responses