Site Reliability Engineer (SRE) - USDS
The security team runs and operates security infrastructure, platforms and technologies, and supports cross-functional teams in protecting our users, products and infrastructure. On this team you'll get first-hand exposure to the company's strategy on key security initiatives, especially deploying and maintaining scalable, secure-by-design systems and solutions. Our challenges are not your regular day-to-day technical problems; you'll be part of a team that's developing new solutions to new challenges of a kind not previously addressed by big tech. We're working fast, at scale, and we're making a difference.
In order to enhance collaboration and cross-functional partnerships, among other things, at this time, our organization follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time.
Responsibilities
- Work with infrastructure, product and platform engineering teams on operating and deploying software platforms, capacity planning and launch reviews throughout the whole lifecycle of services.
- Maintain sustainable reliability and scalability of software systems by improving automation to measure and monitor availability, latency and overall system health.
- Consistently evolve systems by pushing for changes that improve system reliability and release velocity.
- Practice sustainable incident response and postmortems.