Tech Lead Manager, Site Reliability Engineer, Product - USDS
The USDS TikTok Product Engineering SRE team works with engineering and product teams to build, maintain, and run large-scale, globally distributed, observable, fault-tolerant systems. SREs on this team take ownership of production and are responsible for observability and automation across complex, large-scale service mesh architectures.
To enhance collaboration and cross-functional partnerships, among other things, our organization currently follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager or department. We regularly review our hybrid work model, and the specific requirements may change at any time.
Responsibilities:
- Provide technical leadership and mentorship to a team of Site Reliability Engineers focused on building observable, fault-tolerant systems
- Drive architectural decisions for large-scale, globally distributed service mesh architectures
- Establish and maintain production ownership models, incident response protocols, and service level objectives
- Develop strategic roadmaps for observability and automation initiatives that enhance system reliability
- Balance technical contributions with people management responsibilities, including career development, performance evaluations, and team growth
- Foster a culture of reliability, continuous improvement, and knowledge sharing within your team and across the organization
- Lead security initiatives to safeguard critical assets, partnering with security and compliance teams to implement robust protocols that ensure data protection and regulatory compliance across all services