Hardware Reliability Engineer, Google Cloud
Minimum qualifications:
- Bachelor’s degree in Materials Science, Computer Science, Electrical Engineering, Mechanical Engineering, Statistics, or a related engineering field.
- 5 years of experience in a role related to reliability engineering and failure analysis in the electronics industry or a technical field, or equivalent practical experience.
Preferred qualifications:
- Experience in failure analysis or knowledge of ongoing reliability test planning and execution.
- Experience with Accelerated Life Testing (ALT), Reliability Block Diagrams, DFMEA, load-stress analysis, electronics physics of failure, and Design of Experiments.
- Experience with product life-cycle profiles for transportation, storage, and operation (temperature, vibration, power cycling, humidity, shock, etc.).
- Experience with statistical analysis, Bayesian methods, and degradation analysis, with the ability to use SQL or BigQuery.
About the job
As a Hardware Reliability Engineer in CSCO, you evaluate product designs and create the reliability-related processes, tools, and testing procedures that help continuously improve every aspect of product reliability. You will influence and work with various engineering teams, product operations, commodity managers, supply chain teams, and suppliers to ensure the product represents the Google brand and delivers reliability of service beyond today’s standards. You use data to drive your decisions, and you analyze that data to drive actions that improve availability and fleet reliability. You establish root causes for reliability issues. You drive systemic corrective actions with demonstrated solutions. You institutionalize these reliability improvements throughout the product life cycle and across the supply chain.
Behind everything our users see online is the architecture built by the Technical Infrastructure team to keep it running. From developing and maintaining our data centers to building the next generation of Google platforms, we make Google's product portfolio possible. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. We keep our networks up and running, ensuring our users have the best and fastest experience possible.
Responsibilities
- Drive reliability engineering efforts for Machine Learning (ML)/GPU accelerators, and collaborate with the microelectronics simulation team to enable early-phase board-level reliability assessments and timely, effective failure analysis for ML products.
- Drive adaptive Ongoing Reliability Testing (ORT) at manufacturing sites, translating downstream insights into test strategies, resource allocation, and defect resolution.
- Lead the reliability qualification efforts for NPI product transfers across regions and second-source site qualifications at contract manufacturers to expedite partners’ reliability capability ramp-up and readiness.
- Serve as the reliability representative in the critical component new technology introduction process, and collaborate with design engineers and supplier quality engineers to ensure stringent reliability requirements are integrated into the safe launch process.