Reliability Engineer

Date: Jun 13, 2025

Location: San Jose, California, United States

Company: Super Micro Computer

Job Req ID: 26899

About Supermicro:

Supermicro® is a Top Tier provider of advanced server, storage, and networking solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, Hyperscale, HPC, and IoT/Embedded customers worldwide. We are the #5 fastest growing company among the Silicon Valley Top 50 technology firms. Our unprecedented global expansion has provided us with the opportunity to offer a large number of new positions to the technology community. We seek talented, passionate, and committed engineers, technologists, and business leaders to join us.

Job Summary:

Supermicro is seeking an experienced Reliability Engineer to execute reliability validation for high-performance server platforms, with a specific focus on CPU validation and environmental stress testing. This role is critical in ensuring the system-level robustness, long-term stability, and thermal reliability of our products.

The engineer will design, execute, and analyze reliability and stress test plans, including thermal cycling, high-temperature operation, and power cycling, and will coordinate closely with cross-functional engineering teams. The ideal candidate has strong system-level debug experience, deep hardware knowledge (especially of CPU platforms), and a proven track record in reliability validation and root cause analysis.

Essential Duties and Responsibilities:

Includes the following essential duties and responsibilities (other duties may also be assigned):

  • Develop and execute reliability test plans, including thermal, voltage, and long-duration stress testing.
  • Monitor system health (e.g., error logs, temperature sensors) and analyze failures to determine root cause.
  • Conduct CPU validation on a variety of motherboard and system configurations.
  • Maintain and calibrate thermal chambers, power cycling equipment, and automated stress platforms to ensure consistent test results.
  • Coordinate closely with platform engineering, BIOS, hardware design, and quality teams to align on test coverage and resolve cross-functional issues.
  • Document and maintain SOPs for test setups, execution, and reporting; ensure compliance with internal and industry test standards.
  • Manage test schedules and resources (e.g., CPU samples, chambers, power equipment) to ensure validation milestones are met.
  • Provide clear and detailed validation reports summarizing methodology, results, and root cause analysis for failures.
  • Drive process improvements in the validation workflow, data tracking, and issue traceability.

Qualifications:

Required:

  • Bachelor’s or Master’s degree in EE, CE, or a related technical field.
  • 5+ years of experience in hardware validation, with a focus on CPU, system reliability, or stress testing.
  • Strong hands-on experience with server hardware (e.g., CPU sockets, heatsinks, VRMs, DIMMs) and system-level validation.
  • Proficient in using thermal chambers, power cycling tools, and monitoring utilities (IPMI, sensors, thermal cameras, etc.).
  • Familiarity with industry-standard reliability methodologies.
  • Experience with BIOS configuration, firmware tools, and OS-based stress testing (e.g., Prime95, BurnInTest, LINPACK).
  • Effective communication skills for reporting results and collaborating across teams.

Preferred:

  • Experience with automated test environments and scripting (Python, Bash, or PowerShell).
  • Background in validation of high-end server CPU platforms (Intel, AMD, or ARM-based).
  • Prior experience maintaining or creating reliability SOPs and validation dashboards.

Salary Range

$95,000 - $113,000

The salary offered will depend on several factors, including your location, level, education, training, specific skills, years of experience, and comparison to other employees already in this role. In addition to a comprehensive benefits package, candidates may be eligible for other forms of compensation, such as participation in bonus and equity award programs.

EEO Statement

Supermicro is an Equal Opportunity Employer and embraces diversity in our employee population. It is the policy of Supermicro to provide equal opportunity to all qualified applicants and employees without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran or special disabled veteran status, marital status, pregnancy, genetic information, or any other legally protected status.


Job Segment: Thermal Engineering, Cloud, Firmware, Engineer, Data Center, Engineering, Technology