Staff Solution Engineer

Date: Apr 23, 2024

Location: San Jose, California, United States

Company: Super Micro Computer

Job Req ID: 24098

About Supermicro:

Supermicro® is a top-tier provider of advanced server, storage, and networking solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, Hyperscale, HPC, and IoT/Embedded customers worldwide. We are the #5 fastest-growing company among the Silicon Valley Top 50 technology firms. Our unprecedented global expansion has given us the opportunity to offer a large number of new positions to the technology community. We seek talented, passionate, and committed engineers, technologists, and business leaders to join us.
 

Job Summary:

Supermicro is seeking a hands-on Staff Solution Engineer to address the most challenging problems facing the HPC community. As a solution engineer, you will apply your expert technical knowledge and past implementation experience to tuning HPC and AI/ML solutions built on technologies such as Slurm, MPI, OpenCL, OpenMP, GPGPU, and CUDA. You will be responsible for delivering the high-quality results our HPC customers demand and for the overall performance of the HPC and AI/ML technologies, including performance benchmarking, monitoring, and troubleshooting production issues. You will also be responsible for overall solution architecture, vendor engagements, technical PoCs, initial deployment, and integration with all infrastructure and development functions. You will therefore need an excellent understanding of infrastructure operations as well as the tools and patterns used in an agile development, continuous delivery environment.

Essential Duties and Responsibilities:

Includes the following essential duties and responsibilities (other duties may also be assigned):
• Serve as an active member of a multidisciplinary team developing solutions for large-scale training systems.
• Research requirements and adapt HPC and AI/ML solutions to provide a robust system for running and maintaining container images across hybrid clouds.
• Investigate scheduling methodologies that can support dynamic scaling for single-node jobs or make more effective use of backfilling in support of scientific data pipelines run on hybrid clouds.
• Understand RDMA congestion control mechanisms on IB and RoCE networks. Tune and configure high-speed communication networks built on InfiniBand and Ethernet.
• Tune and configure the storage subsystems used by HPC and AI/ML systems.
• Superior scripting skills and excellent attention to detail; proficiency in at least one of Python, Perl, or Bash.
• Excellent communication and people skills; excellent time management and organizational skills.

Qualifications:

• Education and/or Experience: BS in EE, CE, ME preferred
• A minimum of 12 years of experience with integration and development of HPC systems leveraging technologies such as MPI, Parallel Computing/Processing, Slurm, GPGPU, OpenCL, and OpenMP
• A minimum of 12 years of experience developing and tuning “system software” in heterogeneous, multi-platform HPC environments
• Strong ability to analyze, debug and maintain the integrity of an existing code base
• Demonstrated equivalent of 7 years of Linux/UNIX user support experience and hands-on experience with administration of Linux systems
• Experience working with HPC applications and proficiency in at least one of C, C++, or Fortran; proficiency in Python scripting
• Knowledge of AI/ML technologies such as TensorFlow, CUDA, and Kubernetes is a plus
• Experience with Agile development tools (Jira, Git)
• Confident presenter, and strong influencer; able to adapt level and style to the audience

Salary Range

$147,000 - $164,000 

The salary offered will depend on several factors, including your location, level, education, training, specific skills, years of experience, and comparison to other employees already in this role. In addition to a comprehensive benefits package, candidates may be eligible for other forms of compensation, such as participation in bonus and equity award programs.

EEO Statement

Supermicro is an Equal Opportunity Employer and embraces diversity in our employee population. It is the policy of Supermicro to provide equal opportunity to all qualified applicants and employees without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status or special disabled veteran, marital status, pregnancy, genetic information, or any other legally protected status.

