IT InfiniBand/GPU - Sr Staff Systems Engineer

San Jose, CA, United States

At Cadence, we hire and develop leaders and innovators who want to make an impact on the world of technology.

Cadence is looking for a Sr Staff Systems Engineer to accelerate strategic customer deployments, ensure on-time bring-up, deployment, and troubleshooting of HPC infrastructure, and support technical roles working with HPC, InfiniBand, and GPUs at our San Jose location.

The successful candidate will be a hands-on technical member of the infrastructure team, with exposure to customer interfaces on Windows and Linux operating systems.

The Systems Engineer will need experience in Linux environments and proficiency in tasks such as shell scripting.

Role: IT - Sr Staff Systems Engineer

Location: San Jose, CA (on-site, not remote)

Must Haves

15+ years of experience in system administration and engineering.

Minimum of five years of overall experience in technical roles supporting GPU infrastructure setup using InfiniBand

Experience with interconnections between InfiniBand and GPUs

Experience with GPU-enabled MPI implementations

Experience with NVIDIA CUDA or AMD ROCm

Experience with H100 and AMD MI210 GPU servers in a cluster

Support customer deployments and ensure on-time bring-up of GPU servers, including InfiniBand fabric bring-up, configuration, and subnet management on the IB switch

Participate in engagements with various SW and FW (BMC, SBIOS, OS, drivers, etc.) teams to develop best-in-class practices and tools; you will analyze, debug, and resolve critical firmware and software issues affecting workload performance at scale

Provide engineering solutions that enable large-scale performance strategies for data center GPU computing products and software stacks, maintain technical relationships with internal and external engineering teams, and assist systems engineers in building creative solutions

Strong knowledge of Linux operating systems and networking and security concepts.

Document and drive acceptance and qualification test plans, procedures, and reports

Requirements

Accelerate strategic customer deployments and ensure on-time bring-up and deployment of HPC infrastructure

Develop and implement server- and rack-level telemetry; collaborate on and establish continuous improvements in our design flows

Recent experience in critical data center technologies such as server architectures, software containers, job schedulers, and parallel computing, as well as deployment and operation of large-scale systems, resilient system design, and clustering of computing resources

Manage HPC clusters, actively communicate with management about any equipment problems, and propose resolutions

Establish and maintain IT infrastructure and procedures for customer-facing and internal systems

Actively establish technical relationships with our customers' engineers, management, and architects at focus accounts

Create and develop test plans for new features on each product. Recommend improvements to enable automated scripting for testing and archiving of results. Develop HPC computing strategies for cloud-based computing, GPU-accelerated computing, etc.

Provide remote cluster support to large environments, including scalability/flexibility and troubleshooting end-user issues involving job submission, runtime, and resource access.

InfiniBand fabric configuration and administration on Red Hat/CentOS Linux, with experience configuring PKeys and troubleshooting the end-to-end InfiniBand environment

InfiniBand fabric bring-up, configuration, subnet management, and monitoring on the IB switch and client side for multi-tenancy setups, with an understanding of IPoIB communication modes

Performance comparison of the InfiniBand network with cluster interconnects, and debugging of InfiniBand performance-related issues

Automate configuration management, software updates, and system availability maintenance and monitoring using modern DevOps tools (Ansible, GitLab, etc.)

Be a technical specialist on GPU computing and networking products, directly supporting GPU customers

Direct experience and strong knowledge of parallel programming, GPU CUDA/ROCm development, and applications.

Actively partner with the R&D teams that deliver services on our infrastructure to gather the requirements their services need to run within it

Automate repetitive tasks and implement custom solutions using scripting/programming languages such as Bash or Python

Configure and troubleshoot a heterogeneous (QDR, FDR, EDR) InfiniBand network and associated subnet manager

Experience with high-performance computing interconnects (e.g., 10 and 40 Gigabit Ethernet, InfiniBand)

Able to move 50+ pounds

The annual salary range for California is $133,000 to $247,000. You may also be eligible to receive incentive compensation: bonus, equity, and benefits. Sales positions generally offer a competitive On Target Earnings (OTE) incentive compensation structure. Please note that the salary range is a guideline and compensation may vary based on factors such as qualifications, skill level, competencies and work location. Our benefits programs include: paid vacation and paid holidays, 401(k) plan with employer match, employee stock purchase plan, a variety of medical, dental and vision plan options, and more.

We're doing work that matters. Help us solve what others can't.
