Senior HPC Engineer

Mountain View, CA, United States

Job Description

ASRC Federal is searching for a Senior HPC Engineer to support InuTeq LLC. This role is fully telework.

ASRC Federal InuTeq provides High Performance Computing services throughout the HPC lifecycle for computational requirements, architecture, acquisition, and operations to federal government customers. Our employees embrace innovation and are committed to a culture of continuous, standards-driven process improvement, and assimilation of industry best practices. We are seeking to fill a role on our NASA NACS High Performance Computing (HPC) contract that primarily provides development for supercomputing batch scheduling, with secondary support for supercomputing systems administration.

Summary: The successful candidate will be an active supporting member of the ASRC Federal team reporting directly to the Manager of the Application Performance and Productivity (APP) group and matrixed directly to the Supercomputing Systems Team Manager.

An individual at this skill level should have demonstrated extensive experience working with common HPC batch schedulers (e.g., PBS, Slurm, or Moab/Torque) while supporting users of HPC resources with the issues they encounter in getting applications to run efficiently. This individual should demonstrate experience installing, maintaining, and upgrading HPC systems. The individual, along with the entire HPC team, will be engaged in the day-to-day operations and support of the HPC resources. Activities may include system patching, OS upgrades, deploying new systems, writing scripts, and troubleshooting system issues on the HPC system. The ability to interact with users to determine symptoms, and then reproduce their issues to isolate the causes, is a critical skill for this work. There will also be activities in testing, benchmarking, user tool scripting, and analyzing trouble tickets to find patterns indicating system or user education issues.

Duties and Responsibilities:

Designs, deploys and maintains production HPC clusters with 2000+ nodes, InfiniBand, and 100+ petabytes of data storage.

Writes and shepherds scalable feature designs through the entire software development process, from requirements and use cases to release.

Designs and develops scripts for system administration, monitoring and usage reporting.

Modifies existing software to correct errors and/or improve performance.

Designs and develops scripts for system regression and performance testing (file systems (Lustre), scheduler (PBS), interconnect (HDR/NDR, Slingshot), high availability, etc.).

Troubleshoots, isolates and resolves application, system and other technical problems (hardware, software, and network).

Understands research use cases, researches and deploys new technologies, defining cost, performance and other trade-offs.

Manages and maintains tools for configuration management (HPCM, Ansible, Git), resource management, scheduling and all necessary aspects of HPC in accordance with best practices.

Researches, deploys and manages networking and security infrastructure, including development of policies and procedures.

Assists in developing and writing proposals and publications.

Creates and provides clear documentation.

Mentors junior staff and cross-trains peers.

Provides after-hours/weekend support as required.

Moderate Supercomputing System Administration that contributes to:

Day-to-day operations of the Linux HPC clusters and storage systems

Proactive monitoring, analysis, and correction of system issues

Development of scripts to automate repetitive tasks or tools to enhance support of the HPC systems

System performance analysis and tuning

Building, installing, and supporting user-requested software

Supporting evaluation and assessment of new HPC technology

Resolving user-reported issues and managing support ticket requests in Remedy

Requirements

Bachelor’s degree in computer science or related field

Strong computer science background with in-depth systems-level knowledge in operating systems and networking

A minimum of 10 years of experience administering HPC systems and scheduling software (PBS, Slurm, or Moab/Torque)

A minimum of 10 years of experience in systems programming in heterogeneous, multi-platform HPC environments

Strong ability to analyze, debug and maintain the integrity of an existing code base

Demonstrated equivalent of 5 years of Linux/UNIX user support experience and hands-on experience with administration of Linux systems

Experience working with HPC applications and proficiency in at least one of C, C++, or Fortran

Superior scripting skills and excellent attention to detail; proficiency in at least one of Python, Perl, or Bash

Strong ability to interact with customers to understand needs, elicit requirements, and get feedback on prototype solutions

Excellent communication and people skills; excellent time management and organizational skills

Experience with system configuration management tools (e.g., Puppet, Chef, Ansible)

Experience with revision control software (e.g., CVS, SVN, Git)

Track record of delivering commercial-quality software on schedule through multiple release cycles

Proficiency at technical writing

Preferred Skills

Strong analysis and problem-solving skills for debugging and optimizing applications

Familiarity/proficiency with OpenMP and Message Passing Interface (MPI) programming

Experience with Lustre and InfiniBand

Experience with cloud technologies (AWS, Azure, GCP), OpenStack or Kubernetes is a plus

EEO Statement

ASRC Federal and its Subsidiaries are Equal Opportunity / Affirmative Action employers. All qualified applicants will receive consideration for employment without regard to race, gender, color, age, sexual orientation, gender identification, national origin, religion, marital status, ancestry, citizenship, disability, protected veteran status, or any other factor prohibited by applicable law.
