Principal Software Architect

Santa Clara, CA, United States

We are now looking for a Principal Software Architect for AI and HPC.

At NVIDIA, we are advancing the frontiers of AI capabilities. We seek an expert in high-performance computing and AI to design and develop software resiliency features for training AI models on the world’s most powerful and largest supercomputers.

In this role, you will outline mission requirements for ultra-large-scale AI supercomputers, investigate and evaluate Reliability, Availability, and Serviceability (RAS) feature designs, establish software requirements and evaluation metrics, and oversee the complete implementation of RAS features in software. As a leader in HPC and AI software development, you will interact with multiple teams across the organization. Your responsibilities include conducting regular reviews and check-ins with execution teams and ensuring the timely delivery of essential RAS software features such as checkpoint-recovery logic, error detection and attribution, error containment, and silent data corruption (SDC) detection. You will also lead cross-organizational efforts among stakeholders and teams: coordinating priorities with senior leadership, providing timely updates, and ensuring adequate resourcing for the projects.

What You'll Be Doing:

Collaborate with both internal and external customers and partners to define innovative Reliability, Availability, and Serviceability (RAS) requirements and objectives for present and future AI supercomputing products.

Oversee and guide the development of RAS features across the entire AI stack, encompassing aspects from job-level scheduling and AI application frameworks (such as PyTorch), down to driver-level and hardware health monitoring on GPUs.

Develop and maintain comprehensive software roadmaps, ensuring alignment with diverse engineering teams and synchronizing with engineering and product leadership for strategic coherence.

Drive successful implementation and execution of RAS features in software, with demonstrable improvements in end-to-end metrics such as availability during large-scale training runs.

What We Need to See:

A Master's or Ph.D. in Computer Science, Electrical Engineering, or Computer Engineering from a reputed university, or equivalent professional experience.

15+ years of industry experience in systems architecture or related fields, demonstrating a deep understanding of system complexities.

Proven ability to work and communicate effectively in a collaborative environment, bridging multiple engineering disciplines.

At least 5 years of hands-on experience in software development, preferably in high-complexity projects involving HPC or AI.

Ways to Stand Out From the Crowd:

Demonstrated experience with large-scale AI supercomputing applications, particularly in training and inference stages.

In-depth knowledge of the requirements for large-scale AI workload training and inference.

A strong passion for and experience in developing system architectures tailored for AI applications, encompassing CPU, GPU, memory, storage, and networking.

Hands-on involvement in the entire lifecycle – from design to deployment – of large-scale High-Performance Computing (HPC) systems.

Practical experience in adopting and implementing HPC software development practices in large-scale system environments.

As NVIDIA makes inroads into the datacenter business, our team plays a central role in getting the most out of our exponentially growing datacenter deployments and in establishing a data-driven approach to hardware design and system software development. We collaborate with a broad cross section of teams at NVIDIA, ranging from DL research teams to CUDA kernel and DL framework development teams to silicon architecture teams. NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative and autonomous, we want to hear from you!

The base salary range is 272,000 USD - 419,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
