Senior Cloud Platform Software Engineer
Santa Clara, CA, United States
NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing, with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company.” We are looking to grow our company and establish teams with the most thoughtful people in the world.
NVIDIA DGX™ Cloud is an AI-training-as-a-service platform, offering a serverless experience for enterprise developers that’s optimized for generative AI.
DGX Cloud integrates NVIDIA DGX infrastructure and technology in large-scale, multi-node clusters. DGX Cloud offers accelerated data science libraries, optimized frameworks, and pretrained models that give developers a faster path to production-ready models. DGX Cloud is the leading platform for AI development, enables building custom generative AI models, and is used in many industries that need AI.
Join us at the forefront of technological advancement.
What you’ll be doing:
Define, architect, and deliver server platform software for DGX Cloud, NVIDIA’s GPU-based AI infrastructure solution. DGX Cloud solutions combine compute and switching nodes that are interconnected using NVIDIA’s NVLink technology.
Work with customers, product management, and other architects to understand server software and firmware requirements for DGX Cloud. These solutions should seamlessly support DGX Cloud deployments of any size.
Perform detailed requirements gathering, architecture, design, and implementation of platform software for DGX Cloud. You will write detailed specifications for this software and work with various ODMs to get systems delivered to specification. This also requires writing collateral, user guides, and troubleshooting guides, and building the tooling needed for DGX Cloud.
Own the bring-up and deployment of servers and switches in DGX Cloud, and actively debug all bring-up and deployment issues.
Contribute to all phases of DGX Cloud, from architecture and design through implementation, debugging, testing, and early customer support.
What we need to see:
8+ years of relevant experience building cluster deployment and management solutions, server bring-up, and server firmware, with a BS, MS, or PhD in EE/CS or a related field, or equivalent experience.
Proven record of delivering quality servers, with hands-on experience in cluster deployment and management solutions for large scale-out deployments in data centers.
Strong knowledge of server manageability, bring-up, and deployment in data centers. Solid understanding of SBIOS, BMC, and OS for x86 and ARM servers.
Experience working with ODM/OEMs to deliver quality servers.
Strong and demonstrable skills in Python, C/C++, and shell scripting.
Programming and debugging experience on cluster platforms.
Experience in SCM (e.g., Git, Perforce) and project management tools like Jira.
You should possess excellent written and oral communication skills, an excellent work ethic, a deep sense of teamwork, a love of producing quality work, and a commitment to finishing your tasks every single day. You are a self-starter who loves to find creative solutions to complicated problems and is hands-on with coding.
Ways to stand out from the crowd:
Experience working on cluster deployment and bring-up projects, and hands-on experience with x86 or ARM system architecture.
Familiarity with processor microarchitecture, such as caches, pipelining, and memory hierarchy, and with instruction set architecture (ISA).
Experience with code coverage and static analysis tools.
NVIDIA is considered one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative and autonomous, we want to hear from you!
The base salary range is 176,000 USD - 333,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.