AI Compute Infrastructure Engineer

Sunnyvale, CA, United States

Cerebras Systems has pioneered a groundbreaking chip and system that revolutionizes deep learning applications. Our system empowers ML researchers to achieve unprecedented speeds in training and inference workloads, propelling AI innovation to new horizons.

Condor Galaxy 1 (CG-1) is a supercomputer set to revolutionize the world of artificial intelligence. With 4 ExaFLOPs of processing power, 54 million cores, and a 64-node architecture, the CG-1 is the first milestone of a larger project that will redefine the possibilities of AI.

The successful completion and deployment of the CG-1, the first of nine powerful supercomputers, is a significant achievement for Cerebras. As we enter phase 2 of the project with CG-2, we are taking a bold step towards creating a network of interconnected supercomputers that will collectively deliver 36 ExaFLOPs of AI compute power upon completion.

Responsibilities

Operate and manage multiple advanced ML accelerator solutions from Cerebras Systems (Condor Galaxy)

Maximize available compute capacity, providing high uptime at maximum performance for the CG deployments

Monitor and oversee CG health to ensure stability and security

Manage and customize Kubernetes (k8s), cluster, and cloud features on CGs

Provide solutions to ML users using tools and components available in the vast Linux-based ecosystem: compute, storage, and networking

Configure, deploy, and debug container-based services on orchestration platforms like Kubernetes

Provide 24/7 monitoring and support using automated tools and hands-on manual troubleshooting

Support training and inference workloads in the data center: LLMs (50B to 500B parameter models), multi-modal models, Mistral, etc.

Adapt and make progress in a fast-paced and constantly evolving environment.

Document processes and procedures needed to efficiently operate CGs.

Requirements

BS CS/EE, MS CS/EE

5+ years relevant experience in managing compute infrastructure

Hands-on technical expert

Proficiency with Python and other common programming languages

Demonstrated high impact in a variety of products and roles

Experience with container orchestration and workload scheduling platforms like Kubernetes and SLURM

Experience with ML frameworks like PyTorch, TensorFlow, etc.

Strong knowledge and demonstrated experience with:

Linux-based compute systems, virtualization, and Docker containers

Scheduling and orchestration applications like SLURM, Kubernetes

Good understanding of cloud infrastructure design, deployment and maintenance

Knowledge of technologies like Ethernet, RoCE, TCP/IP, etc. is desired

Past experience with cross-functional team projects

Past experience and interactions with high-value customers

Proven track record of owning challenges and driving them to completion

Location

SF Bay Area

Why Join Cerebras

People who are serious about software make their own hardware. At Cerebras we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:

Build a breakthrough AI platform beyond the constraints of the GPU

Publish and open source their cutting-edge AI research

Work on one of the fastest AI supercomputers in the world

Enjoy job stability with startup vitality

Our simple, non-corporate work culture that respects individual beliefs

Read our blog: Five Reasons to Join Cerebras in 2024.

Apply today and join the forefront of groundbreaking advancements in AI.

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.
