Similar Jobs

GenomeWeb LLC

Senior Software Engineer, C++ / HPC System

Menlo Park, CA, United States
- Ending Soon
Senior Software Engineer, C++ / HPC System Job Description At Pacific Biosciences, our R&D team is committed to developing innovative products that enable scientists to excel in a wide variety of life science research fields, including human biomedical, plant and animal sciences, and microbiology and infectious disease. Our unique Single Molecul
Job Source: GenomeWeb LLC
WeRide.ai

HPC System Engineer

San Jose, CA, United States
WeRide is a smart mobility start-up whose mission is to transform mobility with autonomous driving. We are committed to build better transportation experience that's safe, efficient, affordable and joyful. We have an elite team of entrepreneurs and technologists who share the same passion and pursue continuous excellence in their work. What you wi
Job Source: WeRide.ai
d-Matrix

AI / ML System Software Engineer, Senior

Santa Clara, CA, United States
- Ending Soon
d-Matrix has fundamentally changed the physics of memory-compute integration with our digital in-memory compute (DIMC) engine. The “holy grail” of AI compute has been to break through the memory wall to minimize data movements. We’ve achieved this with a first-of-its-kind DIMC engine. Having secured over $154M, $110M in our Series B offering, d-Mat
Job Source: d-Matrix
TikTok

Software Engineer, ML System Architecture

San Jose, CA, United States
- Ending Soon
Responsibilities TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. TikTok has global offices including Los Angeles, New York, London, Paris, Berlin, Dubai, Singapore, Jakarta, Seoul and Tokyo. At TikTok, our people are humble, intelligent, compassionate and creative. We create to in
Job Source: TikTok
d-Matrix

AI / ML System Software Engineer, Principal

Santa Clara, CA, United States
- Ending Soon
d-Matrix has fundamentally changed the physics of memory-compute integration with our digital in-memory compute (DIMC) engine. The “holy grail” of AI compute has been to break through the memory wall to minimize data movements. We’ve achieved this with a first-of-its-kind DIMC engine. Having secured over $154M, $110M in our Series B offering, d-Mat
Job Source: d-Matrix
Meta

AI/HPC Systems Performance Engineer

Menlo Park, CA, United States
Meta's AI Training and Inference Infrastructure is growing exponentially to support ever increasing use cases of AI. This results in a dramatic scaling challenge that our engineers have to deal with on a daily basis. We need to build and evolve our network infrastructure that connects myriads of training accelerators like GPUs together. In addition
Job Source: Meta
Meta

Software Engineer (Technical Leadership)

Menlo Park
**Summary:** Meta is seeking an AI Software Engineer to join the Co-design team. The ideal candidate will have industry experience working on AI Infrastructure related topics. The position will involve taking these skills and applying them to solve for some of the most crucial & exciting problems that exist on the web. We are hiring in multiple loc
Job Source: Meta
Meta

Software Engineer, SystemML - Scaling / Performance

Menlo Park, CA, United States
In this role, you will be a member of the Network.AI Software team and part of the bigger DC networking organization. The team develops and owns the software stack around NCCL (NVIDIA Collective Communications Library), which enables multi-GPU and multi-node data communication through HPC-style collectives. NCCL has been integrated into PyTorch and
Job Source: Meta

Software Engineer, Systems ML - HPC Specialist

Menlo Park

**Summary:**

Meta is seeking an AI Software Engineer to join our Research & Development teams. The ideal candidate will have industry experience working on AI infrastructure topics and will apply those skills to some of the most crucial and exciting problems on the web.

As an HPC specialist, you may author components such as cuBLAS, cuDNN, AITemplate, and FlashAttention, and develop runtimes such as an LLM disaggregated runtime. HPC specialists spend time optimizing programs to reduce accelerator idle time. They also develop debugging tools (such as cuda-gdb) and profilers that use the accelerated computing hardware (such as the PEs and SFUs in MTIA or the Transformer Engine in H100). They are systems experts who can design, debug, and accelerate AI workloads from single-node scale-up to multi-node scale-out distributed systems. They can also influence the next generation of silicon architectures (such as the Tensor Core in V100 or the Transformer Engine in H100) based on evolving AI workload needs.

We are hiring in multiple locations.

**Required Skills:**

Software Engineer, Systems ML - HPC Specialist Responsibilities:

1. Apply relevant AI and machine learning techniques to build and optimize our intelligent systems that improve Meta's products and experiences

2. Develop custom/novel architectures, define use cases, and develop methodology & benchmarks to evaluate different approaches

3. Apply in-depth knowledge of how the machine learning system interacts with the other systems around it

4. Assist in goal setting related to project impact, AI system design, and ML excellence

**Minimum Qualifications:**

1. Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience.

2. 2+ years of experience in HPC and parallel computing.

3. Proficiency in GPU programming using CUDA and familiarity with CUDA libraries (cuBLAS, cuDNN, etc.).

4. Proven track record of leading successful HPC projects.

5. Proven technical expertise in HPC architectures and technologies.

**Preferred Qualifications:**

1. PhD in Computer Science, Computer Engineering, or relevant technical field.

2. Experience developing AI algorithms or AI system infrastructure in C/C++ or Python.

3. Experience developing AI compilers (such as TorchInductor in PyTorch 2.0).

**Public Compensation:**

$70.67/hour to $208,000/year + bonus + equity + benefits

**Industry:** Internet

**Equal Opportunity:**

Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.

Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at [email protected].
