Senior Software Engineer, ML Infrastructure
San Jose, CA, United States
What You’ll Do
The AGI (Artificial General Intelligence) Computing Lab is dedicated to solving the complex system-level challenges posed by the growing demands of future AI/ML workloads. Our team is committed to designing and developing scalable platforms that can effectively handle the computational and memory requirements of these workloads while minimizing energy consumption and maximizing performance. To achieve this goal, we collaborate closely with both hardware and software engineers to identify and address the unique challenges posed by AI/ML workloads and to explore new computing abstractions that can provide a better balance between the hardware and software components of our systems. Additionally, we continuously conduct research and development in emerging technologies and trends across memory, computing, interconnect, and AI/ML, ensuring that our platforms are always equipped to handle the most demanding workloads of the future. By working together as a dedicated and passionate team, we aim to revolutionize the way AI/ML applications are deployed and executed, ultimately contributing to the advancement of AGI in an affordable and sustainable manner. Join us in our passion to shape the future of computing!
Location: Hybrid, working onsite at our office 3 days per week with the flexibility to work remotely the rest of the week.
Reports to: SVP
Req: 41882
Stay up to date with the latest advancements in parallel computing, distributed systems, and ML technologies, and contribute to the development of new techniques and approaches.
Analyze and profile ML workloads to identify bottlenecks and inefficiencies.
Design and implement parallel and distributed computing systems to improve the scalability and performance of ML workloads.
Optimize ML algorithms and models to reduce memory usage, improve computational efficiency, and minimize communication overhead.
Communicate effectively with stakeholders, including users, partners, and management, to ensure that the systems are delivered on time and within budget.
Complete other responsibilities as assigned.
What You Bring
BS in Computer/Electrical Engineering or Computer Science with 5+ years of work experience in silicon development, MS in Computer/Electrical Engineering or Computer Science with 3+ years of relevant work experience, or PhD with 0+ years of relevant work experience preferred.
Experience with deep learning techniques and architectures.
Strong proficiency in C++ or a similar programming language.
Experience with popular ML frameworks such as TensorFlow, PyTorch, or JAX.
Experience with ML lowering infrastructure such as MLIR.
Strong analytical and problem-solving skills, with the ability to think critically and creatively.
Excellent communication and interpersonal skills.
Ability to work independently and as part of a team.
You’re inclusive, adapting your style to the situation and diverse global norms of our people.
An avid learner, you approach challenges with curiosity and resilience, seeking data to help build understanding.
You’re collaborative, building relationships, humbly offering support and openly welcoming approaches.
Innovative and creative, you proactively explore new ideas and adapt quickly to change.