Similar Jobs

  • Meta

    Production Systems Engineer, AI Systems

    Menlo Park

    **Summary:** Meta is seeking a Systems Engineer to join our Release to Production (RTP) team working on AI/ML initiatives supporting large scale AI Training and Inference. Our servers and data centers are the foundation upon which our rapidly scaling infrastructure operates efficiently to deliver our innovative services. The RTP team is responsible

    Job Source: Meta
  • Meta

    Production Systems Engineer, Tooling

    Menlo Park, CA, United States

    • Ending Soon

    Meta is seeking an experienced Production Systems Engineer to join our Release to Production (RTP) team. Our servers and data centers are the foundation upon which our rapidly scaling infrastructure operates efficiently to deliver our innovative services. The RTP team is responsible for the Hardware Lifecycle of all Meta servers including pre-produ

    Job Source: Meta
  • Zoox

    Systems Engineer - System Validation

    San Mateo, CA, United States

    Zoox is looking for a systems engineer to lead the systems safety verification and validation of our level 3 autonomy test platform. As a Systems Validation Engineer on the System Design and Mission Assurance (SDMA) team, you will be leading the specification and execution of systems validation plans, working closely with embedded firmware, systems

    Job Source: Zoox
  • Zoox

    Systems Engineer - System Validation

    Foster City, CA, United States

    • Ending Soon

    Foster City, CA • Full-time Systems Engineer - System Validation Zoox is looking for a systems engineer to lead the systems safety verification and validation of our level 3 autonomy test platform. As a Systems Validation Engineer on the System Design and Mission Assurance (SDMA) team, you will be leading the specification and execution of systems

    Job Source: Zoox
  • Magnit

    System Engineer

    Santa Clara, CA, United States

    Title: Systems Engineer 1 Location: Santa Clara, CA 95050 -- Onsite Duration: 12+ Months contract role Pay Rate: $40.00 - $48.00/Hour on W2. Job description: REIMBURSEMENT SPECIALIST: System Engineer - Hardware Support Overview: Roche Sequencing Solutions & Roche Molecular Lab (Santa Clara) are developing nanopore based sequencing dedicated to maki

    Job Source: Magnit
  • eTeam, Inc.

    AI Systems Engineer

    San Jose, CA, United States

    • Ending Soon

    Job Overview: We are seeking an AI Systems Engineer to join our IT compute platforms engineering team. The AI Systems Engineer is responsible for the design, development, and administration of High-Performance Computing (HPC) infrastructure, GPU clusters, and AI workload schedulers. ABOUT YOU: You have a passion for learning. You are passionate abo

    Job Source: eTeam, Inc.
  • High-Tech Professionals

    Systems Engineer

    Palo Alto, CA, United States

    Description: Seeking Systems Engineer to join a small team of top-notch developers coming from Fortune 100 companies. Should be able to work in a collaborative, team-based work environment and be self-directed toward excellence. Highly organized and ability to thrive in ambiguity. Requirement: - Mastery of the UNIX programming environment - Experie

    Job Source: High-Tech Professionals

Production Systems Engineer, Fleet AI Systems

Menlo Park, CA, United States

Meta is seeking an experienced Production Systems Engineer to join our Release to Production (RTP) team. Our servers and data centers are the foundation upon which our rapidly scaling infrastructure operates efficiently to deliver our innovative services. The RTP team is responsible for the Hardware Lifecycle of all Meta servers including pre-production hands-on system and hardware debugging and stress testing, enabling production-ready system monitoring, automated provisioning and automated remediation of issues. RTP Engineers work closely with hardware designers, system manufacturers, component vendors, capacity engineering, production engineering, Facebook services, and data center operations teams to test systems before release to our production data centers, and to track the health and lifecycle of servers in production.

Production Systems Engineer, Fleet AI Systems Responsibilities

Develop robust, industry-leading practices for supporting hardware infrastructure at scale

Interface with external vendors and internal hardware, mechanical, power, thermal, manufacturing and software engineers to understand system architecture and to develop and execute test suites for various architectures

Proactively create experiments and tooling to detect and diagnose hardware/firmware/software health issues

Implement remediations across software and hardware stack according to plan, while keeping a thorough procedural record and data log

Develop and publish updates on resolutions and communicate findings internally

Troubleshoot, diagnose, and root-cause system failures, isolating the components and failure scenarios while working with internal and external stakeholders

Drive necessary discussion with external and internal teams on test specification and methodologies to improve test quality continuously

Minimum Qualifications

Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience.

10+ years of experience in hardware systems technologies or supporting production hardware at scale

Troubleshooting and analytical experience

Knowledge of server architecture and components

Experience with Linux and scripting

Experience in changing system configurations and measuring change impact

Experience working in a matrix organization

Experience working through full life cycle for computer system products

Experience supporting AI/HPC systems, GPU or Silicon hardware, and/or related components at scale

Engineering experience with different server system/data center products

Preferred Qualifications

10+ years of experience in production support at scale (e.g., 10K storage servers and over 100K HDDs)

10+ years of experience in full system technologies

Experience in hyperscale post-production environments and solutions

Start preparing

Learn about how to prepare for your interview with our interview guide, tips, and interactive experiences.

Visit interview prep
