Site Reliability Engineer

San Francisco, CA, United States

As a Site Reliability Engineer (SRE) at Together AI, you are responsible for keeping all user-facing services and production systems running smoothly. You are a blend of pragmatic operator and software engineer who applies sound engineering principles, operational discipline, and mature automation to our operating environments and codebase.

You specialize in systems (operating systems, storage subsystems, networking), while implementing best practices for availability, reliability and scalability, with varied interests in algorithms and distributed systems.

Requirements

7+ years of professional SRE or related experience

Bachelor's degree in Computer Science or a related field or equivalent work experience

Expert knowledge of Ansible (roles, playbooks), Terraform, and Kubernetes

Proficiency in programming/scripting languages

Direct experience in monitoring and observability practices

Advanced knowledge of cloud services

Ability to thrive in a collaborative environment involving different stakeholders and subject matter experts

Responsibilities

Participate in an on-call (PagerDuty) rotation to respond to incidents that impact availability

Build and run our infrastructure with Ansible, Terraform, and Kubernetes to enable scaling to a massive number of concurrent users

Build monitoring systems to ensure the highest quality service for our customers

Design and implement operational processes (such as deployments and upgrades)

Debug production issues across all services and levels of the stack

Identify improvements for the product architecture from the reliability, performance and availability perspectives

Plan the growth of Together AI's infrastructure

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance and other competitive benefits. The US base salary range for this full-time position is: $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

Please see our privacy policy at https://www.together.ai/privacy
