Senior Site Reliability Engineer

Palo Alto, CA, United States

POSITION SUMMARY:

Velocity Global seeks a Senior Site Reliability Engineer (SRE) with extensive observability experience. In this role, you will help lead the automation and support efforts for our cloud infrastructure, identify strategies to improve our full-stack telemetry and monitoring capabilities, and mentor other SREs who contribute to observability-related work.

SREs work cross-functionally with DevOps and Engineering teams, combining operations work with software engineering principles to enable high availability of production systems. You will serve as a partner to our Engineering organization to help make their services more performant, scalable, observable, and reliable. Every engineering team at Velocity Global should be responsible for the software they build. SREs are critical in providing the tools, practices, and expertise to make that happen.

We are growing and evolving the SRE team to help meet Velocity Global’s product-first reliability goals for 2023 and beyond.

Responsibilities include

Automating observability and alerting across an ever-changing landscape of microservices

Automated Service Reliability Scorecards and Production Readiness Standards

Chaos Engineering and Game Day Simulations to discover and test fixes for weak spots that would otherwise not be identified until a real-life production incident occurred

Software engineering project work, proposed and driven by individual SRE team members, to remove operational bottlenecks and increase velocity in ways we've never considered before

Expand and improve our observability and monitoring footprint

Collaborate with the Engineering and DevOps teams to create architectural plans, define project requirements, and establish technical standards

Review the work of other team members, help them get unblocked, and provide mentoring

Improve common operational challenges by building tools and automating scripts

Serve as the on-call incident commander to help debug and drive resolution of production reliability issues, contribute to the postmortem, and work to prevent recurrence

Participate in design and production reviews for new features, products, or infrastructure

Audit and tune the configuration of systems owned by other engineering teams

Plan for the growth of Velocity Global’s infrastructure and infrastructure reliability/resiliency

Designing and implementing High Availability architecture underlying Velocity Global’s platform

Creating Disaster Recovery solutions, including backups, redundant systems, and emergency response processes

This individual will report to the Manager, Site Reliability Engineering.

The team for this role is based primarily in the United States.

Qualifications/Skills

SREs combine some level of experience in both software engineering and operations. They may hail from various backgrounds and job titles, including production or application engineers, software developers with a strong DevOps mindset, SysAdmins with solid systems and programming skills, and Cloud Infrastructure or DevOps engineers. We are looking for someone with the following experience:

5+ years working in a relevant role, including 2+ years of technical leadership experience mentoring more junior engineers

3+ years of experience architecting and administering observability stacks, either managed or self-hosted (e.g., Datadog, New Relic, Prometheus, Elastic Stack/ELK, AppDynamics)

Solid experience and understanding of AWS cloud services

Operation of containerized microservices running on public cloud, asynchronous event processing, and databases

Strong understanding of Linux, GitLab, and CI/CD pipelines

On-call support of highly available production systems

Design and build new tools to automate repetitive tasks, prevent incidents, or improve TTR using an object-oriented programming language such as Python

Infrastructure as Code using tools like Terraform, Terragrunt, or CloudFormation

Understand how application components interact and contribute to architectural discussions

Unwavering commitment to operational security and best practices

Identify problems, propose solutions, and then implement them, from submitting a merge request on another team's repository to scoping out a new reliability project

Motivated to help other teams improve their service reliability through reviews, pair programming, hands-on training, and continuous improvement of tooling and services

In the spirit of winning together, the position will be based in Palo Alto and in-office collaboration is required for at least one day per week.

Our job titles may span more than one career level. The base pay depends upon many factors, such as training, transferable skills, work experience, business needs, and market demands. The base pay range is subject to change and may be modified. This role is eligible for annual performance-based bonuses, flexible time off, health care benefits, retirement savings, and employee incentive plans.

Pay Range

$140,300 - $172,000 USD

GO FARTHER WITH VELOCITY

At Velocity Global, we’re building a dream team made up of the world’s best talent. We’re looking for people like you to join us as we make opportunity borderless for people everywhere.

About Velocity Global

At Velocity Global, our values represent who we are and the company we want to be. We harness the power of unity, diversity, and collaboration, drive for impact, and win as a team - bringing our unique talents together to achieve our common goals. In partnership with our customers and ourselves, we are better together, and together, we win.

Please refer to our present benefits offering here.
