Infrastructure Engineer, Observability
San Francisco, CA, United States
WHY WE'RE LOOKING FOR YOU:
Retool started as a way to remove the obstacles of building internal tools and has grown into a company that solves internal tooling problems for thousands of companies, from one-person startups to S&P 500 enterprises. We've done a lot with a little: we have a rapidly growing engineering team and a long list of features and foundational infrastructure pieces we want to tackle.
Retool is in an exciting hyper-growth phase, and we need infrastructure engineers to tackle the challenges of rapid scaling. These challenges are unique in both scope and technical complexity, because we are scaling the company and the product at the same time.
WHAT YOU'LL DO:
In this role, you will be a founding member of our Observability team! You will build, integrate, and evangelize observability platforms and solutions for our products and internal systems. You will drive adoption of these solutions and make sure they deliver real value to the company. To deliver them, you will use automation and orchestration tooling along with infrastructure-as-code patterns.
Your core responsibility in this role is to build and deploy observability solutions that make our products highly available, scalable, reliable, and observable, and that delight our customers.
IN THIS ROLE, YOU'LL:
Help build a great product that improves the productivity of engineers across the globe by several orders of magnitude
Design and build observability solutions covering the collection, delivery, analysis, and visualization of metrics, logs, and traces
Work with engineers, designers, product managers, and customer support to instrument our products and internal apps and build observability into them
Build orchestration and automation tooling around off-the-shelf solutions (e.g. Datadog), as well as custom solutions that meet our unique needs
Be involved in the development of scalable, distributed software systems that support a globally distributed customer base
Coach and mentor other SREs and software engineers, and lead the iterative definition and refinement of development processes as the team grows
THE SKILLSET YOU'LL BRING:
7+ years of related professional experience, including 2+ years in a lead role on a mission-critical platform with high-availability requirements
Experience with containerization (e.g. Docker, Kubernetes), infrastructure as code (e.g. Terraform) and observability (e.g. Datadog, Stackdriver, Wavefront, Grafana) stacks
A strong understanding of system availability, resiliency, and recoverability
Comfortable being a hands-on individual contributor while also hiring and scaling the team
Strong organizational skills and high attention to detail, with the ability to work independently with minimal supervision
Ability to thrive in a high-energy, high-growth, fast-paced, entrepreneurial environment, and willingness to learn new skills and adopt new technologies
BONUS POINTS:
Familiarity with GitHub, CI/CD, DevOps
Familiarity with frontend web and mobile application development using React and React Native
Experience with observability platforms and tools such as Datadog, New Relic, and Dynatrace