Site Reliability Engineer (Resilience Engineer/Senior Resilience Engineer)
Austin, TX, United States
We’re excited you’re considering joining a great place to work!
Texas Mutual is deeply committed to creating and maintaining an environment of mutual respect and is proud to be an equal opportunity employer. All qualified applicants are encouraged to apply and will receive consideration for employment without regard to age, race, color, national origin, religion, sex, gender identity, sexual orientation, genetic information, veteran status, or any other basis protected by local, state, or federal law.
About This Position
The team you’ll join is part of a unique organization supporting a 1,000-person, billion-dollar company. At Texas Mutual you’ll be one of the leaders on a newly formed Resilience Engineering team. This team will improve our incident response and change enablement processes, identify areas for self-service operations and provide logging, monitoring and alerting capabilities. You will collaborate regularly with Infrastructure, Operations, Development and QA teams to foster resiliency, grow a culture of Site Reliability Engineering and support our transition from on-premises to cloud infrastructure.
Texas Mutual employees often say we feel more like a family than a corporation, as we all show up to improve the lives of Texans. Texas Mutual's focus on preventing injuries, fighting fraud and taking care of injured workers provides job satisfaction that you will find hard to beat.
Responsibilities & Qualifications
This position is an opportunity for you to be a member of the Resilience Engineering (SRE) team at Texas Mutual. We are a rapidly maturing IT department with a DevOps culture, transitioning into a highly automated cloud environment (CI/CD, configuration management, infrastructure as code). You will collaborate with and enable all facets of IT to improve our resilience in response to change.
We understand that it’s not reasonable to expect 100% uptime. You will ensure we’re prepared to deal with the unknown, learn from incidents and continue our success in supporting Texas workers.
Who You Are
You are passionate about problem-solving and efficiency
You are eager to learn and share your wisdom with others
You are excited for new challenges and new skills
You are creative, willing to experiment and share your ideas
You are curious about people and how we can more effectively work together
You are a good listener and seek to understand other people’s perspectives
What You’ll Do
You will engineer solutions to improve efficiency, enable self-service and support automated incident response processes
You will serve as an Incident Commander to manage communication, resolution and analysis of production incidents
You will lead post-incident retrospective meetings and write compelling narratives about our IT experiences to encourage organizational learning
You will implement and administer monitoring and alerting tooling to enable proactive incident response processes
You will build and configure integrations between systems for monitoring, alerting and reporting system health
You will track, report and effectively communicate system availability and performance metrics
You will facilitate the creation of operational runbooks and document common recovery actions
You will work with teams to define Service Level Objectives (SLOs) and establish processes for responding to SLO breaches
You will collaborate with software developers and architects to identify improvements that will increase the robustness and reliability of our systems
Required Qualifications
Bachelor’s degree in related field or equivalent education, training, or experience
At least 2 years of experience for the Resilience Engineer level, or at least 4 years of experience for the Senior Resilience Engineer level, or equivalent education, training, or experience
Experience building, maintaining or supporting complex technical systems
Bonus Points For
Experience with ITIL
Strong operations and engineering background
Experience using Git in a team environment
Experience with object-oriented programming languages (Java, C#) and/or scripting languages (PowerShell, Python, Bash)
Experience with automation tooling (Terraform, Ansible)
Experience implementing monitoring, logging or alerting
Excellent negotiation, collaboration and presentation skills
Excellent communication skills
Passion for creative writing and/or storytelling
Flex-Hybrid Work Environment
Texas Mutual’s flex-hybrid schedule allows you to bring your best self to work by either working remotely or collaborating in the office based on business needs. All Texas Mutual employees are required to have Texas residency and travel to their designated office as needed.
Our Benefits
Flex-hybrid work environment for most positions
Annual performance bonus and merit-based pay increase
Professional development and tuition reimbursement
Automatic 4% employer contribution to retirement plan
401(k) plan with 100% employer match up to 6%
Three weeks’ time off for vacation
Nine paid holidays and two personal days each year
Generous sick, holiday and volunteer time off
Day one health, Rx, vision and dental insurance
Life and disability insurance
Flexible spending account
Pet coverage and pet Rx discounts
Free on-site gym, fitness classes, and health and wellness resources
Free identity theft protection
Free second medical opinion service
Free student loan repayment and refinancing consultation
Employee referral bonus