Site Reliability Engineer
Phoenix, Arizona
Job Description
SRE (Site Reliability Engineer)
Location: Phoenix, AZ (onsite)
Skill Sets:
DevOps work experience (Docker/Kubernetes, Splunk/Dynatrace, Ansible, GitHub)
Experience deploying Java full-stack applications
Experience generating test dashboard reports for leadership review
Work experience with monitoring and alerting systems
Detailed Job Description:
System Design and Integration:
Collaborate with dependent teams to understand the end-to-end (E2E) architecture and identify the integration services, endpoints, and points of contact.
Define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
Identify the right communication tools in line with the defined SLAs.
Monitoring and Incident Response:
Set up and maintain monitoring and alerting systems to detect issues proactively.
Respond to incidents, conduct root cause analysis, and implement corrective actions.
Capture historical outage data to track patterns and trends.
Automation and Tooling:
Automate repetitive tasks to reduce manual intervention and human error.
Develop and maintain tools to improve monitoring and incident response.
Collaboration and Communication:
Work closely with development, QA, and operations teams to integrate reliability into the development process.
Communicate effectively with stakeholders about system health and incident status.