Site Reliability Engineer
San Francisco, CA, United States
Role: Site Reliability Engineer (must have 12 years of experience)
Location: Temporarily remote; preferred locations are San Francisco, CA; Los Angeles, CA; or Seattle, WA. Candidates outside these areas must be willing to relocate.
Must be comfortable taking a HackerRank test
Key Responsibilities:
Kubernetes and cluster operations and maintenance.
Ensure the reliability, availability, and performance of services through development of stability and automation products, emergency response, and system resilience improvements.
Manage services, with responsibility for operational support, 24x7 troubleshooting, and automation.
Troubleshoot and diagnose issues, then propose and implement solutions that reduce how often they occur.
Meet service-level agreements (SLAs) and service-level objectives (SLOs) by measuring and monitoring service availability, performance, and overall system health.
Perform various SRE operations, including scaling up and down and building and maintaining clusters.
Be available for on-call rotation covering production-impacting incidents or key customer events.
Core Experience:
5+ years of experience in the following areas:
Linux systems knowledge, e.g., file systems, memory management, process management, and basic networking.
Linux troubleshooting: debugging Linux systems, e.g., file-system-level issues and system performance problems.
Experience with Python, Go (Golang), and shell scripting. Should be able to write simple programs comfortably.
Kubernetes operational experience.
Basic knowledge of Kafka: how it works, plus some hands-on experience with it.
Strong background in technical operations, DevOps, and infrastructure support, with excellent Linux troubleshooting skills for resolving application issues.
Minimum qualifications:
Bachelor's degree or higher in Computer Science or a related field
Must be a responsible self-starter with strong interpersonal skills, comfortable with ambiguity, an excellent communicator, and a strong problem solver
Must be able to work in a fast-paced environment without constant supervision
Motivated learner
Must have good troubleshooting skills