Sr Cloud Site Reliability Engineer, IS&T Ai & Data Platforms

Sunnyvale, CA, United States

Summary

Posted: May 17, 2024

Role Number: 200515360

Apple's Applied Machine Learning team has built systems for a number of large-scale data science applications. We work on many high-impact projects that serve various Apple lines of business. We use the latest in open source technology, and as committers on some of these projects, our team looks to push the envelope! Working with multiple lines of business, we handle many streams of Apple-scale data. We bring it all together and unleash business value. We do all this with an outstanding group of software engineers, data scientists, SRE/MLOps engineers and managers. We are looking for talented and dedicated engineers to join our team and bring a passion for infrastructure and distributed systems to building world-class platforms and products at very large scale across cloud environments.

Description

Join Apple's Applied Machine Learning Team as a Senior Software Engineer to build and support innovative software applications. Candidates should have a strong background in setting up and supporting the infrastructure for large-scale big data applications in a public cloud such as AWS.

RESPONSIBILITIES:

- Focus on automation and on providing insight into infrastructure service reliability and availability through extensible services and platforms.

- Design, implement and maintain software and tools for large-scale distributed systems, especially the Big Data stack of technologies such as Iceberg, S3, HDFS, Hive and Ranger.

- Operate and deploy container orchestration systems such as Kubernetes and/or YARN.

- Utilize core computer science data structures, algorithms, and software tools in one of the following languages: Python, Golang, Java or other JVM languages.

- Manage data pipelines using Kafka, Flink, Spark, Airflow and Jupyter.

- Work with platform tools and automation systems, including deployment automation practices, especially across multi-AZ or DC infrastructure, using CM tools such as Saltstack, Ansible, Terraform, etc.

- Plan, design and implement business continuity, capacity management and observability across all services and levels of the stack.

- Build and support CI/CD tools to port and manage applications on AWS and Kubernetes.

- Build automation to enable self-healing systems.

- Track SLIs to meet the agreed-upon SLAs.

- Ensure compliance with appropriate security standards.

- Deploy and debug systems built for horizontally scalable multi-tenant deployments.

- Solve and find workarounds for issues in customer-impacting production systems.

- The candidate is expected to be a self-motivated, proactive, and solution-oriented individual.

Key Qualifications

8+ years of experience in SRE/MLOps.

Experience operating and maintaining production systems on Linux and on public cloud infrastructure providers like AWS (EC2, EBS, S3, ElasticIP, Route 53, IAM).

Experience in cloud native orchestration systems like Kubernetes & enabling AutoScaling for both VM & Containerized workloads.

Strong proficiency with Helm and Kustomize for managing Kubernetes applications and configurations.

Good working knowledge of load balancers, firewalls, TCP/IP networking architecture and core technologies (HTTP, DNS, routing, etc.).

Usage of configuration management tools: Ansible/Puppet/Chef/Saltstack.

Experience in GitOps or CI/CD tools: Spinnaker/Jenkins/Flux/ArgoCD.

Strong programming skills in Unix & Python/Java.

Experience with capacity planning, utilization reviews and performance tuning.

Critical thinking and strong debugging and problem-solving skills.

Experience in implementing, managing and refining business continuity solutions.

Education & Experience

BS in computer science with 7-10 years of experience, or MS with 5-7 years of experience, or related experience.

Additional Requirements

- Work closely with multiple cross-functional teams to effectively coordinate and manage business user expectations.

- Leadership, critical thinking and excellent verbal and written communication skills.

- Create new utilities to improve operational efficiency.

Pay & Benefits

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $170,700 and $300,200, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including: Comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and for formal education related to advancing your career at Apple, reimbursement for certain educational expenses - including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

More

Apple is an equal opportunity employer that is committed to inclusion and diversity. We take affirmative action to ensure equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.
