Hadoop Administrator II
San Antonio, TX, United States
The Hadoop Administrator is responsible for the care, maintenance, and administration of the Hadoop ecosystem. This role ensures system security, stability, reliability, recoverability, and performance, and performs the capacity planning needed to meet the growing and evolving data demands of the enterprise.
The Hadoop Administrator is responsible for:
- Supporting and maintaining the Hadoop ecosystem, including HDFS, YARN, Hive, LLAP, Druid, Impala, Spark, Kafka, HBase, and Cloudera Workbench.
- Managing storage, performance tuning, and volume management of Hadoop clusters and MapReduce routines.
- Deploying Hadoop clusters, adding and removing nodes, tracking jobs, monitoring cluster components (see the monitoring sketch after this list), configuring NameNode high availability, and scheduling, configuring, and performing backups.
- Installing and configuring software, applying patches, and upgrading as necessary.
- Performing capacity planning and implementing new or upgraded hardware and software for storage infrastructure.
- Designing clusters, including capacity arrangement, cluster setup, performance tuning, monitoring, structure planning, scaling, and administration.
- Communicating with development, administration, and business teams, including infrastructure, application, network, database, and business intelligence teams.
- Designing and developing Data Lake and Data Warehousing solutions.
- Collaborating with technical and non-technical resources on project work, POCs, and troubleshooting exercises.
- Configuring Hadoop security, specifically Kerberos integration.
- Creating and maintaining job and task scheduling and administration.
- Managing data movement in and out of Hadoop clusters using Sqoop and/or Flume.
- Reviewing Hadoop environments for compliance with industry best practices and regulatory requirements.
- Performing data modeling, design, and implementation based on recognized standards.
- Acting as a key contact for vendor escalation.
- Participating in an on-call rotation to support a 24/7 environment and working outside business hours as needed.
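To give a flavor of the cluster-monitoring work above, here is a minimal sketch of an HDFS health probe against the NameNode's JMX endpoint. The hostname, the 10% free-capacity threshold, and the use of port 9870 (the Hadoop 3.x default HTTP port; 2.x clusters use 50070) are assumptions, and a Kerberos-secured cluster would need SPNEGO authentication rather than a plain HTTP GET.

```python
#!/usr/bin/env python3
"""Minimal HDFS health probe via the NameNode JMX endpoint (sketch only)."""
import json
import sys
from urllib.parse import urlencode
from urllib.request import urlopen

NAMENODE_URL = "http://namenode.example.com:9870"  # placeholder host; 9870 is the Hadoop 3.x default


def fetch_fsnamesystem_state(base_url: str) -> dict:
    """Query the FSNamesystemState JMX bean and return its attributes."""
    query = urlencode({"qry": "Hadoop:service=NameNode,name=FSNamesystemState"})
    with urlopen(f"{base_url}/jmx?{query}", timeout=10) as resp:
        return json.load(resp)["beans"][0]


def main() -> int:
    state = fetch_fsnamesystem_state(NAMENODE_URL)
    live, dead = state["NumLiveDataNodes"], state["NumDeadDataNodes"]
    pct_free = 100.0 * state["CapacityRemaining"] / state["CapacityTotal"]
    print(f"live={live} dead={dead} free={pct_free:.1f}%")
    # A non-zero exit code lets the scheduler (e.g. cron) flag the failed check.
    if dead > 0 or pct_free < 10.0:  # 10% alert threshold is an assumption
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

In practice a probe like this would run from cron and feed an alerting system; it is a sketch of the idea, not a production monitor.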
Minimum Qualifications
Bachelor's degree in Information Systems, Engineering, Computer Science, or a related field from an accredited university. Intermediate experience in a Hadoop production environment. Must have intermediate experience and expert knowledge in at least four of the following areas:
- Hadoop administration in Linux and virtual environments.
- Installing and managing Hadoop distributions (Cloudera).
- Hadoop ecosystem components, including HDFS, YARN, Hive, LLAP, Druid, Impala, Spark, Kafka, HBase, and Cloudera Workbench.
- Overall Hadoop architecture.
- Using and troubleshooting open-source technologies, including configuration management and deployment.
- Data Lake and Data Warehousing design and development.
- Reviewing existing database and Hadoop infrastructure and identifying areas for improvement.
- Implementing software lifecycle methodology to ensure supported releases and roadmap adherence.
- Configuring high availability of NameNodes.
- Scheduling and taking backups for the Hadoop ecosystem.
- Moving data in and out of Hadoop clusters.
- Scripting in a Linux environment (a sample script follows this list).
- Project management concepts and tools (MS Project).
Additional requirements:
- Ability to work effectively with application and infrastructure teams.
- Strong organizational, task management, and teamwork skills.
- Valid Class C Texas Driver's License.
- Ability to make independent recommendations.
- Proficiency in Microsoft Office, including word processing, spreadsheets, presentations, email, and scheduling.
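As an illustration of the Linux scripting and data-movement areas above, the following is a minimal sketch of a nightly Sqoop import wrapper. The JDBC URL, username, password-file path, table name, and target directory are all placeholder assumptions; a real job on a secured cluster would also obtain a Kerberos ticket (e.g. from a keytab) before launching.

```python
#!/usr/bin/env python3
"""Sketch of a nightly Sqoop import wrapper, assuming a Linux cron context."""
import datetime
import subprocess
import sys

JDBC_URL = "jdbc:mysql://dbhost.example.com:3306/sales"  # placeholder source database
PASSWORD_FILE = "/user/etl/.db_password"                 # placeholder HDFS path
TABLE = "orders"                                         # placeholder table


def run_import(table: str) -> None:
    """Launch a Sqoop import into a date-partitioned HDFS directory."""
    target_dir = f"/data/raw/{table}/dt={datetime.date.today():%Y-%m-%d}"
    cmd = [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", "etl",                 # placeholder account
        "--password-file", PASSWORD_FILE,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", "4",
    ]
    # check=True turns a failed import into an exception, so the script
    # exits non-zero and the scheduler can surface the failure.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    try:
        run_import(TABLE)
    except subprocess.CalledProcessError as exc:
        sys.exit(exc.returncode)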
If you have the required qualifications and are looking to contribute to a dynamic team, we encourage you to apply for this position.