Senior Systems Administrator - Windows and Linux
New York, NY, United States
Roles & Responsibilities:
The Senior Systems Administrator/Engineer, as a member of the Scientific Computing and Data group, is responsible for a computational and data science ecosystem for researchers at Mount Sinai. The Administrator is the principal technology expert for Windows and Linux systems in the Scientific Computing group. The incumbent utilizes a thorough understanding of available technology, tools and best practices to design, manage, maintain, upgrade and monitor Scientific Computing's systems. The incumbent will develop and implement solutions responsive to researcher needs, in conjunction with other technology professionals and consistent with IT policies and Compliance. The systems will support a wide array of applications, including VMware, REDCap, Jira, Postgres, MySQL, SQL Server, Tivoli Storage Manager (TSM), and other custom Sinai-developed software. In total, there are more than 100 servers, including physical servers and VMs, along with an archival storage system containing over 20 petabytes of data. The TSM system is integrated with the 25,000-core, 30-petabyte high-performance computing system. This position reports to the Director for Computational & Data Ecosystem in Scientific Computing. Specific responsibilities are listed below.
Primary duties include:
Design, develop and implement all system administration tasks, including hardware and software configuration, configuration management, system monitoring, upgrades, usage monitoring and reporting, system performance, security, networking and metrics. The infrastructure includes both Windows and Linux systems, with file servers in multiple physical locations.
Design and develop scripts for system administration and monitoring for Ansible configuration management, Grafana/Nagios/Zabbix system monitoring, Splunk and other tools.
Research, deploy and manage security infrastructure, including implementation of policies and procedures from IT Security and Compliance.
Plan, implement, troubleshoot and maintain software including databases (SQL, MySQL, PostgreSQL, triple store and other databases) and REDCap, Jira, TSM and other software.
Troubleshoot system and application issues across multiple environments and operating platforms. Provide off-hours support for critical and other production issues.
Research, suggest and implement new uses of information technologies, policies and procedures for continued improvement.
Develop processes and policies for a 20-petabyte TSM tape archival storage system with thousands of users. Perform system administration support for TSM, including management of the 100-terabyte TSM disk cache, 12 LTO9 tape drives and 12 LTO5 tape drives. Assist researchers with placing and retrieving files.
Assist in the management and maintenance of the high-performance computing (HPC) cluster and data center work.
Answers and resolves user tickets.
Develops and creates effective system documentation.
Performs other duties as assigned or requested.
Requirements:
Bachelor of Science degree in Computer Science or Engineering or a related discipline.
Ten years of experience installing, configuring, managing, provisioning, automating tasks and monitoring hardware and software. Experience with configuration management and security best practices.
At least six years of experience in designing, administering and troubleshooting Linux and Windows systems, file systems and VMs.
The ability to communicate effectively and manage multiple conflicting priorities simultaneously.
Requires excellent analytical ability, strong judgment and management skills, and the ability to work effectively and independently with client and IT management and staff.
Experience working in a research environment preferred.
Experience with Jira and Confluence administration, databases (MS SQL, MySQL, MySQL Galera, Oracle, PostgreSQL, etc.) and VMware preferred.
Ability to manage multiple priorities, commitments and projects.
Ability to lead projects to successful completion with little guidance.
Experience supporting HPC environments, including networking, storage and job schedulers, preferred.