Software Engineer, Machine Learning/Data Engineering
San Francisco, CA, United States
The Company
Our client, Rhizome, is at the forefront of developing decision intelligence technology at the intersection of climate science and infrastructure systems. The team pursues this endeavor with the wisdom and steadiness of industry veterans, and the curiosity, grit, and energy of startup and technology enthusiasts.
They are seeking a Software Engineer who can scale their data engineering capacity and contribute to machine learning development at enterprise scale. The ideal candidate will have a strong background in data processing pipelines, DAGs, ETL, feature extraction, and statistical analytics using Python and the AWS cloud, along with deep expertise in working with GIS data, relational databases, CSVs, and Excel at enterprise scale. Successful candidates will also have practical experience building large-scale ETL pipelines on AWS or GCP for data engineering, feature extraction, statistical analysis, and correlations.
Their climate resilience SaaS platform helps utilities, governments, and industries plan for greater resilience to climate change and extreme weather by applying AI to the vast amount of information that characterizes infrastructure assets and their vulnerability to extreme weather. Focused on the $500B resilience investment gap in today's grid, their mandate is simple: help electric utilities proactively adapt to climate change by integrating cutting-edge climate-asset intelligence into their existing planning workflows. As the world experiences record-breaking climate-related impacts, especially grid failures, the platform identifies future extreme weather vulnerabilities in utility assets at high resolution and empowers planners to optimize investment deployments that keep society safe during natural hazard events.
Your Impact (Responsibilities)
Design, construct, and maintain data pipelines to combine large volumes of geospatial, climate and weather, and electric utility datasets.
Work with a cross-functional team to deliver data in support of analytic and ML pipelines.
Develop deep familiarity with electric utility datasets and take ownership of integrating new datasets into their existing environments.
Contribute to ML model development in the context of understanding future extreme weather impacts on the power grid.
Optimize storage and ETL pipelines.
Develop versioned, scalable, repeatable, and reliable pipelines that convert utility data from GIS and tabular formats to Delta Lake format.
Scale and automate data pipelines for statistical analysis across internal and external use cases.
Standardize and scale multi-tenant data storage.
Diagnose data issues and discrepancies.
Modularize the different stages of data ingestion and verification.
Write algorithms for data sanity checks and for classifying different data elements.
Develop heuristics and suggestions for missing data items.
Validate and test pipelines, including writing functional tests to verify them.
About You
Exceptional Python programming skills.
Exceptional programming skills with NumPy, SciPy, and xarray.
Exceptional programming skills with orchestration frameworks such as Dagster, Airflow, or Prefect.
Exceptional programming skills with Databricks, Apache Spark, Amazon EMR, or Cloudera.
Deep expertise in storage optimization and partitioning on RDS, Postgres, PostGIS, and Delta Lake.
Hands-on experience with GIS datasets and QGIS or ESRI tools.
Hands-on experience with multi-dimensional climate or weather data.
Familiarity or hands-on experience with secure cloud development.
Pluses
Exposure to or experience with electric utility tech stacks (AMS, OMS, GIS, etc.).
Exposure to applied ML and data engineering in the context of electric utilities.