Enabling Reproducible Geospatial Research on CyberGISX
Topics: Cyberinfrastructure, Geographic Information Science and Systems
Keywords: Reproducibility, CyberGIS, CyberGISX, Geospatial Software, JupyterHub
Session Type: Virtual Paper Abstract
Day: Sunday
Session Start / End Time: 2/27/2022 05:20 PM (Eastern Time (US & Canada)) - 2/27/2022 06:40 PM (Eastern Time (US & Canada))
Room: Virtual 63
Authors:
Anand Padmanabhan, University of Illinois at Urbana Champaign
Alexander Michels, University of Illinois at Urbana Champaign
Zhiyu Li, University of Illinois at Urbana Champaign
Shaowen Wang, University of Illinois at Urbana Champaign
Abstract
JupyterHub, which offers an easy-to-use interface with no frontend development work while promoting reproducible science, is popular in many communities. In the geospatial science community, CyberGISX provides such a JupyterHub environment with many cyberGIS (i.e., geospatial information science and systems based on advanced cyberinfrastructure) and geospatial software packages prebuilt and ready to use. CyberGISX must, however, balance a trade-off between providing a static compute environment, which enhances reproducibility, and continuously updating the software environment to keep up with advances in scientific software.
To enhance reproducibility with minimal effort from end-users, we have designed and implemented a solution on CyberGISX that allows software to be kept on an external file server mounted into each user's environment. Scientific software is installed with EasyBuild and managed by Lmod, which gives several benefits: (1) the compute environment is more standardized and easily reproducible outside of the gateway; (2) multiple versions of software can be made available to users without increasing container size; and (3) exact copies of the software are always available on the gateway instead of being rebuilt for every release, further enhancing reproducibility. We also employ an EasyBuild-installed Anaconda to create and manage conda environments on the file server. The combination of the software stack from EasyBuild and the Python environment from conda provides end-users with kernels for their Jupyter notebooks that persist unchanged as the gateway's container is updated. This design enhances reproducibility and adds functionality for advanced users without introducing technical barriers for non-technical end-users.
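As a rough illustration of how such a persistent kernel could be described, the sketch below builds the contents of a Jupyter kernel.json file that starts Python from a conda environment on the mounted file server, optionally loading Lmod modules first. It is not the CyberGISX implementation: the function name build_kernel_spec, the path /data/envs/geo-2022, and the module name GDAL/3.3.0 are hypothetical, and the sketch assumes that a login shell on the gateway defines the Lmod module command.

```python
import json
import shlex
from pathlib import PurePosixPath


def build_kernel_spec(env_prefix, display_name, modules=()):
    """Return a kernel.json dict for a conda env stored on a shared mount.

    env_prefix and modules are illustrative inputs, not CyberGISX settings.
    """
    python = str(PurePosixPath(env_prefix) / "bin" / "python")
    # Standard ipykernel launch command; Jupyter fills in {connection_file}.
    launch = [python, "-m", "ipykernel_launcher", "-f", "{connection_file}"]
    if modules:
        # Assumption: a login shell ("bash -l") makes Lmod's `module` available.
        # Load the modules, then replace the shell with the kernel process.
        load = "module load " + " ".join(shlex.quote(m) for m in modules)
        command = load + " && exec " + " ".join(shlex.quote(a) for a in launch)
        launch = ["bash", "-lc", command]
    return {"argv": launch, "display_name": display_name, "language": "python"}


if __name__ == "__main__":
    # Hypothetical environment path and module name, for illustration only.
    spec = build_kernel_spec("/data/envs/geo-2022", "Geo 2022", ["GDAL/3.3.0"])
    print(json.dumps(spec, indent=2))
```

Because the kernel specification points at an absolute path on the file server rather than at software inside the container image, a kernel defined this way would keep resolving to the same interpreter and packages after the container is rebuilt.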