Imaging spectroscopy data are useful, but there are significant barriers to their use. Specifically, imaging spectroscopy data have high dimensionality and large volumes. Working with these data can be computationally prohibitive and limited to those with specialized skills.
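To make the volume concern concrete, the short sketch below estimates the uncompressed size of a single image cube. The scene dimensions, band count, and sample size are illustrative assumptions, not the specifications of any particular NASA sensor, and `cube_size_bytes` is a hypothetical helper.

```python
def cube_size_bytes(rows, cols, bands, bytes_per_sample):
    """Uncompressed size of a rows x cols x bands image cube, in bytes."""
    return rows * cols * bands * bytes_per_sample


# Hypothetical flight line: 10,000 x 600 pixels, 400 spectral bands,
# stored as 32-bit (4-byte) floats. These numbers are assumptions.
size = cube_size_bytes(rows=10_000, cols=600, bands=400, bytes_per_sample=4)
print(f"{size / 1e9:.1f} GB")  # prints "9.6 GB"
```

Under these assumptions a single scene is already several gigabytes, so an analysis spanning many scenes quickly outgrows a typical local workstation.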
The primary objective of ImgSPEC is to demonstrate a proof-of-concept, end-to-end, on-demand processing platform for imaging spectroscopy data. The motivation was to democratize access to information of value derived from NASA’s open, free imaging spectroscopy data.
The ImgSPEC platform enables users to load and access different kinds of data in a user-preferred work environment (e.g., Jupyter notebook, RStudio) that facilitates easy code and data sharing with automated, scalable hybrid cloud computing. Using drop-down menus, users can open metadata repositories (e.g., NASA Common Metadata Repository) with streamlined APIs. Interfaces have been created for pulling GitHub/GitLab/Bitbucket repositories to create containers that package code into executable applications (Merchant et al. 2016, Devisetty et al. 2016).
For code that cannot be packaged, users can fork repositories and work as they would in their local environment, but with scalable backend compute. User experience was central to the design of ImgSPEC.
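As a rough illustration of what querying a metadata repository involves behind such a menu, the sketch below builds a granule search URL for NASA's public Common Metadata Repository (CMR) search endpoint. It is not ImgSPEC's own interface: the helper name `build_cmr_granule_query` and the collection short name `EXAMPLE_COLLECTION` are hypothetical placeholders, and the code only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

# Public CMR granule search endpoint (JSON response format).
CMR_GRANULE_SEARCH = "https://cmr.earthdata.nasa.gov/search/granules.json"


def build_cmr_granule_query(short_name, bounding_box, temporal, page_size=10):
    """Build a CMR granule search URL.

    bounding_box is (west, south, east, north) in decimal degrees;
    temporal is "start,end" in ISO 8601 form.
    """
    params = {
        "short_name": short_name,
        "bounding_box": ",".join(str(v) for v in bounding_box),
        "temporal": temporal,
        "page_size": page_size,
    }
    return f"{CMR_GRANULE_SEARCH}?{urlencode(params)}"


# EXAMPLE_COLLECTION is a placeholder, not a real collection short name.
url = build_cmr_granule_query(
    short_name="EXAMPLE_COLLECTION",
    bounding_box=(-120.0, 34.0, -119.0, 35.0),
    temporal="2020-01-01T00:00:00Z,2020-12-31T23:59:59Z",
)
print(url)
```

The resulting URL can be fetched with any HTTP client; a platform-level interface would presumably wrap a request like this so that users do not have to assemble the parameters by hand.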
Because of the flexibility of the analysis platform, users can generate data products from well-established workflows on demand. They can also customize those workflows and develop new algorithms for merging data products with others to conduct novel analytics.
Value of ImgSPEC
ImgSPEC specifically aims to:
- Provide scalable work environments that do not require a heavy lift from the science user
- Reduce download times
- Enable easy provenance for reproducibility
While ImgSPEC focuses on imaging spectroscopy data, its platform is data-type agnostic and builds on capabilities developed for active sensors on the Multi-mission Algorithm and Analysis Platform (MAAP). MAAP has demonstrated a processing speedup that makes it possible to process big data over whole continents in less time than it takes to process a single scene on a local machine.
