Remote Sensing Data at Home in the Cloud
While recently reflecting on the current state of remote sensing (RS) and its relationship to cloud computing, I remembered the phrase “The network is the computer,” coined by Sun Microsystems employee John Gage in the 1980s. This is increasingly true across modern technology: mobile communications, pervasive broadband connectivity in developed nations and the dawn of global satellite internet connectivity mean that personal and business data and processes are increasingly stored and analyzed online.
People often ask each other, “Where is your data stored?” whether it’s photos, music, that book you just read or the report you’re overdue writing for work. The answer is usually “the cloud,” which means it is everywhere, somewhere and nowhere. You can access the data wherever you have connectivity, it is stored in a data center (or several data centers) and tucked away who-knows-where, and it is in a conventional sense nowhere, since it is not physically stored at home or the office.
So, what does all this mean for RS? The acquisition of information from a distance has always required massive amounts of data storage, fast and reliable network connectivity to move the data, and the computing power needed to process large rasters and the arrays or matrices derived from them. There are several reasons why the cloud era is especially relevant to RS, including:
- The old adage of “move the computation to where the data is” has never been truer or timelier. It is increasingly infeasible, or at least inefficient, to store some RS datasets in more than one location. And for some projects, it is also infeasible from a bandwidth perspective to move data to a local office or machine. An example of this is Google Earth Engine (GEE), which allows users to develop algorithms using imagery stored as assets in Google infrastructure. These assets can be shared with other users without copying the data. Imagery from local storage, a Google Cloud Storage (GCS) bucket or another web location can be loaded into assets. The advent of Cloud Optimized GeoTIFFs (COGs) also means that, for data that is still stored remotely, an efficient mechanism enables only the required subset of an image to be retrieved.
It amazes me that scientific research and development that uses petabytes of data and massive amounts of computing power can be conducted using only a web browser. We have accomplished this at Woolpert when using Google Colab in conjunction with the GEE Python API. The network truly is the computer so, with internet connectivity, your data, computation and project are available anywhere.
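To make the COG point above concrete, here is a minimal sketch of why a tiled layout lets a reader fetch only part of an image. It assumes a simplified model: one resolution level, square internal tiles of 512 pixels, and no overviews or compression. The function names and the tile size are illustrative assumptions and do not come from any particular library. A real COG reader would look up the byte offsets of these tiles in the file header and request only those byte ranges over HTTP.

```python
from math import ceil


def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """Return the (tile_row, tile_col) indices that a pixel window touches.

    The window is given in pixel coordinates: its upper-left corner
    (col_off, row_off) plus its width and height in pixels.
    """
    if width <= 0 or height <= 0:
        raise ValueError("window width and height must be positive")
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (r, c)
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    ]


def fraction_of_tiles_needed(image_width, image_height, window, tile_size=512):
    """Share of the image's tiles that must be fetched to cover the window."""
    total_tiles = ceil(image_width / tile_size) * ceil(image_height / tile_size)
    needed = len(tiles_for_window(*window, tile_size=tile_size))
    return needed / total_tiles


# Hypothetical example: a 600 x 600 pixel window in a 10240 x 10240 image.
window = (1000, 2000, 600, 600)  # col_off, row_off, width, height
needed = tiles_for_window(*window)
share = fraction_of_tiles_needed(10240, 10240, window)
print(f"{len(needed)} tiles needed, {share:.2%} of the image")
```

Under these assumptions, the window touches 9 of the image's 400 tiles, so roughly 2 percent of the tiles would need to cross the network rather than the whole file.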
- New tools and capabilities change the way science is performed, whether through more powerful microscopes, modeling molecular behavior on a computer or having access to a particle accelerator. Likewise, cloud resources enable planetary-scale scientific problems to be pursued and, in some cases, new scientific methodologies to be created. Issues such as climate change, natural resource management, urban planning and agriculture span the Earth and require algorithms combined with data at the appropriate geographic scales.
- Although programs such as Landsat set the precedent decades ago using multi-petabyte level storage, the increasing availability of data created from smallsat constellations will continue to require large amounts of storage. U.S.-based satellite operators are required by law to store every image that is downlinked to Earth. These smallsat constellations will also provide a diverse range of RS technologies—electro-optical imagery, synthetic aperture radar, radio frequency signals, hyperspectral, etc.—and a key differentiator will be using these datasets together. As we’ve seen from the many datasets hosted by Google and made available in GEE, scientists and researchers will benefit from COG-enabled and cloud-shared data availability, whether public or private.
- So often RS data is a shared resource and a shared experience. Sharing can occur by default, like with public data, or by choice, as with commercial data made available to colleagues or partners. Both are good. A key tenet of science is reproducibility, and this extends beyond data availability to include the method used and presentation of findings.
The use of a Jupyter Notebook, an open-source application that allows the creation and sharing of documents that contain live code, equations, visualizations and narrative text, is a convenient way to achieve this. A Colab notebook likewise combines executable code, rich text and images in a single document. For a consulting firm like Woolpert, these platforms provide a middle ground, or hybrid, of consulting services combined with products. For example, a client may want in-house scientists to write custom algorithms for RS data but also want assistance with infrastructure management, data loading and preparation. This client may even want some algorithms written in the form of apps to provide building blocks for staff members. A company like Woolpert can also facilitate the acquisition of additional RS data from companies such as Planet or Airbus. Simply put, scientists are not IT experts and they shouldn’t have to be.
Woolpert Solutions Scientist Matthew Hutchinson, Ph.D., recently returned to Woolpert after working as a sales engineer at Planet and as a geographer for the federal government. Hutchinson, an expert in the geospatial and satellite industries, earned his doctorate at Curtin University in Perth, Australia. The Woolpert associate works in Washington, D.C.