We describe a modern C++20/Python toolbox for feature extraction and geospatial data manipulation developed at the Center for Space Research and used for a variety of data processing applications in our lab. The toolbox provides powerful feature extraction tools in a suite of flexible, modular applications that can be used to compose geospatial data processing pipelines. The toolbox is exposed through a web API and is planned for release later this year.
Global digital elevation models (DEMs) generated from spaceborne synthetic aperture radar (SAR), such as the Copernicus 30 m DEM, provide exceptional coverage of the Earth's topography. However, SAR-derived DEMs struggle to accurately map terrain under forest canopies and in certain topographic conditions. In contrast, spaceborne laser altimeters like ICESat-2 can accurately measure ground elevations in areas where SAR sensors struggle, but the lack of dense coverage from laser altimetry precludes creation of complete global DEMs. This work aims to combine the accuracy of laser altimetry with the coverage of SAR by using deep learning algorithms. A convolutional neural network (CNN) is trained to correct the Copernicus 30 m DEM using sparse but accurate ICESat-2 elevations in the southeastern United States around South Carolina. Model inputs include temporally coincident imagery from Sentinel-2A, other SAR inputs from Sentinel-1B, as well as the Copernicus 30 m DEM. The CNN is trained to correct the elevation of each individual pixel, allowing for the use of sparse ICESat-2 measurements. This approach enables the creation of a global DEM with the coverage of SAR and precision closer to that of laser altimetry. The resulting CNN model reduced ground elevation RMSE from 8.65 m to 2.62 m. The corrected DEM has potential to benefit numerous scientific endeavors requiring accurate global topographic information.
ICESat-2's Advanced Topographic Laser Altimeter System (ATLAS) can penetrate water bodies, enabling accurate and detailed measurements of bathymetry in diverse aquatic environments. ATLAS's capabilities have made it a popular tool for understanding underwater topography and characteristics. In this paper, we present a deep residual classification network used to identify ICESat-2 bathymetry and water surface photons. The training data were derived from both hand-labeled ICESat-2 ground tracks and synthetic data produced by custom ICESat-2 ground track simulator software. This investigation was unique in that it used a very wide variety of ground tracks across the entire globe, and it also used several different metrics to summarize the classification performance.
ICESat-2 is a space-based laser altimetry mission that provides accurate 3D representations of the earth’s surface. The Advanced Topographic Laser Altimeter System (ATLAS) onboard ICESat-2 can measure surface heights with great accuracy, providing critical measurements needed to better understand key surface structure variables. The Photon Research and Analysis Library (PhoREAL) was designed to provide a customizable analysis tool for NASA’s ICESat-2 Land and Vegetation (ATL08) data. With PhoREAL, users can resample, reproject, and recalculate terrain and canopy height statistics at any along-track resolution. PhoREAL is designed both as a command line tool and a Windows-based GUI and provides functionality such as reading ATL08/ATL03 data files and exporting the geolocated photon data to multiple output file formats; combining ATL08/ATL03 data products to label the individual photons as noise, ground, canopy, and top of canopy photons; comparing and aligning the measured ICESat-2 data relative to reference data; computing ICESat-2 height and radiometric statistics for ground and canopy photons at specified bin lengths; and plotting the ICESat-2, reference, and statistical data together. PhoREAL is available as free and open-source software on GitHub (https://github.com/icesat-2UT/PhoREAL). It can run as a Windows executable or in a Python environment for compatibility in both Windows and Linux environments. PhoREAL is a valuable tool for scientists who want to analyze ICESat-2 data. It provides a user-friendly interface for accessing and processing the data, and it offers a variety of features that can be used to extract valuable information from the data.
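The along-track binning described above can be illustrated with a minimal sketch. It is not PhoREAL's code or API: the function name `bin_height_stats`, the `(along_track_m, height_m, label)` tuple layout, and the choice of median ground height and maximum canopy height as the per-bin statistics are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def bin_height_stats(photons, bin_length):
    """Group (along_track_m, height_m, label) photons into fixed-length bins.

    Hypothetical helper, not part of PhoREAL. Returns
    {bin_index: {"ground_median": ..., "canopy_max": ...}}; a value is None
    when a bin holds no photons of that class.
    """
    bins = defaultdict(lambda: {"ground": [], "canopy": []})
    for along_track, height, label in photons:
        if label in ("ground", "canopy"):  # noise photons are ignored
            bins[int(along_track // bin_length)][label].append(height)
    stats = {}
    for idx in sorted(bins):
        ground, canopy = bins[idx]["ground"], bins[idx]["canopy"]
        stats[idx] = {
            "ground_median": median(ground) if ground else None,
            "canopy_max": max(canopy) if canopy else None,
        }
    return stats


# Made-up photons: (along-track distance in m, height in m, class label).
photons = [
    (3.0, 10.0, "ground"), (12.0, 10.4, "ground"), (18.0, 25.0, "canopy"),
    (40.0, 11.0, "ground"), (45.0, 30.0, "canopy"), (47.0, 3.0, "noise"),
]
stats = bin_height_stats(photons, bin_length=30.0)
```

Changing `bin_length` re-aggregates the same photons at a different along-track resolution, which is the kind of recalculation the abstract refers to.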
We present a Simple POint Cloud (SPOC) file format, suitable for efficiently storing and processing geospatial point cloud data. This format provides support for 64-bit floating-point precision coordinates, compressed storage, and data streaming. The code base is implemented as a header-only, modern C++ library with Python extensions under an open source license. The format can be applied in a wide variety of use case scenarios, and was motivated by a need for high-precision, transparent data storage and transmission for geospatial processing and machine learning applications. Existing file formats sometimes lack the precision and dynamic range necessary for certain applications, do not support common interprocess communication protocols, or are overly complex or rigid.
The Ice, Cloud and Land Elevation Satellite-2 (ICESat-2), on-orbit for nearly 3 years, continues to provide global elevation measurements to the broad science community. One of the most transformative discoveries during early mission operations was the realization that ICESat-2 could provide bathymetry in addition to the planned surface-specific data products for land ice, sea ice, ocean and land/vegetation. This is an important capability for coastal science, maritime intelligence and shallow water benthic monitoring at the global scale. ICESat-2 elevations are also becoming a critical component of satellite-derived bathymetry where multispectral imagery uses the measurements to create broad spatial maps with absolute vertical bathymetric depths. This article will highlight some of the most salient bathymetric observations and quantitative analyses of this space-based photon counting lidar that includes sea floor elevation retrievals but also environmental characterization such as wave structure and turbidity and monitoring of benthic habitats.
NASA’s Ice, Cloud and Land Elevation Satellite-2 (ICESat-2) is planned for launch in 2018 with the goal of providing a global distribution of elevation measurements to support many Earth science applications. The primary science mission is focused on the Polar Regions and will provide data to help understand controlling mechanisms of polar ice sheet mass balance, and investigate ice-ocean-atmosphere exchanges of mass, energy and moisture as they relate to sea ice thickness. ICESat-2 will also allow for terrain and canopy height retrievals as it operates continually throughout its orbit. The satellite will utilize a laser altimeter that provides signal detection sensitivities on the photon-level. This instrumentation allows for lower power and weight requirements to support a high repetition rate, multiple-beam configuration for improved spatial coverage as compared to previous missions. In order to develop the geophysical data product algorithms in preparation for launch, simulated data sets have been produced based on the statistical representation of the expected system performance. These data allow for data product quality analysis over specific types of ecosystems. This is of particular interest for vegetated regions, where canopy cover characteristics will directly affect the ability to retrieve terrain heights. This paper will discuss the expected ICESat-2 land/vegetation data product quality for a selected ecosystem.
This study examines the utility of cocollected, dual-wavelength, full-waveform lidar data to characterize vegetation and landscapes through the extraction of waveform features, such as total waveform energy, canopy energy distribution, and foliage penetration metrics. Assessments are performed using data collected in May 2014 over Monterey, California, using the Chiroptera dual-laser lidar mapping system from Airborne Hydrography AB. Both full-waveform and discrete return data were collected simultaneously at green (532 nm) and near-infrared (NIR) (1064 nm) wavelengths; however, the two channels are operated independently at different pulse repetition frequencies, thus measurements are not spatially coincident. A voxelization approach is employed to generate pseudowaveforms for each wavelength along vertical columns in a regularly spaced grid, such that spectral waveform properties can be evaluated independently of spatial variations resulting from instrumentation configuration and collection scenario. The pseudowaveforms are parameterized and extracted parameters are mapped to raster layers, which are then used as inputs to a random forest classifier to predict land cover classifications across the survey area. In comparison to independent classification results for the two wavelength channels, the combination of the NIR and green response provided an improvement in overall classification accuracy of up to 6%. This effort presents the methodology associated with the voxelization approach and the exploitation of the pseudowaveform features, while illustrating a potential utility for geospatial classification using multiple wavelengths.
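As a rough illustration of the voxelization idea, the sketch below accumulates return intensity into vertical bins for each grid column, giving one pseudowaveform per column. The function name `pseudowaveforms`, the `(x, y, z, intensity)` point layout, and the use of summed intensity as the per-voxel energy are assumptions; the study's actual voxel definition and parameterization are not reproduced here.

```python
from collections import defaultdict


def pseudowaveforms(points, cell_size, z_min, z_step, n_bins):
    """Accumulate return intensity into vertical bins for each grid column.

    points: iterable of (x, y, z, intensity) tuples (hypothetical layout).
    Returns {(col, row): [energy per vertical bin, lowest bin first]}.
    Points outside the vertical range are skipped.
    """
    columns = defaultdict(lambda: [0.0] * n_bins)
    for x, y, z, intensity in points:
        k = int((z - z_min) // z_step)  # vertical bin index
        if 0 <= k < n_bins:
            key = (int(x // cell_size), int(y // cell_size))
            columns[key][k] += intensity
    return dict(columns)


# Made-up returns from one wavelength channel.
points = [
    (0.5, 0.5, 0.2, 1.0),
    (1.5, 0.7, 0.3, 2.0),
    (0.2, 1.9, 5.5, 4.0),
    (3.0, 0.1, 1.2, 1.5),
]
pw = pseudowaveforms(points, cell_size=2.0, z_min=0.0, z_step=1.0, n_bins=8)
```

Running the same function separately on the green and NIR returns over a shared grid would give two pseudowaveforms per column that can be compared even though the original shots are not spatially coincident.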
Light detection and ranging (LIDAR) technology offers the capability to rapidly capture high-resolution, 3-dimensional surface data with centimeter-level accuracy for a large variety of applications. Due to the foliage-penetrating properties of LIDAR systems, these geospatial data sets can detect ground surfaces beneath trees, enabling the production of high-fidelity bare earth elevation models. Precise characterization of the ground surface allows for identification of terrain and non-terrain points within the point cloud, and facilitates further discernment between natural and man-made objects based solely on structural aspects and relative neighboring parameterizations. A framework is presented here for automated extraction of natural and man-made features that does not rely on coincident ortho-imagery or point RGB attributes. The TEXAS (Terrain EXtraction And Segmentation) algorithm is used first to generate a bare earth surface from a lidar survey, which is then used to classify points as terrain or non-terrain. Further classifications are assigned at the point level by leveraging local spatial information. Similarly classed points are then clustered together into regions to identify individual features. Descriptions of the spatial attributes of each region are generated, resulting in the identification of individual tree locations, forest extents, building footprints, and 3-dimensional building shapes, among others. Results of the fully-automated feature extraction algorithm are then compared to ground truth to assess completeness and accuracy of the methodology.
Bare earth extraction is an important component of light detection and ranging (LiDAR) data analysis in terms of terrain classification. The challenge in providing accurate digital surface models is augmented when there is diverse topography within the data set or complex combinations of vegetation and built structures. Few existing algorithms can handle substantial terrain diversity without significant editing or user interaction. This effort presents a newly developed methodology that provides a flexible, adaptable tool capable of integrating multiple LiDAR data attributes for an accurate terrain assessment. The terrain extraction and segmentation (TEXAS) approach uses a third-order spatial derivative for each point in the digital surface model to determine the curvature of the terrain rather than relying solely on the slope. The use of curvature has been shown to preserve ground points in areas of steep terrain, as they typically exhibit low curvature. Within the framework of TEXAS, the contiguous sets of points with low curvatures are grouped into regions using an edge-based segmentation method. The process does not require any user inputs and is completely data driven. This technique was tested on a variety of existing LiDAR surveys, each with varying levels of topographic complexity.
Bare earth extraction is an important component to LADAR data analysis in terms of terrain classification. The
challenge in providing accurate digital models is augmented when there is diverse topography within the data set or
complex combinations of vegetation and built structures. A successful approach provides a flexible methodology
(adaptable for topography and/or environment) that is capable of integrating multiple ladar point cloud data attributes. A
newly developed approach (TE-SiP) uses a 2nd and 3rd order spatial derivative for each point in the DEM to determine
sets of contiguous regions of similar elevation. Specifically, the derivative of the central point represents the curvature of
the terrain at that position. Contiguous sets of high (positive or negative) values define sharp edges such as building
edges or cliffs. This method is independent of the slope, such that very steep but continuous topography still has
relatively low curvature values and is preserved in the terrain classification. Next, a recursive segmentation method
identifies unique features of homogeneity on the surface separated by areas of high curvature. An iterative selection
process is used to eliminate regions containing buildings or vegetation from the terrain surface. This technique was tested
on a variety of existing LADAR surveys, each with varying levels of topographic complexity. The results shown here
include developed and forested regions in the Dominican Republic.
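The slope-independence argument can be checked on a one-dimensional elevation profile. The sketch below is not the TE-SiP or TEXAS operator: it substitutes a plain central-difference second derivative for the 2nd/3rd-order derivative described above, and the function names and threshold are hypothetical.

```python
def profile_curvature(heights, spacing=1.0):
    """Central-difference second derivative for the interior samples of a profile."""
    return [
        (heights[i - 1] - 2.0 * heights[i] + heights[i + 1]) / spacing**2
        for i in range(1, len(heights) - 1)
    ]


def low_curvature_mask(heights, threshold, spacing=1.0):
    """True where the absolute curvature is at or below the (assumed) threshold."""
    return [abs(c) <= threshold for c in profile_curvature(heights, spacing)]


steep_slope = [0.0, 5.0, 10.0, 15.0, 20.0]   # steep but continuous terrain
building_edge = [0.0, 0.0, 0.0, 8.0, 8.0]    # flat ground, then a wall

slope_mask = low_curvature_mask(steep_slope, threshold=0.5)
edge_mask = low_curvature_mask(building_edge, threshold=0.5)
```

The constant 5 m/sample slope has zero second derivative everywhere, so every interior sample passes, while the two samples bracketing the 8 m step are flagged as high curvature. A slope-only test would reject both profiles.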
KEYWORDS: Photons, LIDAR, Signal to noise ratio, Signal detection, Edge detection, Interference (communication), Electronic filtering, Signal processing, Statistical analysis, Data acquisition
Many recent small, low-power ladar systems provide detection sensitivities at the photon level for altimetry
applications. These "photon-counting" instruments are often the operational solution for high-altitude or space-based
platforms where low signal strength and size limitations must be accommodated. Despite the many existing
algorithms for lidar data product generation, there remains a void in techniques available for handling the increased noise
level in the photon-counting measurements as the larger analog systems do not exhibit such low SNR. Solar background
noise poses a significant challenge to accurately extract surface features from the data. Thus, filtering is required prior to
implementation of other post-processing efforts. This paper presents several methodologies for noise filtering photon-counting
data. Techniques include modified Canny Edge Detection, PDF-based signal extraction, and localized statistical
analysis. The Canny Edge detection identifies features in a rasterized data product using a Gaussian filter and gradient
calculation to extract signal photons. PDF-based analysis matches local probability density functions with the aggregate,
thereby extracting probable signal points. The localized statistical method assigns thresholding values based on a
weighted local mean of angular variances. These approaches have demonstrated the ability to remove noise and
subsequently provide accurate surface (ground/canopy) determination. The results presented here are based on analysis
of multiple data sets acquired with the high altitude NASA MABEL system and photon-counting data supplied by Sigma
Space Inc., configured to simulate the expected data product of the instrument on NASA's upcoming ICESat-2 mission.
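To make the filtering problem concrete, the sketch below uses a much simpler stand-in than any of the three methods named above: a histogram density filter that keeps photons falling in unusually crowded (along-track window, height bin) cells. The function name `density_filter`, the cell sizes, and the mean-plus-k-sigma threshold over occupied cells are assumptions for illustration only.

```python
from collections import Counter
from statistics import mean, pstdev


def density_filter(photons, window, z_bin, k=1.0):
    """Keep (along_track, height) photons that fall in unusually dense cells.

    A cell is one along-track window by one height bin. The threshold is the
    mean cell count plus k standard deviations, computed over occupied cells.
    """
    def cell(p):
        return (int(p[0] // window), int(p[1] // z_bin))

    counts = Counter(cell(p) for p in photons)
    values = list(counts.values())
    threshold = mean(values) + k * pstdev(values)
    return [p for p in photons if counts[cell(p)] > threshold]


# Made-up data: six surface photons near 100 m plus four scattered noise photons.
signal = [(0.0, 100.2), (1.0, 100.5), (2.0, 100.1),
          (3.0, 100.9), (4.0, 100.4), (5.0, 100.7)]
noise = [(1.0, 10.0), (3.0, 250.0), (6.0, 400.0), (8.0, 730.0)]
kept = density_filter(signal + noise, window=10.0, z_bin=5.0)
```

Solar background photons are spread roughly uniformly in height, so they rarely pile up in one cell, while surface returns cluster. That contrast is what the methods described above exploit in more sophisticated ways.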
KEYWORDS: LIDAR, Vegetation, Surface roughness, Data modeling, Data acquisition, Algorithm development, Digital filtering, Modulation, Signal processing, Data analysis
Terrain classification, or bare earth extraction, is an important component to LADAR data analysis. The terrain
classification approach presented in this effort utilizes an adaptive lower envelope follower (ALEF) with an adaptive
gradient operation for accommodations of local topography and roughness. In order to create a more robust
capability, the ALEF was modified to become a strictly data driven process that facilitates a quick production of the
data product without the subjective component associated with user inputs. This automated technique was tested on
existing LADAR surveys over Wyoming's Powder River Basin and the John Starr Memorial Forest in Mississippi,
both locations with dynamic topographic features. The results indicate a useful approach in terms of operational time
and accuracy of the final bare earth recovery with the advantage of being fully data driven.
In response to the 2010 Haiti earthquake, the ALIRT ladar system was tasked with collecting surveys to
support disaster relief efforts. Standard methodologies to classify the ladar data as ground, vegetation, or
man-made features failed to produce an accurate representation of the underlying terrain surface. The majority
of these methods rely primarily on gradient-based operations that often perform well for areas with low
topographic relief, but often fail in areas of high topographic relief or dense urban environments. An
alternative approach based on an adaptive lower envelope follower (ALEF) with an adaptive gradient operation
for accommodating local slope and roughness was investigated for recovering the ground surface from the
ladar data. This technique was successful for classifying terrain in the urban and rural areas of Haiti over
which the ALIRT data had been acquired.
Laser Radar, also referred to as lidar, has become widely available and is an established contributor to the military and
intelligence community by providing precise elevation data using 3-dimensional measurements. The utilization of
customized algorithms designed for lidar data exploitation provides the capability to determine corridors or gaps in areas
of vegetation cover. These capabilities lend themselves as geospatial tools for mobility applications and tactical
planning. This effort uses elevations derived from small-footprint (airborne) lidar surveys to create accurate surface
models and corresponding canopy characterization maps. The canopy height models are based on elevation voxels
above ground level and are used as input into a tree finding algorithm. Corridors under the canopy are then predicted
using the obstruction identification technique and neighboring point characteristics. Path determination can also be
performed using the obstruction maps and a modified A-star algorithm. A lidar survey over Camp Shelby, MS was
chosen as the test case for the obstruction detection utilities as it provides fairly dense vegetation cover and interesting
topographic features. The survey was completed using both a full-waveform lidar and a discrete return system, which
offers a coincident comparison of the obstruction methodology for differing data types. It is determined that the
full-waveform data provides a more complete and accurate assessment of the surface, the canopy and potential obstruction
detection than the discrete return system.
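Path determination over an obstruction map can be sketched with textbook A-star on a grid. This is the standard algorithm, not the modified variant used in the work above, and the grid encoding (0 for passable, 1 for obstructed), unit step cost, and 4-connected moves are assumptions.

```python
import heapq


def astar(grid, start, goal):
    """Shortest 4-connected path through cells where grid[r][c] == 0.

    Returns the path as a list of (row, col) cells, or None if unreachable.
    """
    def h(cell):  # Manhattan distance: admissible for unit-cost 4-connected moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]
    came_from = {start: None}
    cost = {start: 0}
    while frontier:
        _, g, current = heapq.heappop(frontier)
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        if g > cost[current]:
            continue  # stale queue entry
        r, c = current
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                new_cost = g + 1
                if new_cost < cost.get((nr, nc), float("inf")):
                    cost[(nr, nc)] = new_cost
                    came_from[(nr, nc)] = current
                    heapq.heappush(frontier, (new_cost + h((nr, nc)), new_cost, (nr, nc)))
    return None


# Made-up obstruction map: a row of trees with a single gap on the right.
obstruction_map = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = astar(obstruction_map, (0, 0), (2, 0))
blocked = astar([[0, 1, 0]], (0, 0), (0, 2))
```

In a mobility setting the unit step cost would likely be replaced by a traversal cost derived from the obstruction and canopy maps, which is one plausible place for a modification to enter.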
Innovative algorithm development for small-footprint full-waveform lidar data processing extends this technology's capabilities to more complicated acquisition scenarios than previously determined, namely success of surveys over obscured areas. Waveform decomposition and the extraction of waveform metrics provide a straightforward approach to identifying vertical structure within each laser measurement. However, there are some limitations in this approach as faint returns within the waveform go undetected within the classical processing chain. These faint returns are the result of reduced energy levels due to obscurant scattering, attenuation and absorption. Lidar surveys over non-homogeneous wooded regions indicate that there are meaningful ground returns within dense tree coverage if extracted correctly from the data. By using a waveform stacking technique with appropriate waveforms in near geospatial proximity to the original, these faint returns can be augmented and detected during data processing. In comparison to the traditional approach, the waveform stacking technique provides up to a 60% increase in perceived ground returns with the faint signal extraction for the particular datasets analyzed over a broadleaf forest in Mississippi. The enhanced capability in the presence of foliage provides a decrease in operational effort associated with data density, dwell or targeting techniques, in addition to required survey expense.
Full-waveform lidar data are emerging into the commercial sector and provide a unique ability to characterize the landscape. The returned laser waveforms indicate specific reflectors within the footprint (vertical structure), while the shape of the return convolves surface reflectance and physical topography. These data are especially effective in vegetative regions with respect to canopy structure characterization. The objective of this research is to evaluate the performance of waveform-derived parameters as input into a supervised classifier. Extracted waveform metrics include Gaussian amplitude, Gaussian standard deviation, canopy energy, ground energy, total waveform energy, ratio between canopy and ground energy, rise time to the first peak, fall time of the last peak, and height of median energy (HOME). The classifier utilizes a feature selection methodology which provides information on the value of waveform parameters for discriminating between class pairs. For this study area, energy ratio and Gaussian amplitude were selected most frequently, but rise time and fall time were also important for discriminating different tree types and densities. The lidar classification accuracy for this study area was 85.8% versus 71.2% for Quickbird imagery. Since the lidar-based input data are structural parameters derived from the waveforms, the classification is improved for classes that are spectrally similar but structurally different.
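Several of the energy-based metrics listed above can be computed directly from a sampled waveform. The sketch below is one plausible set of definitions, not the study's implementation: the fixed `split_height` separating ground from canopy energy, the cumulative-energy definition of HOME, and the function name are all assumptions.

```python
def waveform_metrics(heights, amplitudes, split_height):
    """Energy-based metrics for one waveform sampled at ascending heights.

    Samples at or below split_height count as ground energy, the rest as canopy.
    """
    total = sum(amplitudes)
    ground = sum(a for h, a in zip(heights, amplitudes) if h <= split_height)
    canopy = total - ground
    # Height of median energy: first height where cumulative energy reaches half.
    running, home = 0.0, None
    for h, a in zip(heights, amplitudes):
        running += a
        if running >= total / 2.0:
            home = h
            break
    return {
        "total_energy": total,
        "ground_energy": ground,
        "canopy_energy": canopy,
        "canopy_ground_ratio": canopy / ground if ground else None,
        "home": home,
    }


# Made-up waveform: a small ground return and a stronger canopy return.
heights = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
amplitudes = [2.0, 1.0, 0.0, 1.0, 5.0, 3.0]
metrics = waveform_metrics(heights, amplitudes, split_height=1.0)
```

Each metric becomes one feature in the per-waveform vector passed to the classifier. Gaussian amplitude and width would additionally require fitting Gaussians to the peaks, which is omitted here.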
Full-waveform laser altimetry has been used in the research community since the mid-1990s and this
technology holds great potential for the science and defense communities. Laser waveforms are a digital
recording of the entire temporal profile from the reflected laser energy. The shape of the returned laser
waveform is a function of both laser and surface properties. Waveform metrics were extracted for each
waveform and include peak amplitude, peak standard deviation, integrated canopy energy, integrated ground
energy, total waveform energy, ratio between canopy and ground energy, rise time to the first peak, fall time
of the last peak, and vegetation height. The utilization of such metrics provides a potential for discriminating
and identifying discrete targets on a per-shot basis. Analysis of the entire reflected laser energy profile
provides a detailed description of distributed targets/features along the laser line-of-sight. Waveform data
collected over Camp Shelby, Mississippi reveal separation of conifer from broadleaf vegetation. Metrics such
as integrated canopy energy and fall time were found to be higher in hardwood forest than pine forest. Other
landscape features such as the presence of a burn are also detected with full-waveform data, which would
otherwise be missed with discrete return elevation data. With new full-waveform systems entering the
commercial sector, new possibilities emerge to utilize the lidar data to classify land cover as well as quantify
surface parameters.
Innovative algorithm development for full-waveform lidar data processing extends this remote sensing
technology's capabilities to even more complicated acquisition scenarios than previously determined, namely
success of surveys over obscured areas. Waveform decomposition and the extraction of waveform metrics
provide a straightforward approach to identifying vertical structure within each laser measurement. However,
there are some limitations in this approach as faint returns within the waveform go undetected in the
processing chain. These faint returns are the result of reduced energy levels due to obscurant scattering,
attenuation and absorption. Lidar surveys over non-homogeneous wooded regions indicate that there are
meaningful ground returns within dense tree coverage if extracted correctly from the data. One difficulty
associated with detecting weaker returns is the presence of a hardware induced ring by the Avalanche Photo
Diode (APD) detector in the returned waveform. By using a waveform stacking technique with adjacent
waveforms in near geospatial proximity to the original, these faint returns can be augmented and detected
during data processing without the inclusion of the false ring. In comparison to the traditional approach, the
waveform stacking technique provides a 9% increase in faint signal extraction for the particular dataset.
These faint signals are low level last returns that correspond to perceived ground reflections under canopy
cover. The enhanced capability in the presence of foliage provides a decrease in operational effort associated
with data density, dwell or targeting techniques and survey expense.
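The benefit of stacking can be shown with a small numerical sketch. It assumes the neighboring waveforms are already aligned and equally long, that noise is uncorrelated between them (so summing N waveforms grows a common return by N but the noise only by about the square root of N), and that `noise_sigma` is a known per-sample noise level. Neighbor selection, alignment, and handling of the detector ring are not modeled, and the function names are hypothetical.

```python
from math import sqrt


def stack_waveforms(waveforms):
    """Sample-wise sum of equally long, already aligned waveforms."""
    return [sum(samples) for samples in zip(*waveforms)]


def detect(waveform, noise_sigma, n_stacked=1, k=3.0):
    """Indices whose amplitude exceeds k times the noise level after stacking."""
    threshold = k * noise_sigma * sqrt(n_stacked)
    return [i for i, a in enumerate(waveform) if a > threshold]


# Made-up neighbors: each carries a faint return of about 2 units at sample 3.
neighbors = [
    [0.5, -0.3, 0.2, 2.0, -0.4, 0.1],
    [-0.2, 0.4, -0.5, 2.2, 0.3, -0.1],
    [0.1, -0.1, 0.4, 1.9, -0.2, 0.3],
    [-0.3, 0.2, -0.1, 2.1, 0.5, -0.4],
]
single_hits = detect(neighbors[0], noise_sigma=1.0)
stacked = stack_waveforms(neighbors)
stacked_hits = detect(stacked, noise_sigma=1.0, n_stacked=len(neighbors))
```

The faint return stays below the 3-sigma threshold in the single waveform but clears the scaled threshold once four neighbors are summed, while the noise-only samples largely cancel.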