This wiki page aims to provide an overview of the major large-scale datasets used in the literature to study forest ecosystems. The first table, Biomass Estimation Datasets, describes datasets used to measure or estimate forest biomass. These datasets either provide biomass directly as a core data product or have been used extensively in the literature to derive biomass estimates.
These datasets use technologies that fall into one of the following categories:
- Field plots: Field plot data is collected using the mostly standardized forest inventory protocol, a manual technique for measuring ground-based tree attributes (most significantly, trunk diameter and height). This process does not generally scale well, but it has been in use for so long and requires such simple instruments that, in aggregate, it can provide an extensive dataset. Such datasets are also typically used to calibrate biomass estimates from more scalable technologies.
- Passive optical: Passive optical sensors can be mounted on satellites to collect optical images of Earth. (This may be the type of satellite remote sensing data most familiar to readers.) The sensors are passive, meaning that they do not emit electromagnetic waves of their own, but rather measure reflected natural light. To estimate biomass from passive optical images, researchers typically use some form of the Normalized Difference Vegetation Index (NDVI), which is the difference between the near-infrared and red reflectance divided by their sum. In essence, this index exploits the reflectance properties of photosynthesizing cells, which absorb red light and strongly reflect near-infrared light, to estimate the extent of vegetation.
- Synthetic Aperture Radar (SAR): SAR instruments are active sensors, meaning that they emit electromagnetic radiation and measure its return signal after it bounces off the Earth's surface. The wavelength of the emitted radiation determines how far it penetrates surface objects, so SAR technology is usually categorized by the band of wavelengths it emits. The bands relevant for Earth observation are, in order of increasing wavelength, X, C, S, L, and P, with the X band offering the least penetration of the forest canopy and the P band offering the most. (NASA offers a helpful introduction to SAR for Earth observation.)
- LiDAR: LiDAR, like SAR, is an active sensing technology that emits electromagnetic waves and measures their return signal. However, unlike SAR, LiDAR uses a laser to emit waves, resulting in very different backscatter properties (the way that the light bounces off objects in the environment). Unlike SAR and passive optical sensors, LiDAR does not saturate at high biomass densities (Huete, 1997; Luckman, 1998; Kellner, 2021). LiDAR instruments may be space-based (such as GEDI), mounted on small planes or drones (Airborne Laser Scanning, or ALS), or carried on the ground through a forest (Terrestrial Laser Scanning, or TLS).
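As a rough illustration of the NDVI calculation mentioned under passive optical sensors, the following minimal sketch computes the standard index, (NIR - Red) / (NIR + Red), for a few pixels. The function name `ndvi` and the reflectance values are made up for illustration and do not come from any dataset listed on this page. Real workflows would typically operate on whole raster bands rather than individual pixels.

```python
from typing import Optional


def ndvi(red: float, nir: float) -> Optional[float]:
    """Return NDVI = (NIR - Red) / (NIR + Red), or None if both bands are zero."""
    total = nir + red
    if total == 0:
        # No reflected signal in either band: the index is undefined.
        return None
    return (nir - red) / total


# Hypothetical (red, near-infrared) surface reflectance pairs, not real data.
pixels = [
    (0.05, 0.45),  # low red, high NIR: typical of dense green vegetation
    (0.30, 0.35),  # similar red and NIR: typical of bare soil
    (0.0, 0.0),    # no signal
]

ndvi_values = [ndvi(red, nir) for red, nir in pixels]
for (red, nir), value in zip(pixels, ndvi_values):
    print(f"red={red:.2f} nir={nir:.2f} ndvi={value}")
```

Values near 1 indicate dense photosynthesizing vegetation, while values near 0 indicate little or none. The first hypothetical pixel yields an NDVI of about 0.8 and the second about 0.08.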
Leading research groups:
Deep learning approaches:
GitHub - robmarkcole/satellite-image-deep-learning: Deep learning with satellite & aerial imagery
Guide to table properties:
Name: The name of the dataset. Abbreviations are expanded in parentheses.
Biomass: Whether or not the dataset offers a biomass product directly. If not, the dataset can still be used to estimate biomass; see "Related papers."
Years: Years over which data is or will be collected. Note that not all years may yet be published.
Geog. Coverage: Geographic coverage of the dataset.
Spatial resolution: Resolution of the native measurements of the dataset on the spatial dimensions for which it is offered. If the dataset also offers a biomass product, spatial resolution includes the resolution of the biomass product as well.
Measurement strategy: Details of the sampling strategy used and the scope of measurements expected for each sample.
Technology: Brief notes on the type of technology used to collect the native measurements. Wavelengths are given when applicable.
Error bounds: If offered, published bounds on the dataset's geolocation, measurement, and/or estimation error.
Notes: Additional free-form notes.
Related papers: A brief selection of papers that use or discuss this dataset extensively. All papers are cited below the table. Related papers highlighted in yellow are papers from the creators that accompany the dataset or papers that offer a model for estimating biomass from the dataset.
URL: Link to access the dataset.
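For readers who want to work with the table programmatically, the properties above could be captured in a simple record type. The sketch below is only one possible layout: the class name `DatasetEntry`, its field names, the helper `needs_biomass_model`, and the example values are all hypothetical placeholders, not an actual row of the table.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetEntry:
    """One row of the table, with one field per property in the guide."""
    name: str
    biomass: bool                 # True if a biomass product is offered directly
    years: str
    geographic_coverage: str
    spatial_resolution: str
    measurement_strategy: str
    technology: str
    url: str
    error_bounds: Optional[str] = None   # only present if published
    notes: str = ""
    related_papers: List[str] = field(default_factory=list)


def needs_biomass_model(entry: DatasetEntry) -> bool:
    """Return True if biomass must be derived from the dataset (see related papers)."""
    return not entry.biomass


# Placeholder values for illustration only; not a real dataset.
example = DatasetEntry(
    name="Example Dataset (ED)",
    biomass=False,
    years="2000-2010",
    geographic_coverage="Placeholder region",
    spatial_resolution="Placeholder resolution",
    measurement_strategy="Placeholder sampling strategy",
    technology="Placeholder sensor",
    url="https://example.org/dataset",
)
print(needs_biomass_model(example))
```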
Biomass Estimation Datasets
Some of the above information, as well as an early layout for the table, was inspired by a table in Duncanson et al., 2020.
Guide to Soil Layer Maps