TreeScanPL10K Dataset Unlocks AI-Powered Forest Analysis with 10,000+ Laser-Scanned Trees

Dense forest canopies block the view from above
Why terrestrial laser scanning from the ground solves a fundamental problem that satellite imagery cannot overcome.

In an era when forests face mounting pressure from climate disruption and biodiversity loss, a team of Polish researchers has answered a quiet but consequential need: the need to truly see what grows in the woods. By assembling over 10,000 laser-scanned, species-labeled trees into a dataset called TreeScanPL10K, they have laid a foundation for artificial intelligence to learn the ancient art of reading a forest — not from above, where canopies obscure the truth, but from the ground up, where light and geometry reveal it. This is the work of building memory for machines so that the knowledge foresters carry in their hands and eyes might one day scale to every threatened woodland on Earth.

  • Forests are disappearing faster than human observers can measure them, and the AI tools capable of automating that measurement have been starved of the labeled training data they require.
  • Aerial and satellite sensors hit a physical wall at the canopy — dense treetops swallow the view from above, leaving ground-level forest structure largely invisible to remote monitoring.
  • Terrestrial laser scans taken from four positions per plot were stitched into millimeter-accurate 3D point clouds, capturing not just tree shape but also the reflectance signatures that help distinguish a beech from a spruce by the way the laser bounces off bark and needles.
  • Annotating 10,417 trees — matching scanner geometry to field inventory records and flagging crown completeness — proved the most painstaking phase, with broadleaf crowns demanding more manual correction than the cleaner geometry of conifers.
  • With 72% of trees carrying confirmed species labels and dense optical data preloaded for AI training, TreeScanPL10K now positions researchers to build models capable of automating species classification, biomass estimation, and structural forest monitoring at scale.

A research team in Poland has built something the field of AI-driven forestry has long needed but rarely had: a large, carefully labeled library of trees rendered in three dimensions. Their dataset, TreeScanPL10K, contains over 10,000 laser-scanned trees from Central European forests, each captured with millimeter-level precision and tagged by species. The goal is to give machine learning systems the training material they need to eventually automate the work of identifying, measuring, and monitoring forests at a scale no human crew could match.

The problem motivating the project is both physical and ecological. Dense forest canopies defeat aerial sensors — satellites and drones simply cannot see through the roof of a mature forest. Terrestrial laser scanning sidesteps this by working from the ground, firing near-infrared pulses into the stand and recording what returns. The result is a point cloud of millions of coordinates that captures not only the geometry of trunks and crowns but also the intensity of the laser's reflection — a signal that varies by species and can separate a spruce from a beech even when their silhouettes look alike.

Scans were conducted across 500-square-meter circular plots using two models of professional-grade phase-shift scanner, with each plot captured from four positions to reduce blind spots. Reference spheres placed throughout each plot served as anchors for stitching the scans into a single coherent model. Annotation followed: algorithms parsed spatial coordinates and reflectance values to isolate individual trees, while species labels were assigned by cross-referencing the scanner data with traditional field inventory records, using trunk proximity and diameter as the bridge between digital and physical observation.
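The cross-referencing step can be illustrated with a minimal Python sketch. The article does not describe the team's actual matching procedure, so the greedy closest-pair strategy, the `Stem` record, the `match_to_inventory` function, and both tolerance values below are hypothetical choices made purely for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stem:
    tree_id: str
    x: float                       # metres, plot-local coordinates
    y: float
    dbh_cm: float                  # trunk diameter at breast height
    species: Optional[str] = None  # known only for field inventory records


def match_to_inventory(scanned, inventory, max_dist_m=1.0, max_dbh_diff_cm=5.0):
    """Greedily pair scanned stems with field records (assumed strategy).

    A pair is a candidate only if the stems are close together AND their
    diameters are similar. Candidates are accepted closest-first, and each
    stem or record is used at most once.
    """
    candidates = []
    for s in scanned:
        for r in inventory:
            dist = math.hypot(s.x - r.x, s.y - r.y)
            if dist <= max_dist_m and abs(s.dbh_cm - r.dbh_cm) <= max_dbh_diff_cm:
                candidates.append((dist, s.tree_id, r.tree_id, r.species))
    candidates.sort(key=lambda c: c[0])

    labels, used_scanned, used_records = {}, set(), set()
    for _dist, sid, rid, species in candidates:
        if sid not in used_scanned and rid not in used_records:
            labels[sid] = species
            used_scanned.add(sid)
            used_records.add(rid)
    return labels


scanned = [Stem("s1", 0.0, 0.0, 30.0), Stem("s2", 5.0, 5.0, 20.0)]
inventory = [
    Stem("f1", 0.2, 0.1, 31.0, "Fagus sylvatica"),
    Stem("f2", 5.1, 4.9, 35.0, "Picea abies"),  # close, but diameter disagrees
]
labels = match_to_inventory(scanned, inventory)
```

In this sketch, a scanned stem with no inventory record inside both tolerances is simply left without a species label.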

The finished dataset holds 10,417 segmented trees, with species confirmed for roughly 72 percent of them — conifers like Scots pine and Norway spruce alongside broadleaves such as European beech. Conifers proved easier to annotate cleanly, their simpler crown geometry producing tidier laser returns, while broadleaf crowns required more manual correction where they overlapped. Quality flags mark whether each tree's crown fell fully within the scan boundary, giving future researchers a clear sense of record completeness.
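A crown-completeness flag of this kind could be computed along the following lines. This is a sketch under stated assumptions: it treats the scan boundary as the edge of the 500-square-meter circular plot centered on the origin, which the article does not confirm, and the function name `completely_inside` is illustrative rather than the dataset's own code.

```python
import math

# Radius of a 500 m^2 circular plot (area = pi * r^2).
PLOT_RADIUS_M = math.sqrt(500.0 / math.pi)


def completely_inside(crown_points, plot_radius_m=PLOT_RADIUS_M, center=(0.0, 0.0)):
    """Return True if every crown point lies inside the circular plot.

    crown_points: iterable of (x, y, ...) tuples in metres; any extra
    fields such as z or intensity are ignored.
    """
    cx, cy = center
    return all(
        math.hypot(x - cx, y - cy) <= plot_radius_m
        for x, y, *_ in crown_points
    )


inner_crown = [(0.0, 0.0, 18.0), (3.0, 4.0, 20.0)]
edge_crown = [(10.0, 0.0, 15.0), (12.0, 5.0, 17.0)]  # second point is 13 m out
```

Here `completely_inside(inner_crown)` is true, while `edge_crown` fails because one of its points lies beyond the plot radius, so its record would be flagged as incomplete.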

The ambitions for TreeScanPL10K reach beyond species identification. The same dense optical data supports biomass estimation and long-term structural monitoring — the kinds of measurements that sustainable forest management increasingly demands as climate pressure mounts. The dataset's value will ultimately be tested by how well the models trained on it generalize to forests beyond Central Europe, but it marks a meaningful step toward giving AI the grounded, detailed view of the forest that satellites have never been able to provide.

A team of researchers has assembled a dataset of over 10,000 laser-scanned trees, each one captured in precise three-dimensional detail and labeled by species. The collection, called TreeScanPL10K, represents a significant resource for the emerging field of AI-driven forestry—a domain that has long struggled with a fundamental problem: there simply hasn't been enough high-quality labeled data to train the algorithms that could automate the work of identifying and analyzing individual trees across large forest areas.

The challenge driving this work is straightforward but urgent. Forests are under pressure from climate change and biodiversity loss, and modern forestry demands better tools for measuring what's actually there—how many trees, what species, how much carbon they hold, whether the ecosystem is healthy. Satellite imagery and airborne sensors have their place, but they struggle with a basic physics problem: dense forest canopies block the view from above. Terrestrial laser scanning solves this by working from the ground up, sending near-infrared laser pulses into the forest and measuring what bounces back. The result is a dense point cloud—millions of individual coordinate measurements—that captures not just the shape of a tree but also how strongly the laser reflects off different surfaces, information that varies by species and can help distinguish a beech from a spruce even when their silhouettes look similar.

The researchers conducted their scans across forest plots in Poland using two models of phase-shift laser scanner: the FARO Focus 3D X130 and the Trimble TX5. Each circular plot covered 500 square meters and was scanned from four positions (three arranged in a triangle around one central scan) to minimize blind spots and capture the full architecture of crowns and stems. The scanners operated at quarter resolution, achieving roughly four-millimeter accuracy at a distance of ten meters. To stitch the four scans into a single coherent point cloud, the team placed reference spheres in each plot and used their known geometry as anchors for alignment. The registered point clouds retain millimeter-level accuracy.
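The plot geometry is easy to work out from these figures. The short Python sketch below derives the plot radius from the stated 500-square-meter area and lays out one central and three outer scan positions. The equilateral spacing and the 8-meter offset are assumptions chosen for illustration, since the article gives neither, and `scan_positions` is a hypothetical helper.

```python
import math

PLOT_AREA_M2 = 500.0
# area = pi * r^2, so r = sqrt(area / pi), which is about 12.62 m
plot_radius_m = math.sqrt(PLOT_AREA_M2 / math.pi)


def scan_positions(offset_m):
    """One central position plus three on an equilateral triangle.

    offset_m is the distance from the plot center to each outer position
    (a hypothetical parameter, not a value from the study).
    """
    positions = [(0.0, 0.0)]
    for k in range(3):
        angle = math.radians(90 + 120 * k)  # outer positions 120 degrees apart
        positions.append((offset_m * math.cos(angle), offset_m * math.sin(angle)))
    return positions


positions = scan_positions(8.0)
print(f"plot radius: {plot_radius_m:.2f} m")
for x, y in positions:
    print(f"scan position: ({x:6.2f}, {y:6.2f})")
```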

The real work came in annotation. Algorithms identified individual trees by analyzing both the spatial coordinates and the intensity values—the strength of the laser return—to separate stems from crowns and distinguish one tree from another even in crowded stands. Species labels were then assigned by matching the laser-detected trees to field inventory data, using proximity and diameter measurements as the bridge between what the scanner saw and what foresters had recorded on the ground. The team introduced a useful quality flag called "completelyInside" to mark whether a tree's entire crown fell within the scan boundaries, a measure of how complete each record was. They also precomputed morphological metrics like height and crown area directly from the point cloud data.
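To make the precomputed metrics concrete, here is a minimal Python sketch of how height and crown area might be derived from one tree's points. The article does not say how the team computed them, so this is an assumption: height is taken as the vertical extent of the points, and crown area as the area of the convex hull of their horizontal projection. The helper names are hypothetical.

```python
def convex_hull(points_2d):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points_2d))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula; returns 0.0 for fewer than three vertices."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def tree_metrics(points):
    """points: non-empty list of (x, y, z) tuples in metres for one tree."""
    zs = [p[2] for p in points]
    hull = convex_hull([(p[0], p[1]) for p in points])
    return {
        "height_m": max(zs) - min(zs),
        "crown_area_m2": polygon_area(hull),
    }


tree = [(0, 0, 0), (2, 0, 10), (2, 2, 12), (0, 2, 11), (1, 1, 15)]
metrics = tree_metrics(tree)
```

A convex hull overestimates the footprint of an irregular crown, so a production pipeline might prefer a concave outline or a rasterized projection. The hull is used here only because it is simple to verify.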

The final dataset contains 10,417 segmented trees, with species identified for approximately 72 percent of them. The trees span the diversity of Central European forests: conifers like Scots pine and Norway spruce alongside broadleaves such as European beech. The intensity data proved particularly valuable for distinguishing species with similar shapes but different bark and needle characteristics—the laser reflectance signature acts as a fingerprint. Quality control revealed the expected challenges: ground contamination near tree bases and segmentation errors where crowns overlapped. But the fine spatial and intensity resolution allowed manual correction. Interestingly, conifers required fewer corrections than broadleaves, likely because their simpler crown geometry produces cleaner, more distinct laser return patterns.

The implications extend well beyond species classification. The dataset's dense, multi-angle optical information supports biomass estimation, structural monitoring, and the development of deep learning models that could eventually automate much of the work foresters now do by hand or eye. As climate change accelerates and the pressure to manage forests sustainably intensifies, tools that can rapidly and accurately characterize what's growing in a forest become increasingly valuable. TreeScanPL10K represents a step toward making that automation possible, providing the labeled training data that AI systems need to learn the task. The next phase will be seeing how effectively researchers can leverage this optical richness—the geometry and the reflectance together—to build models that work reliably across different forests and regions.

Terrestrial laser scanning provides bottom-up views free of occlusion limitations, delivering detailed scans of individual trees including stem shapes and crown architecture.
— Research team, TreeScanPL10K study
The Hearth Conversation: Another perspective on the story
Inventor

Why does a forest dataset matter so much right now? Aren't there already ways to count trees and identify species?

Model

There are, but they're slow and expensive. A forester walks the woods with a clipboard. Satellites see the canopy but can't see individual trees underneath. Laser scanning from the ground solves that, but only if you have enough labeled examples to teach an AI system what a beech looks like versus a spruce in the data.

Inventor

So this is about training data—the raw material for machine learning.

Model

Exactly. You need thousands of examples, each one correctly labeled, so the algorithm learns the pattern. Before this dataset, the bottleneck was practical rather than theoretical: no one had assembled 10,000 carefully scanned and annotated trees in one place.

Inventor

What makes laser scanning better than just looking at a photo?

Model

A photo is flat. A laser scan is three-dimensional—it captures the actual shape of the crown, the thickness of the stem, how the branches are arranged. It also measures how strongly the laser bounces back, which varies by species. A beech's bark reflects differently than a spruce's needles. That intensity signature is invisible in a photo but crucial for identification.

Inventor

How accurate is this data?

Model

Millimeter-level. They scanned from four positions to reduce blind spots, used reference spheres to align the scans precisely, and then manually corrected segmentation errors. It's the kind of precision you need if you're going to train an AI system to work reliably in real forests.

Inventor

What happens next? Is someone already building a classifier with this?

Model

That's the point. The dataset is published now, so researchers everywhere can download it and develop algorithms. The real test will be whether those algorithms work on forests outside Poland, on species they weren't trained on. That's where the value gets proven.

Want the full story? Read the original at AZoOptics.