ANIMAL MODELING
Attempts to regionalize species models by mosaicking range distributions produced by neighboring state Gap Analysis projects have been problematic. Variations in habitat modeling result in significant differences in predicted species distributions within and across state lines. Additionally, there is a decided knowledge gap between the spatial and temporal scales used by biogeographers and wildlife managers. Using national geospatial data to map surrogates of habitat, the Nebraska Gap Analysis Project (NE-GAP) examined whether the use of statistical decision trees might help solve these problems.
We generated regional distributions of 20 selected breeding birds in the six-state GAP Great Plains region (IA, KS, MN, ND, NE, SD) using three recursive partitioning algorithms: QUEST (Quick, Unbiased, & Efficient Statistical Trees; Loh and Shih 1997; Shih 2002); CART (Classification And Regression Trees; Breiman et al. 1984; De’ath and Fabricius 2000) as an implementation within QUEST; and CRUISE (Classification Rule with Unbiased Interaction Selection & Estimation; Kim and Loh 2000, 2001). Breeding Bird Survey (BBS) route-level summaries (Sauer et al. 2003) over two time periods (last 10 and 30 years) were used for the occurrence data (presence/absence and abundance), while environmental variables were developed from National Land Cover Data (Vogelmann et al. 1998), Daymet daily climatic means and variances (Thornton and Running 1999), State Soil Geographic (STATSGO) soil texture, and National Elevation Data.
Models were developed on a hexagonal grid produced by the EPA’s Environmental Monitoring and Assessment Program (EMAP) with a cell resolution across the Great Plains of approximately 40 km². This coverage was intersected with each variable data set to create hexagonal coverages containing averaged values, area-weighted average values, or compositional vectors for each hexagon. These coverage variables were then intersected with the BBS occurrence data. Multiple statistical decision trees were generated for each target species to evaluate the relative strengths and weaknesses of the different algorithms. These statistical trees were then pruned to provide model generality and inverted across the study area to obtain predicted habitat distributions (Figure 1).
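The per-hexagon summaries described above can be sketched as follows. This is a minimal illustration, assuming the geometric intersection has already been done and each hexagon comes with a list of overlap areas; the function names and sample values are hypothetical and are not taken from the NE-GAP processing chain.

```python
def area_weighted_mean(pieces):
    """Area-weighted mean of a numeric variable over one hexagon.

    pieces: list of (overlap_area, value) pairs, one per polygon fragment
    of the variable layer that falls inside the hexagon.
    """
    total_area = sum(area for area, _ in pieces)
    if total_area <= 0:
        raise ValueError("hexagon has no overlapping area")
    return sum(area * value for area, value in pieces) / total_area


def composition(pieces):
    """Compositional vector: fraction of the hexagon's area in each class.

    pieces: list of (overlap_area, class_label) pairs.
    """
    total_area = sum(area for area, _ in pieces)
    if total_area <= 0:
        raise ValueError("hexagon has no overlapping area")
    shares = {}
    for area, label in pieces:
        shares[label] = shares.get(label, 0.0) + area / total_area
    return shares


# Hypothetical 40 km^2 hexagon split among three soil polygons
# (areas in km^2, values are an illustrative soil-texture fraction).
soil_pieces = [(10.0, 0.2), (25.0, 0.4), (5.0, 0.8)]
hex_mean = area_weighted_mean(soil_pieces)

# Hypothetical land cover fragments for the same hexagon.
cover_pieces = [(30.0, "grassland"), (8.0, "cropland"), (2.0, "grassland")]
hex_cover = composition(cover_pieces)
```

The area-weighted mean suits continuous variables such as soil texture, while the compositional vector suits categorical layers such as land cover, where a single average would be meaningless.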

Figures 1a and b. Yellow-billed Cuckoo 10 yr (1a) and 30 yr (1b) CART statistical decision trees and the associated predicted range distributions, shown in gray.
Algorithms were compared on the basis of speed of tree identification, interpretability of the cross-validated tree, and plausibility of the range distribution predicted from the tree. Model performance was evaluated by (1) calculating the proportion of species occurrences explained at the first model branch; (2) examining visually how well each model corresponded to published species distributions; (3) assessing correspondence of the model to the spatial distribution of the BBS data; and (4) measuring the computational time required to generate a tree.
CART’s exhaustive search of state space took much longer to generate a tree than CRUISE or QUEST (Table 1, Figure 2). All algorithms failed to generate model trees for species with large numbers of observations and/or relatively even distributions across the region (e.g., American Crow, n > 40,000); thus, only 12 of the original 20 species produced sufficient numbers of trees for comparative analysis. CRUISE used fewer observations (number of routes = 340) in its analysis than CART or QUEST (number of observations/route).
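To illustrate why an exhaustive search becomes expensive as observations and candidate variables grow, the sketch below scans every variable and every candidate threshold for the split with the lowest weighted Gini impurity. It is a simplified, CART-style illustration only, not the CART implementation within QUEST used in this study; the variable names and records are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Exhaustively search every variable and every threshold.

    rows: list of dicts mapping variable name -> numeric value.
    labels: list of 0/1 presence/absence values, same length as rows.
    Returns (variable, threshold, weighted_impurity) for the best split,
    or None if no variable has at least two distinct values.
    """
    n = len(rows)
    best = None
    for var in rows[0]:
        values = sorted({row[var] for row in rows})
        # Candidate thresholds: midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [y for row, y in zip(rows, labels) if row[var] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[var] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (var, threshold, score)
    return best


# Hypothetical hexagon records in which presence tracks spring vapor pressure.
rows = [
    {"sp_vapor_pressure": 400.0, "pct_evergreen": 5.0},
    {"sp_vapor_pressure": 450.0, "pct_evergreen": 1.0},
    {"sp_vapor_pressure": 700.0, "pct_evergreen": 4.0},
    {"sp_vapor_pressure": 760.0, "pct_evergreen": 2.0},
]
labels = [0, 0, 1, 1]
split = best_split(rows, labels)
```

Each candidate threshold requires a pass over all observations, so the work for one node grows roughly with the number of variables times the number of distinct values times the number of observations, and a full tree repeats this at every node.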
Table 1. Processing time (CPU-minutes).

| Common Name | #Obs 10yr | #Obs 30yr | QUEST 10yr | QUEST 30yr | CART 10yr | CART 30yr | CRUISE (#Routes = 340) 10yr | CRUISE (#Routes = 340) 30yr |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baltimore Oriole | 10,343 | 30,075 | 12.2 | 18.3 | 933.9 | 2643.5 | 0.13 | 0.12 |
| Black Tern | 5,713 | 9,797 | 9.3 | 20.5 | 428.3 | 627.6 | 0.12 | 0.14 |
| Brown Thrasher | 53,139 | 125,960 | 58.8 | 76.7 | 5343.1 | no tree | 0.16 | 0.12 |
| Gray Catbird | 5,085 | 11,755 | 4.4 | 9.5 | 1058.0 | 888.7 | 0.15 | 0.13 |
| Great-crested Flycatcher | 4,597 | 11,078 | 10.8 | 26.8 | 83.5 | 934.9 | 0.13 | 0.14 |
| Lark Sparrow | 5,025 | 10,069 | 7.6 | 18.3 | 737.0 | 782.6 | 0.13 | 0.11 |
| Northern Cardinal | 9,689 | 27,114 | 17.4 | 62.7 | 339.0 | 778.0 | 0.10 | 0.09 |
| Northern Harrier | 1,593 | 3,369 | 1.1 | 2.1 | 64.3 | 232.4 | 0.13 | 0.13 |
| Red-bellied Woodpecker | 2,498 | 5,162 | 3.4 | 8.0 | 174.2 | 254.2 | 0.12 | 0.11 |
| Tree Swallow | 6,628 | 6,628 | 11.5 | 23.5 | 598.6 | 954.4 | 0.14 | 0.13 |
| Upland Sandpiper | 12,537 | 12,537 | 15.5 | 29.3 | 189.5 | 4492.7 | 0.14 | 0.14 |
| Yellow-billed Cuckoo | 3,268 | 3,268 | 5.3 | 12.1 | 199.6 | 859.7 | 0.13 | 0.11 |
| Average | | | 8.6 | 21.4 | 382.6 | 1130.8 | 0.1 | 0.1 |

Figure 2. Variation explained at first model branch as a function of computational cost for generation of entire tree.
Trees built from 30 yr data explained a higher percentage of observations at the first model branch than those built from 10 yr data: 30 yr data models averaging 97%, 98%, and 67% versus 10 yr data models averaging 94%, 95%, and 60% for QUEST, CART, and CRUISE, respectively. CRUISE model explanation (avg. 64%) was significantly lower at the first model branch than either QUEST (avg. 96%) or CART (avg. 97%), although the computational costs of the CRUISE models were significantly less (Figure 2). Fewer observations and different configuration of data (routes) between the 10 yr and 30 yr data sets led to differences in inverted geographic ranges (Figure 1). Determining whether this result is due to population trends or less data will require further analysis.
Of the 72 models reported in Table 2, three attempts resulted in no tree. At the first branch of the statistical tree, eight models used land cover, eight used soils or terrain, 15 used water vapor pressure or precipitation, 16 used insolation, and 22 used temperature or frost-free days. Thus, 77% (n=53) of the 69 models that produced a tree relied on climatological variables, and of these, 28% (n=15) used climatic variability to model species distribution at the first step of statistical partitioning. Sixty-two percent of the climatological models emphasized the transitional seasons of spring and fall (n=33) over summer (n=12) or winter (n=8). CART and CRUISE selected insolation variables more frequently than QUEST (Table 2), which in turn selected variables more readily interpretable in terms of ecophysiological constraints on bird populations, such as the interannual variability in frost-free days.
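The percentages in this paragraph can be checked from the reported tallies. The short sketch below does so; note that taking the 69 models that produced a tree as the denominator for the 77% figure is an inference from the arithmetic (53/72 would be about 74%), not something the text states outright. The variable names are hypothetical.

```python
# First-branch tallies as reported in the text for the 72 models.
first_branch = {
    "no tree": 3,
    "land cover": 8,
    "soils or terrain": 8,
    "vapor pressure or precipitation": 15,
    "insolation": 16,
    "temperature or frost-free days": 22,
}
climate_groups = (
    "vapor pressure or precipitation",
    "insolation",
    "temperature or frost-free days",
)

total_models = sum(first_branch.values())
models_with_tree = total_models - first_branch["no tree"]
climate_models = sum(first_branch[g] for g in climate_groups)

# Counts reported in the text for the climatological models.
variability_models = 15                                 # models splitting on a CV variable
season_counts = {"spring and fall": 33, "summer": 12, "winter": 8}


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100.0 * part / whole)


share_climate = pct(climate_models, models_with_tree)
share_variability = pct(variability_models, climate_models)
share_transitional = pct(season_counts["spring and fall"], climate_models)
```

The seasonal counts sum to the same 53 climatological models, which confirms that the 28% and 62% figures share that denominator.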
Table 2. Variable used at the first branch of each statistical tree.

| Common Name | QUEST 10yr | QUEST 30yr | CART 10yr | CART 30yr | CRUISE 10yr | CRUISE 30yr |
| --- | --- | --- | --- | --- | --- | --- |
| Baltimore Oriole | % Evergreen Forest | % Evergreen Forest | Mean Su Insolation | % Evergreen Forest | Terrain | Terrain |
| Black Tern | Mean Wi Vapor Pressure | CV¹ Wi FFD² | Mean Fa Insolation | CV Sp min Air Temp | % Emergent Herbaceous Wetlands | CV Sp avg Air Temp |
| Brown Thrasher | N/T³ | % Evergreen Forest | % Evergreen Forest | N/T³ | % Evergreen Forest | Land Covers |
| Gray Catbird | Terrain | Terrain | Terrain | Terrain | Mean Su Insolation | Mean Su Insolation |
| Great-crested Flycatcher | Mean Su Insolation | Terrain | Mean Su Insolation | Mean Su Insolation | Mean Fa max Air Temp | Mean Su Insolation |
| Lark Sparrow | Mean Fa max Air Temp | Soils | CV Wi avg Air Temp | Mean Su Insolation | CV Wi min Air Temp | CV Wi min Air Temp |
| Northern Cardinal | Mean Sp FFD | CV Su FFD | Mean Sp Vapor Pressure | Mean Sp Vapor Pressure | Mean Sp Vapor Pressure | Mean Sp Vapor Pressure |
| Northern Harrier | Mean Sp Vapor Pressure | Mean Sp Precipitation freq | Mean Sp Insolation | Mean Sp Precipitation freq | CV Fa max Air Temp | CV Sp Insolation |
| Red-bellied Woodpecker | Mean Wi Air Temp | CV Fa FFD | Mean Sp Vapor Pressure | Mean Sp Vapor Pressure | Mean Sp Vapor Pressure | Mean Sp Vapor Pressure |
| Tree Swallow | Mean Wi FFD | Mean Wi FFD | Mean Sp Insolation | Mean Sp Insolation | Mean Wi max Air Temp | N/T³ |
| Upland Sandpiper | Mean Su FFD | Mean Su FFD | Mean Su Insolation | Mean Sp min Air Temp | CV Fa Insolation | CV Fa Insolation |
| Yellow-billed Cuckoo | CV Fa FFD | CV Fa FFD | CV Fa min Air Temp | Mean Fa Vapor Pressure | Mean Fa Vapor Pressure | Mean Fa Vapor Pressure |

¹ Coefficient of variation
² Frost-free days
³ No tree was generated by the model
1. Unbiased variable selection in QUEST and CRUISE appeared to facilitate the rapid identification of parsimonious, robust models and plausible range distributions.
2. QUEST trees were generally preferable to CRUISE trees because the latter algorithm relied only upon presence/absence at the route level, while the former considered data on route-level abundance.
3. Developing habitat models using statistical trees generated from species occurrence data and environmental variables can lend a greater degree of objectivity to the modeling process, but there is still considerable subjectivity in the pruning stage that is needed for model generality (Henebry et al. 2001, Holland et al. 2002).
This work was supported in part through the GAP Research Project Evaluating the Use of Statistical Decision Trees for Modeling Avian Habitat and Regional Range Distribution from Occurrence Data and Environmental Variables.
Breiman, L., J.H. Friedman, R.A. Olshen, and C.J. Stone. 1984. Classification and regression trees. Wadsworth and Brooks/Cole, Monterey, California. 358 pp.
De’ath, G., and K.E. Fabricius. 2000. Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology 81:3178-3192.
Henebry, G.M., B.C. Putz, and J.W. Merchant. 2001. Modeling reptile and amphibian range distributions from species occurrences and landscape variables. Gap Analysis Bulletin 10:22-24.
Holland, A.K., G.M. Henebry, B.C. Putz, M.R. Vaitkus, and J.W. Merchant. 2002. Modeling avian habitat from species occurrence data and environmental variables: Assessing the effects of land cover and landscape pattern. Gap Analysis Bulletin 11:25-27.
Kim, H., and W.-Y. Loh. 2000. CRUISE User Manual. University of Wisconsin-Madison, Department of Statistics, Technical Report 989, November 10, 2000.
Kim, H., and W.-Y. Loh. 2001. Classification trees with unbiased multiway splits. Journal of the American Statistical Association 96:589-604.
Loh, W.-Y., and Y.-S. Shih. 1997. Split selection methods for classification trees. Statistica Sinica 7:815-840.
Sauer, J.R., J.E. Hines, and J. Fallon. 2003. The North American Breeding Bird Survey, Results and analysis 1966-2002. Version 2003.1. USGS Patuxent Wildlife Research Center, Laurel, Maryland.
Shih, Y.-S. 2002. QUEST User Manual. Department of Mathematics, National Chung Cheng University, Taiwan, April 17, 2002.
Thornton, P.E., and S.W. Running. 1999. An improved algorithm for estimating incident daily solar radiation from measurements of temperature, humidity, and precipitation. Agricultural and Forest Meteorology 93:211-228.
Vogelmann, J., T. Sohl, and S. Howard. 1998. Regional characterization of land cover using multiple sources of data. Photogrammetric Engineering and Remote Sensing 64:45-57.