To produce its wildlife-habitat relationship models, the Nebraska Gap Analysis Program (NE-GAP) used recursive partitioning applied to species occurrence data (in the form of museum voucher specimens or curated surveys) and a geodatabase of landscape variables on a hexagonal grid with a nominal spatial resolution of 40 km² (Henebry et al. 2001, Holland et al. 2002). Here we describe our approach to accuracy assessment for modeled range distributions.
To generate the habitat models we used QUEST (Quick, Unbiased, & Efficient Statistical Trees; Loh and Shih 1997), a recursive partitioning algorithm similar to CART (Classification & Regression Trees; Breiman et al. 1984). QUEST has several advantages for habitat modeling: it is much faster than CART, its variable selection is unbiased, it handles categorical predictor variables with many categories, and it uses automated cross-validation (De'ath and Fabricius 2000, Shih 2002). The motivation for using this strategy is twofold. First, the resulting trees of decision points and values that form the models are understandable, debatable, and tunable. Second, the nonparametric modeling can handle the multimodality likely to be found in species occurrence data.
The suite of environmental variables (land cover, climate, soils, terrain) included in the modeling process is described in Henebry et al. (2001). Modeling was performed across a hexagonal grid produced by the EPA EMAP program with a cell resolution of about 40 km² within Nebraska. Each variable was rescaled from its raster resolution (900 m² for land cover, soils, and terrain data and 2.25 km² for climate variables) to the coarser hexagonal coverage. All environmental variables contained within the hexagons that intersected BBS routes or CBC circles were associated with the species occurrence data at those sampling locations. Continuous variables were rescaled by area-weighted averaging. Categorical variables were represented as a compositional vector.
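The two rescaling rules can be illustrated with a minimal Python sketch. It assumes that each hexagon comes with a list of overlapping raster cells and their overlap areas; the function names, class labels, and numbers are hypothetical and are not taken from the NE-GAP geodatabase.

```python
from collections import defaultdict


def area_weighted_mean(cells):
    """Rescale a continuous variable to one hexagon.

    cells: list of (value, overlap_area) pairs, one per raster cell
    that overlaps the hexagon (hypothetical input layout).
    """
    total_area = sum(area for _, area in cells)
    if total_area == 0:
        raise ValueError("hexagon has no overlapping raster area")
    return sum(value * area for value, area in cells) / total_area


def composition_vector(cells, classes):
    """Rescale a categorical variable to one hexagon.

    cells: list of (class_label, overlap_area) pairs.
    classes: ordered list of all class labels.
    Returns the fraction of hexagon area in each class, in that order.
    """
    area_by_class = defaultdict(float)
    for label, area in cells:
        area_by_class[label] += area
    total_area = sum(area_by_class.values())
    if total_area == 0:
        raise ValueError("hexagon has no overlapping raster area")
    return [area_by_class[label] / total_area for label in classes]


# Illustrative values only.
precip = area_weighted_mean([(10.0, 1.0), (20.0, 3.0)])
cover = composition_vector(
    [("grassland", 30.0), ("cropland", 10.0)],
    ["cropland", "grassland", "wetland"],
)
```

Representing a categorical variable as area fractions keeps every class available to the tree as its own predictor, which a single "majority class" label would discard.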
Species occurrence data were gathered from route-level composites of the USGS Breeding Bird Survey (BBS; www.pwrc.usgs.gov/bbs) and circle composites of the National Audubon Society’s Christmas Bird Count (CBC; www.audubon.org/bird/cbc/) for the period 1970-2000. Given the intensive repeated observations, if a species was not reported along a sampling unit during the study period, it was considered absent. However, it is important to distinguish this inference of absence that is accepted only after many years of observation from an observed absence in a particular year. The use of these absences is different in kind: the former can be used in model construction but the latter is not reliable for accuracy assessment.
Occurrence data and associated environmental variables for each species were submitted to QUEST. Resulting statistical trees were trimmed or pruned interactively by querying the hexagonal coverage of environmental variables to evaluate the sensitivity of the tree splits and assess model generality. The final tree served as the wildlife-habitat relationship model. Using the threshold values of the environmental variables selected in the final model, the geodatabase was queried to produce each species’ predicted habitat distribution.
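The final query step can be sketched as follows, assuming that each "present" leaf of a pruned tree is written out as a list of threshold conditions and that the geodatabase is stood in for by a plain dictionary of hexagon attributes. The variable names, thresholds, and hexagon identifiers are hypothetical; this is not the QUEST output format or the NE-GAP query interface.

```python
import operator

# Comparison operators allowed in a tree split.
OPS = {"<=": operator.le, ">": operator.gt}


def predict_presence(hexagons, presence_paths):
    """Return the set of hexagon ids that satisfy at least one path.

    hexagons: dict mapping hexagon id -> dict of variable name -> value.
    presence_paths: list of paths; each path is a list of
        (variable_name, op, threshold) conditions that must all hold.
    """
    predicted = set()
    for hex_id, variables in hexagons.items():
        for path in presence_paths:
            if all(OPS[op](variables[name], threshold)
                   for name, op, threshold in path):
                predicted.add(hex_id)
                break  # one satisfied path is enough
    return predicted


# Illustrative data only.
hexagons = {
    "h1": {"pct_grassland": 0.6, "mean_precip_mm": 450},
    "h2": {"pct_grassland": 0.2, "mean_precip_mm": 700},
    "h3": {"pct_grassland": 0.5, "mean_precip_mm": 800},
}
paths = [[("pct_grassland", ">", 0.4), ("mean_precip_mm", "<=", 600)]]
predicted_range = predict_presence(hexagons, paths)
```

Keeping the thresholds as data rather than hard-coded conditions makes it straightforward to rerun the query after a split value is adjusted during pruning.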
For those species lacking sufficient occurrence data (including all mammals, many birds, and a few reptiles and amphibians), the literature was consulted to identify specific environmental variables that could be used as habitat surrogates. The identified variables were then used to query the geodatabase. The predicted range was assessed visually against the reported range and, if there was a large discrepancy, different variables or variable thresholds were tested. Fitness for both model types was evaluated in two ways: the proportion of the occurrences explained and the visual appearance of the predicted range distribution. Parameter nudging was employed to assess, albeit informally, the sensitivity of range extent to the specified values.
Accuracy assessment of the range distributions relied on independent species occurrence data. These independent data had various sources. Literature-based mammal models (n=78) were evaluated against georeferenced voucher specimens collected since 1970 (1,805 unique observations) in the Nebraska State Museum (NSM) and evaluated again at the county level (794 unique observations). The reptile and amphibian models (n=62) were evaluated against voucher specimens collected since 1970 (357 unique observations) in museums other than NSM. The BBS models (n=192) were evaluated against the BBS route-level summaries for 2001 and 2002 (1,953 unique observations) and separately evaluated against voucher specimens collected since 1970 from NSM and other museums (733 unique observations).
We focused on rates of omission error because the occurrence data are strictly presence-only.
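For illustration, a minimal Python sketch of the metric follows. It assumes the usual definition of omission error: the share of spatial units with a documented presence that fall outside the predicted range. The function name and unit identifiers are hypothetical.

```python
def omission_error_rate(observed_units, predicted_units):
    """Fraction of observed-presence units that the model fails to predict.

    observed_units: ids of spatial units (hexagons, routes, or counties)
        with at least one independent occurrence record.
    predicted_units: ids of units inside the modeled range.
    """
    observed = set(observed_units)
    if not observed:
        raise ValueError("no observed presences to assess")
    missed = observed - set(predicted_units)
    return len(missed) / len(observed)


# Illustrative data only: one of four observed units is missed.
rate = omission_error_rate(
    ["h1", "h2", "h3", "h4"],
    ["h1", "h2", "h3", "h9"],
)
```

Unit "h9" is predicted but has no record. With presence-only data it cannot be counted as a commission error, so it has no effect on the rate.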
Accuracy assessment was performed at two scales of model representation: the modeling resolution of 40 km² hexagons and the reporting resolution of 640 km² hexagons. Occurrence data were represented, depending on data source, as county, route, or hexagon. All museum voucher data were scaled to the county level, as it was the only consistent spatial information for many specimens, especially for data from museums other than NSM. The NSM mammal occurrence data that were georeferenced were evaluated as hexagons. The 2001 and 2002 BBS data were evaluated as routes, i.e., the composite of hexagons intersected by the survey route.
We used two different de minimis thresholds in the accuracy assessments. We required a “presence” in at least one spatial unit associated with the underlying data (e.g., BBS route, county) for all occurrence data except the georeferenced mammal voucher specimens from NSM. To make those point data comparable to the other data, we first “promoted” the point occurrences to the model hexagon level and required at least five “presence” modeling hexagons to qualify for accuracy assessment. To avoid inflating accuracies, assessments of omission error excluded species with statewide distributions.
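These screening rules can be sketched in Python under an assumed record layout. The field names, source labels, and species identifiers are hypothetical; only the minimum counts (one unit for routes and counties, five hexagons for NSM point data) and the statewide exclusion come from the text.

```python
# Minimum number of "presence" units required per data source.
MIN_PRESENCE_UNITS = {"bbs_route": 1, "county": 1, "nsm_hexagon": 5}


def species_for_assessment(records):
    """Return species that qualify for omission-error assessment.

    records: list of dicts with keys
        "species", "source", "n_presence_units", "statewide".
    """
    qualified = []
    for record in records:
        if record["statewide"]:
            continue  # statewide ranges would inflate accuracy
        if record["n_presence_units"] >= MIN_PRESENCE_UNITS[record["source"]]:
            qualified.append(record["species"])
    return qualified


# Illustrative records only.
records = [
    {"species": "sp_a", "source": "bbs_route", "n_presence_units": 1, "statewide": False},
    {"species": "sp_b", "source": "nsm_hexagon", "n_presence_units": 4, "statewide": False},
    {"species": "sp_c", "source": "nsm_hexagon", "n_presence_units": 5, "statewide": False},
    {"species": "sp_d", "source": "county", "n_presence_units": 12, "statewide": True},
]
qualified = species_for_assessment(records)
```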
The median (mean) omission error rate for the BBS models was 0% (7%), with 90%, 87%, and 80% of the models having omission error rates less than 15%, 10%, and 5%, respectively (Figure 1). The median (mean) omission error rate for the reptile and amphibian models was 0% (4%), with 93% of the models having omission error rates less than 5%. The median (mean) omission error rate for the mammal models was 14% (20%), with 55%, 39%, and 21% of the models having omission error rates less than 15%, 10%, and 5%, respectively. Rescaling from the modeling grid to the reporting hexagons (640 km²) significantly decreased the omission error rates, as expected (Table 1).
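Summary statistics of this kind can be computed from a list of per-model omission error rates, as in the sketch below. The rates are invented for illustration and are not the NE-GAP results; the function name is hypothetical.

```python
from statistics import mean, median


def summarize_omission(rates, thresholds=(0.15, 0.10, 0.05)):
    """Median, mean, and share of models strictly below each threshold.

    rates: per-model omission error rates as fractions in [0, 1].
    """
    if not rates:
        raise ValueError("no models to summarize")
    share_below = {
        t: sum(1 for r in rates if r < t) / len(rates) for t in thresholds
    }
    return {"median": median(rates), "mean": mean(rates), "share_below": share_below}


# Invented rates for six hypothetical models.
summary = summarize_omission([0.0, 0.0, 0.0, 0.08, 0.12, 0.4])
```

A median well below the mean, as in this toy example, indicates that a few poorly performing models account for most of the omission error.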

Figure 1. Distribution of omission error rates in QUEST (n=82) and literature (n=44) models assessed using Breeding Bird Survey route-level summaries from 2001 and 2002 in modeling (40 km²) and reporting (640 km²) EMAP hexagons.
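| Taxon | Method | Scale | Modeling hexagons: omission error rate <10% | Modeling hexagons: omission error rate <20% | Reporting hexagons: omission error rate <10% | Reporting hexagons: omission error rate <20% | No. of excluded statewide species |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Birds | QUEST | BBS | 87% | 91% | 93% | 95% | 35 |
| Birds | QUEST | County | 60% | 63% | 64% | 65% | |
| Birds | Literature | BBS | 86% | 93% | 91% | 93% | |
| Birds | Literature | County | 74% | 74% | 76% | 78% | |
| Reptiles & Amphibians | QUEST | County | 95% | 95% | 95% | 95% | 3 |
| Reptiles & Amphibians | Literature | County | 88% | 88% | 88% | 88% | |
| Mammals | Literature | NSM hexagon | 40% | 70% | 82% | 89% | 27 |
| Mammals | Literature | County | 89% | 91% | 91% | 93% | |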
While the results of the accuracy assessment are encouraging for most species, there remain several challenges to performing accuracy assessment of habitat models using occurrence data. First, few voucher specimens are georeferenced, and the county-level resolution of the older vouchers is so much coarser than the modeling resolution that the results must be interpreted with caution. Second, BBS survey data are temporally sparse for many species, and route-level summaries across 30 years may collapse significant trends or fluctuations in population dynamics (cf. Vaitkus et al. 2003, this issue). Third, the discrepancy between the finer spatial resolution of the modeling hexagons and the coarser resolution of the reporting hexagons translates into reduced omission error rates and inflated model accuracy at the reporting resolution. Fourth, a full accuracy assessment requires that commission error be considered, but the lack of true absence data means that neither commission nor correct absence frequencies can be calculated. Fifth, the geographic sampling bias common among voucher specimens limits the reliability of even omission error rate estimates. We interpret the higher omission error rates for the mammal models as largely an artifact of the clustering of specimens from four locales used for class collecting trips over the years.
1. The results suggest that the habitat modeling approach using recursive partitioning has advantages over literature gestalt in producing range distributions with lower rates of omission error.
2. Developing habitat models using statistical trees generated from species occurrence data and environmental variables can lend a greater degree of objectivity to the modeling process, but there is still considerable subjectivity in the pruning stage that is necessary for model generality.
3. There remain significant methodological challenges to accuracy assessment of the predictions of wildlife-habitat relationship models.
Henebry, G.M., B.C. Putz, and J.W. Merchant. 2001. Modeling reptile and amphibian range distributions from species occurrences and landscape variables. Gap Analysis Bulletin 10:22-24.
Holland, A.K., G.M. Henebry, B.C. Putz, M.R. Vaitkus, and J.W. Merchant. 2002. Modeling avian habitat from species occurrence data and environmental variables: Assessing the effects of land cover and landscape pattern. Gap Analysis Bulletin 11:25-27.
Loh, W.-Y., and Y.-S. Shih. 1997. Split selection methods for classification trees. Statistica Sinica 7:815-840.
Shih, Y.-S. 2002. QUEST User Manual. Department of Mathematics, National Chung Cheng University, Taiwan. April 17, 2002.
Vaitkus, M.R., G.M. Henebry, B.C. Putz, and J.W. Merchant. 2003. Evaluating the use of statistical decision trees for modeling avian habitats and regional range distributions in the Great Plains. Gap Analysis Bulletin 12:36-40.