Utah Vegetation Mapping - Computer Assisted
Tom Edwards, US Fish and Wildlife Service, Utah Cooperative Fish and Wildlife Research Unit, Department of Fisheries and Wildlife, Utah State University, Logan, UT 84322-5210, (801)797-2529, firstname.lastname@example.org.
In Utah, four advantages of machine classification over analog classification appealed to state cooperators: 1) a much finer spatial resolution of polygon boundaries can be achieved; 2) a complete lineage for each pixel is preserved from raw data to finished class, which offers advantages in updating, consistency and repeatability; 3) its output provides non-scalar based data which can be used at various levels of resolution; and 4) digital classification could be accomplished without depending on current vegetation maps (which were almost non-existent) and with less field work because of the greater extrapolation potential of unsupervised classification.
The success of digital TM classification is related to file size, quality and quantity of training points, quality of ancillary data and project time available (e.g. a 1000 pixel file with 100 training points can be more confidently classified than a 1,000,000 pixel file with 100 training points). Smaller TM data files can produce remarkable pixel level results when many of the soil and topographic variables are relatively constant (Homer et al.). However, classification of large pixel files over complete landscapes containing multiple spectrums of plants, soils and topographic characteristics can be a formidable task, especially when attempting to maintain accuracy at local scales.
The objective of Gap analysis vegetation in Utah is to produce a flexible, state-wide database that can meet the requirements of the Gap program, as well as be useful to state and federal agencies for other related mapping and land management purposes. This requires developing a vegetation cover-type map useable at multiple scales over large expanses. To address this, we divided Utah into 3 major eco-region zones or files for independent classification in an effort to reduce the spectral class confusion that results from classifying complex large landscapes.
Two options are available in classification of areas larger than one TM scene: 1) each scene can be independently classified and the resulting classifications can then be edge-matched, or 2) raw TM scenes can be edge-matched before classification, resulting in a seamless classification map. In Utah we chose to edge-match raw TM scenes into one state-wide seamless file. Utah required 14 TM path/row locations to cover the state. Twenty-four separate dates were used to provide the base coverage, as well as to patch out the major cloudy areas. The primary base scenes were selected based on temporal distribution, cloudiness, and availability. When possible we used temporally close images during the growing season to reduce spectral signature variability resulting from yearly differences in biomass production and seasonal plant phenology. Concessions were made between temporal adjacency and cloud restrictions. All base imagery was collected between June and August of 1988 and 1989 to minimize yearly variations. However, the additional 10 scenes used for cloud patching varied in dates from 1985 to 1993, but were all still in the summer growing season.
All base images were geographically registered using 1:24,000 USGS quadrangles or USGS 1:24,000 orthophotoquads. Approximately 50 rectification control points were selected as uniformly as possible across each image, and the resulting transformation coefficients were created with a root mean square (RMS) error parameter of 1 pixel (30 meters). A bilinear interpolation algorithm was used to resample the images. Additional cloud patch images were rectified using image to image rectification (rectified to the base images). Since file sizes were much smaller, approximately 10-15 ground control points (GCPs) were gathered for these smaller pieces, which were rectified with an RMS error usually smaller than 0.5.
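As an illustration of the RMS criterion, the following minimal sketch computes an RMS error from control point residuals. The function name and the residual values are assumptions made for illustration; in practice the residuals would come from the rectification software.

```python
import math

PIXEL_SIZE_M = 30.0  # TM pixel size stated in the text

def rms_error(residuals):
    """RMS of (dx, dy) control point residuals, in pixels."""
    total = sum(dx * dx + dy * dy for dx, dy in residuals)
    return math.sqrt(total / len(residuals))

# Hypothetical residuals (in pixels) for four control points.
residuals = [(0.6, 0.8), (-0.6, 0.8), (0.6, -0.8), (-0.6, -0.8)]
rms_pixels = rms_error(residuals)
print(round(rms_pixels, 3), "pixels =", round(rms_pixels * PIXEL_SIZE_M, 1), "m")
```

A control point set would be accepted under the criterion above when the returned value does not exceed the chosen tolerance (1 pixel for the base images).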
Atmospheric standardization and histogram matching were used to mosaic imagery into a state-wide image of Utah. Atmospheric standardization is the process of minimizing the effects of atmospheric scattering on reflected light received by the sensor (Jensen 1986). Histogram matching is the process of adjusting the brightness values of one image to match brightness values of an adjacent image to form a seamless mosaic of the two.
There are three typical methods used to atmospherically standardize images: histogram bias, improved histogram bias (Chavez 1988), and the regression intercept method (Jensen 1986). Using one image as a test case, each method was evaluated to determine basic differences. Histogram bias constitutes a best guess reduction in brightness values for each spectral band. The operator identifies what appears to be the lowest value of the data set and biases the histogram based on this value for each spectral band. The improved histogram bias is more objective and includes estimates of atmospheric clarity during the overflight to calculate bias values (Chavez 1988). The regression intercept method is a simple linear regression model that uses the middle infrared (TM band 7) as the independent variable and each of the other spectral bands as the dependent variable. The point at which the regression line intercepts the 'X' axis is used as the bias value to adjust the image for atmospheric scattering. It assumes that the middle infrared bands have little or no attenuation due to atmospheric scattering (Chavez 1988). Comparative results of all three methods for one image in the 14 image set of Utah are depicted in Table 1.
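The regression intercept method can be sketched as follows. This is a minimal illustration, not the procedure as implemented for the Utah mosaic: it takes the bias to be the fitted brightness of the other band where band 7 brightness is zero, which is one reading of the intercept described above. The function names and sample brightness values are made up.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def haze_bias(band7, other_band):
    """Fitted brightness of the other band where band 7 equals zero."""
    intercept, _ = fit_line(band7, other_band)
    return intercept

# Made-up paired brightness values sampled from the same pixels.
band7 = [0, 10, 20, 30]
band1 = [45, 50, 55, 60]

bias = haze_bias(band7, band1)
corrected_band1 = [value - bias for value in band1]  # the bias is subtracted
print(bias, corrected_band1)
```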
Table 1. Comparison of atmospheric standardization procedures on one image of the Utah mosaic. All bias values are negative.
TM Band   Best Guess   Regression   Chavez
1         47           45           45
2         15           14           15
3         12           11           10
4         5            6            5
5         0            0            0
7         0            0            0
Both the best guess and the Chavez method required assumptions of atmospheric clarity that could not be easily substantiated. Consequently, we selected the regression method due to its more objective approach. However, based on the results of our test, any of the three methods would have yielded similar results.
Histogram matching to mosaic adjacent images compares a 'slave' image to an adjacent 'master' image. The distribution of brightness values in the slave image is manipulated to match the distribution of brightness values in the master scene. The high, low, and central portions of the histogram of the slave are modified to resemble the master.
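For reference, conventional histogram matching can be sketched as a lookup table built from the cumulative histograms of the two images. This is a generic illustration of the technique for one band, with hypothetical function names and toy brightness values; it is not taken from the software used in this project.

```python
from collections import Counter

def cumulative_counts(values):
    """Sorted (brightness value, cumulative pixel count) pairs."""
    counts = Counter(values)
    running = 0
    pairs = []
    for value in sorted(counts):
        running += counts[value]
        pairs.append((value, running))
    return pairs

def histogram_match_lut(slave, master):
    """Map each slave value to the master value at the same cumulative fraction."""
    n_slave, n_master = len(slave), len(master)
    master_cum = cumulative_counts(master)
    lut = {}
    for value, cum_s in cumulative_counts(slave):
        for master_value, cum_m in master_cum:
            # Integer form of: cum_m / n_master >= cum_s / n_slave
            if cum_m * n_slave >= cum_s * n_master:
                lut[value] = master_value
                break
    return lut

# Toy brightness values for one band in the overlap zone.
slave = [0, 0, 1, 1]
master = [10, 10, 20, 20]
lut = histogram_match_lut(slave, master)
matched = [lut[v] for v in slave]
print(lut, matched)
```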
One consequence of histogram matching is the alteration of spectral signatures and a resultant loss of radiometric accuracy. We employed an alternate method to match image brightness value distribution to mosaic images. Using the regression intercept method, we "atmospherically standardized" adjacent images to a master image that had been processed for atmospheric haze. This approach biased histograms of slave images based on the average difference of brightness values between master and slave. A bias based on average difference does not alter spectral signatures as much as histogram matching. Image overlap zones were evaluated and the mean difference in each band between a master and a slave image was calculated.
One Utah image covering the majority of spectral types, including alpine, mountain shrub, salt desert shrub, playa, and water, was chosen to act as the master image. This image included major ecoregions (Omernik 1987) of the state, including the Basin & Range, Wasatch-Uinta Mountains, and Colorado Plateau. Path 37, Row 33, commonly known as the Manti Image, met these requirements and was used as the master image to both atmospherically and radiometrically standardize all images for the statewide mosaic (FIG. 1). This image is also the most central to the state. The central location means the maximum amount of direct overlay to the master scene by adjacent scenes. Each image adjacent to the Manti image was georeferenced and overlain on the Manti image for comparison. Once a slave image was radiometrically matched to the master, it became a master for its adjacent scenes.
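The mean difference bias can be illustrated with a short sketch. The function names, the clipping to the 0-255 brightness range, and the sample values are assumptions for illustration only.

```python
def band_bias(master_overlap, slave_overlap):
    """Mean master brightness minus mean slave brightness in the overlap zone."""
    master_mean = sum(master_overlap) / len(master_overlap)
    slave_mean = sum(slave_overlap) / len(slave_overlap)
    return master_mean - slave_mean

def apply_bias(values, bias):
    """Shift every slave pixel by the same bias, clipped to 8-bit range (assumed)."""
    return [min(255, max(0, round(v + bias))) for v in values]

# Made-up overlap-zone brightness values for one band.
master_overlap = [50, 60, 70]
slave_overlap = [45, 55, 65]

bias = band_bias(master_overlap, slave_overlap)
print(bias, apply_bias([0, 100, 253], bias))
```

Because every pixel in a band is shifted by the same constant, differences between pixels within the slave image are unchanged, which is why this adjustment alters spectral signatures less than reshaping the whole histogram.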
There was some concern for the positive biases in the middle infrared bands of some images. The majority of this positive difference was attributed to phenological differences of vegetation. A possible drawback to this technique might be the difficulty encountered when future imagery is compared to this base.
Utah, depending on the source, has from 3 to 5 ecoregions in its borders. Three logical divisions in file classification in Utah were made along the three major ecoregion borders. This allowed improvement in ancillary data applications in ecoregion classification and resulted in less spectral confusion and class overlap.
The three major Utah ecoregions used for file classification (FIG 2) were defined using two data sources, Omernik's digitized ecoregion boundaries and the initial prototype GAP state vegetation map (Ramsey 1993). Omernik's definition of Utah ecoregions includes portions of the Wasatch and Uinta Mountains, Colorado Plateau, Northern Basin and Range, Southern Basin and Range and Wyoming Basin. However, the Southern Basin and Range and Wyoming Basin ecoregions have only limited occurrence in Utah. It appeared that three file divisions in the state would be sufficient: the Wasatch-Uinta Mountains; the Northern Basin and Range, to which the small portions of the Wyoming Basin would be added; and the Colorado Plateaus, to which the small portion of the Southern Basin and Range would be added.
Because the Wasatch-Uinta mountain ecoregion divides the state down the middle, a definition of this ecoregion would essentially provide the boundary of all ecoregions in the state. This ecoregion is based primarily on occurrence of both deciduous and conifer trees. Tree classes of the initial GAP state vegetation map were used to define a more detailed boundary on the lower mountain flanks than Omernik's. An additional 1 to 2 km buffer was included on all ecoregion boundaries to ensure complete coverage overlap when files are merged.
Digital classification of TM data is usually based on one of two fundamental methods, supervised and unsupervised. Each approach was used in classification of the Wasatch-Uinta portion of the state, while only unsupervised classification was used in the other two portions of the state. The approaches are described below.
A total of 112 training sites, representing 62 different cover-type classes, were used from 7 different data sets collected across the Wasatch-Uinta ecoregion. Polygons for each site were digitized on screen over raw TM imagery. This allows correct spatial placement of training polygons based upon "heads up" evaluation of TM imagery patterns using site coordinates generated from a Global Positioning System. Once training sites are in spatially correct polygons (i.e. the location of the site corresponds to the proper pixels in the TM data), spectral signatures are developed for each site. Site signatures were evaluated using a combination of two ERDAS programs: Diverge (using transformed divergence), which computes the statistical difference between pairs of signatures and allows evaluation of the spectral distinctness of each signature, and Ellipse, which visually graphs the spectral distribution of signatures by band. Following initial evaluation of training site signatures, similar signatures were grouped into common cover types. Training site signatures were evaluated using band means and standard deviations. The band mean value provides a measure of the uniqueness of the signature, and the standard deviation (sd) provides a measure of how liberally the classifying algorithm will assign pixels to the signature. A very small sd can result in errors of omission and a large sd results in errors of commission. In this approach errors of commission are the most important to avoid (errors of omission are picked up by the subsequent unsupervised classification).
Overlap of signatures among like cover-types was allowed (e.g. pure spruce, spruce/fir mix), but overlap between different classes was not allowed. Ellipse evaluates the overlap of signatures based on standard deviation of the signature means. Signatures with an sd higher than 2 per band were eliminated. This threshold served as a general rule; however, exceptions were made for some more difficult classes. For example, conifer signatures needed to be much tighter than barren areas. Of the original 112 training sites, 42 failed to meet the above criteria and were eliminated, leaving 70 training sites spanning 29 cover-type classes.
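The general sd rule can be sketched as a simple screen on per-band statistics. This is an illustration only: the use of the sample standard deviation, the function names and the pixel values are assumptions, and the per-class exceptions mentioned above are not modeled.

```python
import statistics

SD_LIMIT = 2.0  # general per-band rule stated in the text

def band_stats(pixels):
    """Per-band (mean, sample sd) for a training site given as a list of pixel tuples."""
    bands = list(zip(*pixels))
    return [(statistics.mean(band), statistics.stdev(band)) for band in bands]

def passes_screen(pixels, sd_limit=SD_LIMIT):
    """True when no band of the site signature exceeds the sd limit."""
    return all(sd <= sd_limit for _, sd in band_stats(pixels))

# Made-up two-band brightness values for two training sites.
tight_site = [(10, 20), (11, 20), (12, 20)]
loose_site = [(10, 20), (20, 20), (30, 20)]
print(passes_screen(tight_site), passes_screen(loose_site))
```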
Three different classifying algorithms were evaluated to determine the most successful for supervised classification of the signature means: maximum likelihood (takes variability of classes into account by using the covariance matrix of the signature), Mahalanobis distance (similar to minimum distance, but scaled by the signature covariance matrix) and minimum distance (classifies based on the Euclidean distance between candidate pixel and the signature). The ERDAS program Cmatrix was used to build a contingency matrix for comparison of a training site derived signature and the percent pixels classified correctly in the site by each classifying algorithm. Successfully classified pixels from 34 training sites were summed into a single cumulative total, revealing maximum likelihood with 2,766, Mahalanobis distance with 2,666 and minimum distance with 2,177 correct pixels. Because maximum likelihood showed the highest success in classifying correctly, it was used as the classifying algorithm for the 70 training sites.
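The difference between the three decision rules can be shown with a small two-band sketch. It is a textbook formulation under stated assumptions (two bands only, equal prior probabilities, hand-written 2x2 matrix inverse), not the ERDAS implementation; the signature names and values are made up.

```python
import math

def inv2(cov):
    """Inverse and determinant of a symmetric 2x2 covariance matrix."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    return ((d / det, -b / det), (-b / det, a / det)), det

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance from pixel x to a signature."""
    inv, _ = inv2(cov)
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)

def classify(x, signatures, rule):
    """Return the signature name with the lowest score under the chosen rule."""
    def score(sig):
        mean, cov = sig
        if rule == "minimum_distance":
            return math.dist(x, mean)
        d2 = mahalanobis_sq(x, mean, cov)
        if rule == "mahalanobis":
            return d2
        if rule == "maximum_likelihood":
            # Negative Gaussian log-likelihood up to constants, equal priors assumed.
            return math.log(inv2(cov)[1]) + d2
        raise ValueError(rule)
    return min(signatures, key=lambda name: score(signatures[name]))

# Made-up signatures: a tight class and a broad class.
signatures = {
    "conifer": ((20.0, 40.0), ((1.0, 0.0), (0.0, 1.0))),
    "barren": ((30.0, 40.0), ((100.0, 0.0), (0.0, 100.0))),
}
for rule in ("minimum_distance", "mahalanobis", "maximum_likelihood"):
    print(rule, classify((21.5, 40.0), signatures, rule))
```

In this toy case the broad signature captures the pixel under Mahalanobis distance, while the determinant term in maximum likelihood penalizes the broad class.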
Supervised classification of large data sets using many training sites can often result in a significant amount of commission error in the misclassification of pixels. The ERDAS program Threshold was used to identify pixels from each signature that were most likely misclassified. When maximum likelihood was used to produce the initial classification, an output file was created using a spectral distance equation to determine a probability value for each pixel. The higher the value, the further the spectral distance from the signature and the greater the probability of misclassification. The histogram of the probability file follows a chi-square distribution, with the tail of the histogram representing pixels most likely to be incorrect. Threshold interactively displays the histogram for each class and allows user definition of the point at which pixels will be screened out. The shape of the histogram and the standard deviations of the training site signatures were used to define the threshold cut-off point. The bell shape of the chi-square curve can be used to visually define breaks for threshold cutoff points. It was determined that most training site signatures were too broad in their initial pixel classifications, and several successive thresholds at increasingly conservative levels were required to reach a reasonable classification. The ERDAS program Polycat was used to compare polygons of training site areas to the various threshold classification screenings to determine the optimum level of screening needed. Again, because commission errors are more serious than omission errors in a supervised classification, the final threshold screened classification was purposely on the conservative side.
The final threshold classification was still too broad in some cover-types, and ancillary data models were used within ERDAS Gismo to establish elevational and locational parameters on each training signature.
b. Unsupervised classification
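The idea of screening the tail of the distance histogram can be sketched numerically. This is an assumption-laden illustration rather than the ERDAS Threshold procedure, which was interactive: it treats the squared spectral distance as chi-square distributed with one degree of freedom per band, uses a closed form that is valid only for an even number of bands, and applies a hypothetical tail fraction.

```python
import math

def chi2_tail(distance_sq, bands):
    """Upper-tail chi-square probability for an even number of bands."""
    if bands % 2 != 0:
        raise ValueError("this closed form needs an even number of bands")
    half = distance_sq / 2.0
    series = sum(half ** i / math.factorial(i) for i in range(bands // 2))
    return math.exp(-half) * series

def keep_pixel(distance_sq, bands, tail_fraction):
    """Keep a pixel unless it falls in the screened-out tail of the distribution."""
    return chi2_tail(distance_sq, bands) >= tail_fraction

# Made-up squared distances for pixels assigned to one class, four bands.
distances = [1.0, 4.0, 9.0, 20.0]
print([keep_pixel(d, bands=4, tail_fraction=0.05) for d in distances])
```

Lowering the distance at which pixels are cut off (a larger tail fraction) gives the more conservative screening described above, trading commission error for omission error.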
The ERDAS Isodata program was used to generate spectral signatures for the unsupervised classification. Isodata is a clustering program that begins with a user specified number of arbitrary cluster means and processes the data repetitively to define optimal cluster cores in the data. The best strategy in selecting the number of spectral classes to generate is to cluster tightly enough to allow good cover-type association, without generating too many clusters, which require additional time and expense in collecting ground-truthing data. The initial target number of classes was four times the number of final cover-type classes expected to be generated. The results of the clustering were evaluated using the standard deviations of the signatures, which provide a guide to the statistical "tightness" of a signature. The target band mean standard deviation was approximately 3 for each band in each signature for lower variability areas such as the Wasatch-Uinta and 4 for higher variability areas such as the Colorado Plateau. This target standard deviation was usually the threshold where finer spectral partitioning would be of little additional help for the classification.
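The repetitive refinement of cluster means can be illustrated with a simplified sketch. It performs only the iterative minimum distance assignment and mean update; the splitting and merging of clusters that full ISODATA implementations may perform is omitted, and the function names and toy pixel values are assumptions.

```python
import math

def nearest(pixel, means):
    """Index of the cluster mean closest to the pixel (Euclidean distance)."""
    return min(range(len(means)), key=lambda k: math.dist(pixel, means[k]))

def recompute_means(pixels, labels, means):
    """New mean of each cluster; an empty cluster keeps its previous mean."""
    new_means = []
    for k, old in enumerate(means):
        members = [p for p, lab in zip(pixels, labels) if lab == k]
        if members:
            new_means.append(tuple(sum(band) / len(members) for band in zip(*members)))
        else:
            new_means.append(old)
    return new_means

def cluster(pixels, initial_means, max_iterations=20):
    """Iterate assignment and mean update until the means stop changing."""
    means = [tuple(m) for m in initial_means]
    for _ in range(max_iterations):
        labels = [nearest(p, means) for p in pixels]
        new_means = recompute_means(pixels, labels, means)
        if new_means == means:
            break
        means = new_means
    labels = [nearest(p, means) for p in pixels]
    return means, labels

# Toy two-band pixels and arbitrary starting means.
pixels = [(1, 1), (2, 2), (10, 10), (11, 11)]
means, labels = cluster(pixels, [(0, 0), (12, 12)])
print(means, labels)
```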
In evaluating the spectrum of signatures created in image files of this size, there will always be signatures (water, shadow, etc.) which represent non-typical outlying spectral areas and therefore have large standard deviations. Increasing the overall number of classes does little to reduce the standard deviations of these signatures: because the distribution usually follows a normal curve, the majority of signatures lie statistically in the middle of the distribution with the outliers in the tails, so a disproportionately large increase in classes is needed to affect the outliers. Therefore, the majority clusters in the middle classes provide the most reliable information for comparison. There is usually a break point beyond which additional partitioning of spectral variability by increasing class numbers yields diminishing returns in the standard deviations of the middle classes. Ideally, the total number of classes selected for clustering corresponds to this break point.
For ease of comparison, standard deviations were summed across the bands in each signature to a single cumulative value. Multiple classifications (usually in increments of 25 classes) were evaluated to define the break point for optimizing partitioning of the variability in each of the three ecoregion areas. A minimum distance classifying algorithm was used to classify the data set. Minimum distance was selected because it approximates the algorithm used to create the initial statistical clusters and provides continuity in methodologies. Multiple clusters for each ecoregion were evaluated and selected based on the profile of the signature standard deviations.
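The summed standard deviation comparison can be sketched as follows. Using the median as the "typical" value is an assumption made for this illustration, and the per-band standard deviations are made up.

```python
import statistics

def summed_sd(band_sds):
    """Single cumulative value: the sum of a signature's per-band sds."""
    return sum(band_sds)

def typical_summed_sd(signatures):
    """Median summed sd over the signatures of one clustering run (assumed summary)."""
    return statistics.median([summed_sd(s) for s in signatures])

# Made-up per-band sds for a few signatures from runs with different class counts.
runs = {
    100: [[3.5, 3.0, 4.0, 3.5], [3.0, 3.0, 3.5, 3.0], [4.0, 3.5, 4.0, 3.5]],
    125: [[2.5, 2.0, 2.5, 2.0], [2.5, 2.5, 2.5, 2.0], [2.0, 2.0, 2.5, 2.0]],
    150: [[2.0, 2.0, 2.0, 2.0], [2.0, 1.5, 2.0, 2.0], [2.0, 2.0, 2.0, 1.5]],
}
for class_count in sorted(runs):
    print(class_count, typical_summed_sd(runs[class_count]))
```

A break point shows up where the drop in the typical value between successive runs becomes small relative to the earlier drops.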
Unsupervised classifications rely on ground training points to associate spectral classes with cover-types. Three primary sources of training data were used: 1) field collected Global Positioning System (GPS) field plot readings, 2) aerial photo interpreted plots, and 3) Forest Service and BLM sponsored field plots referenced by photographs and maps. GPS plots were collected state-wide by 3 different field crews sponsored by USU, USFS, Utah Division of Wildlife Resources and NPS. GPS plots were taken on various vegetation cover-types throughout the state. Roads were traveled to find various sites that represented both typical and non-typical cover-type canopies and compositions. Efforts were made to collect a representation of the diversity within a cover-type. Besides the UTM coordinates collected at each plot, descriptive information was also collected on vegetation species composition by canopy and height; physical characteristics, such as soil canopy and color; topographic characteristics, such as slope, aspect and elevation; and site characteristics, such as uniformity of cover type, adjacency to road, and size. Using screen enhanced raw TM data, each UTM coordinate was photo interpreted in ERDAS Digscrn to create the training polygon. This allowed evaluation of site characteristics (such as adjacency to a road or size of plot) to be used for maximum correct spatial placement.
USDA Forest Service Region 4 photo interpreters, in cooperation with USU, interpreted primarily forest sites throughout the state using low level true color aerial photographs in stereo pairs. A representative aerial photograph was found using flight path maps and orthophotoquads. Corresponding TM imagery using bands 4, 3 and 2 in false color composite was displayed at full resolution in ERDAS for the photograph site. Vegetation cover-types were identified on the photograph and corresponding sites found on the screen displayed imagery. ERDAS Digscrn was used to digitize a polygon around an identified training area on the image. Stereo pair photographs were used to determine species composition, canopy, height and slope of vegetation on the plot. Either corresponding 7 1/2 minute quadrangle maps or polygons overlaid on DMA topographic data were used to determine elevation. Real time comparison of photographs next to the image on screen proved very effective for creating training site polygons.
A program coordinated by David Born of the Forest Service Intermountain Research Station and Doug Meyers of Forest Service Region 4 had Forest Service and BLM field crews collect ground training points while doing other field duties. Typically field crews would identify a cover-type training site in the field and mark the corresponding point on either a map or a photo. Photo points were later transferred to maps. Field forms including information on vegetation, topography and site characteristics were included with the point. Maps, photos and forms were sent to USU, where a digitizer was used to get UTM coordinates for the point. Training site polygon boundaries were then digitized by screen interpretation on raw TM imagery in ERDAS Digscrn, using the UTM coordinates together with the recorded cover-type and site characteristics.
Training polygons from all sources were overlaid on the classified map using ERDAS Summary to summarize which classified pixels are found in each polygon. Because training polygon size varied, pixels were weighted according to their percent of occurrence in the training site by multiplying the number of pixels by the percentage of the plot that each pixel represents. For example, 10 pixels in a 100 pixel site would carry a weighting of 10 (10 pixels * 1%), and 100 pixels in a 1000 pixel site would also carry a weighting of 10 (100 pixels * 0.1%). This standardized the weighting of each pixel regardless of the size of the training polygon to eliminate class association bias based on training polygon size alone. All training plots and corresponding classified values were entered into the data management package Paradox for summary of vegetation, topography and spectral class association. Data were then imported into SAS for statistical summaries of spectral class data by cover-type.
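The weighting arithmetic can be checked with a few lines of code. The function name is hypothetical; the two cases are the ones given in the text.

```python
def pixel_weight(class_pixels, site_pixels):
    """Weight of a spectral class in a training site: pixels times percent per pixel."""
    return 100 * class_pixels / site_pixels

# The two cases given in the text carry the same weight.
print(pixel_weight(10, 100), pixel_weight(100, 1000))
```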
Statistical summaries from training data were created for each of the spectral classes. For each class the weighted value and percent composition of each represented cover-type was presented from the training data. Because each training site had the potential to be represented by more than one spectral class (e.g. an aspen training polygon might contain spectral classes 45 and 56), summing data by cover-type per spectral class allowed pixels from one training site to sum in multiple spectral classes. In complex multi cover-type spectral classes, 90% was the established cut off for including cumulative cover-type percentages. Simpler class-association patterns would include cumulative percentages higher than 90%. Cover-types were arranged and selected by class in proportion of occurrence. For example, in Wasatch-Uinta spectral class 37 oak was found in 78% of the plots, aspen in 10%, maple in 4% and sagebrush in 3%.
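The cumulative cut off can be sketched as below, using the class 37 percentages from the text. The function name and the rule of stopping at the first cover-type that brings the running total to the cut off are assumptions about how the 90% threshold was applied.

```python
def select_cover_types(percent_by_type, cutoff=90.0):
    """Take cover-types in descending order until the cumulative percent reaches the cutoff."""
    selected = []
    cumulative = 0.0
    ranked = sorted(percent_by_type.items(), key=lambda item: item[1], reverse=True)
    for cover_type, percent in ranked:
        selected.append(cover_type)
        cumulative += percent
        if cumulative >= cutoff:
            break
    return selected, cumulative

# Wasatch-Uinta spectral class 37 percentages quoted in the text.
class_37 = {"oak": 78, "aspen": 10, "maple": 4, "sagebrush": 3}
print(select_cover_types(class_37))
```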
Vegetation cover-type definitions were based on the UNESCO classification scheme. Parameters for this scheme contain certain canopy and height requirements to fit into hierarchical categories. All training plots were examined to determine which UNESCO class they fit into by evaluating height and canopy parameters. Whenever possible vegetation cover-types were determined at the class level (sagebrush), but in some cases had to be stepped back to the formation level because the class level resolution wasn't available in the TM data (mountain shrubs).
After initial UNESCO class associations with training plots, summaries of spectral class association for each eco-region and training plots were compared to see the degree of association between spectral classes and UNESCO vegetation classes. Evaluations were made at this point to determine what could be separated as cover-type classes based on spectral association as well as availability of ancillary data to help discriminate cover-types.
Developing a cover-type map from Landsat TM data often requires more than simple association of spectral classes to vegetation cover-types; ancillary data are often used to further separate spectral confusion. For example, Wasatch-Uinta spectral class 38 includes significant associations of aspen, oak, willow, mountain mahogany and maple. To simply assign one of these cover-types to this spectral class would result in high commission error. The alternative is to employ ancillary data to allow multiple cover-type assignment of pixels occurring in a single spectral class. In this example, elevation and aspect can be used to clarify locations of oak and aspen (aspen occurs higher in elevation than oak, and where they overlap they occur on different slopes), slope and elevation can be used to clarify willow and mountain mahogany (willow occurring on very mild slopes and mountain mahogany on very steep slopes in certain elevations) and location can be used to further clarify cover-type assignment (oak does not occur in extreme northern Utah and maple does; oak occurs in different elevation zones based on the location in the Wasatch Mountains).
An extensive literature search was made, including both formal and informal publications as well as personal communications, to provide the ancillary parameters most effective for each cover-type. In addition, the topographic information from the field collected training points was input and summarized in SAS to provide a topographic profile of each cover-type based on field data.
Data used to support TM vegetation cover-type modeling include digital elevation data (3-arc second DTED) output to elevation, slope and aspect files; hydrology from USGS 1:100,000 DLG files (including streams and water bodies); land use, including agriculture and urban areas; and cover-type location polygons generated from literature descriptions (such as a Utah vegetation map generated from aerial photos and mapped at 1:500,000 (Foster)), field experience, and various localized articles and maps throughout the state.
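The kind of conditional rule described for spectral class 38 might look like the sketch below. Every threshold, the order of the tests and the region label are hypothetical values chosen for illustration; they are not the model parameters used in this project, and aspect is left out for brevity.

```python
def model_class_38(elevation_ft, slope_deg, region):
    """Assign a cover-type to a class 38 pixel from ancillary data (hypothetical rules)."""
    if slope_deg <= 4:           # very mild slopes (threshold assumed)
        return "willow"
    if slope_deg >= 30:          # very steep slopes (threshold assumed)
        return "mountain mahogany"
    if elevation_ft >= 8000:     # aspen occurs above oak (threshold assumed)
        return "aspen"
    if region == "extreme north":  # oak is absent there, maple is present
        return "maple"
    return "oak"

# Made-up pixels: (elevation in feet, slope in degrees, region label).
for pixel in [(6500, 2, "south"), (7000, 35, "south"), (8500, 15, "south"),
              (6500, 15, "extreme north"), (6500, 15, "south")]:
    print(pixel, model_class_38(*pixel))
```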
The ERDAS program Gismo was used to make Boolean logic models that incorporated multiple ancillary data sets in conditional statements to clarify spectral class/cover-type association overlap (FIG 3). Defining how to model each of the spectral classes with ancillary data was based on input from a variety of primary and secondary sources. The primary data input source was cover-type percentages per spectral plot. Training plots were analyzed both as a percent of the spectral class, as well as a percent of cover-type. Training plot quality was ranked based on reliability of original source and weighted accordingly. Secondary sources included topographic summaries of elevation, slope and aspect, regional summaries and spectral class characteristics. In addition, spatial distribution of spectral classes based on secondary comparisons served to help determine the extent of modeling required per spectral class.
In summary, each spectral class would typically be modeled using the primary source (cover-type percentages) to define the priority and extent of modeling needed by cover-type, and secondary sources (topographic, area and spectral summaries) to further define additional model parameters that would aid cover-type separation. For example, Wasatch-Uinta class 29 includes primary data for p/j, pp/ms and mtfir/ms. However, the secondary elevation summary shows class 29 pixels occur at higher elevations than any of the primary cover-types are expected to occur. Hence, based on the profile of the primary cover-types, s/f/ms is assigned to the pixels occurring at elevations above the primary types.
Model parameters were derived from literature sources, personal communication with knowledgeable people and field work data. Model parameters per operation per spectral class are referenced according to the primary citations used to justify the model parameters. Intense effort was made to ensure as much objectivity as possible in generating spectral class models. However, the nature of remote sensing requires some subjective decisions to be made based on spectral characteristics, data characteristics, and field knowledge to create cover-type maps.
Four of 7 TM bands were clipped for classification. Bands 2, 3, 4 and 5 were included; bands 1 and 7 were not. Based on the vegetative nature of the ecoregion, it was decided these two bands would offer little unique information. Of the four bands, bands 4 and 5 have classic normal distributions, while histograms of bands 2 and 3 are more skewed to the left but generally approximate a normal distribution.
All urban and agricultural areas were masked out on the original image, allowing for classification of wildland cover-types only. The TM files were resampled by a factor of 2 to allow stitching into one complete ecoregion file for spectral grouping.
Standard deviations of three clustering results, one each of 100, 125 and 150 classes, were compared for the Wasatch-Uinta region. For ease of comparison, standard deviations were summed across the bands in each signature to a single cumulative value. The 100 class cluster resulted in classes that were too broad (typical sd sums of 12-15), and the 150 class cluster resulted in classes that were too fine (typical sd sums of 7-8). The 125 class cluster had typical sd sums of 8-10, which was the original target, and additionally seemed to best define the break point for optimizing partitioning of the variability.
A total of 667 training sites from 3 types of data sets were used to correlate spectral classes with vegetation cover-types (FIG. 4). 356 or 53% were collected using a GPS (Global Positioning System) in the field, 221 or 35% were collected using aerial photographs and 80 or 12% collected using photo interpreted field sites.
Three primary ancillary data sets were used in modeling: ancillary topographic data, including elevation in feet at 250-foot intervals, slope in degrees at 2-degree intervals and aspect in four cardinal directions; ancillary location data, including four major regions (Wasatch north, Wasatch south, Uinta north slope and Uinta south slope) (FIG. 5a); and ancillary vegetation location data defining non-oak areas (FIG 5b).
This ecoregion classification is based on 150 spectral classes clustered from 5 Landsat TM bands (excluding band 1 and band 6) using ERDAS Isodata. Clusters of 125, 150 and 175 classes were generated and signature standard deviations were analyzed, with the best spectral partitioning occurring in the 150 class cluster. The level of cover-type discrimination within the 150 spectral classes was based on evaluation of correspondence between spectral classes and plot data.
There are one primary and two secondary ground-truthing data sets for the Colorado Plateau. The primary data set is composed of a total of 484 points (FIG 4). 388 (80%) are GPS collected points, 59 (12%) are photo interpreted points and 37 (8%) are ground/photo interpreted points. One secondary data set is based on BLM SWVEN inventory data collected in the Henry Mountain Resource Area which was digitized to polygons and overlaid on the spectral classes. A total of 1780 polygons were used to overlay on the spectral data. The BLM data polygon data were considered a more broad based landscape truthing approach and were assumed to have a higher probability of containing error in the cover-type/spectral class associations. Hence, while helpful, the BLM data set was not given as much weight in defining cover-type models as the primary data set. The other secondary data set was 34 points collected in the Uinta Basin by GPS. These points were collected after the first two data sets had been summarized to fill some gaps in the training points locations.
Three primary ancillary data layers were incorporated in models to separate spectral confusion. Ancillary topographic data included elevation in feet at 100 foot intervals, slope in degrees using 31 classes at 2 degree intervals and aspect in four cardinal direction classes. Ancillary vegetation coverages included blackbrush at two locations, the Henry Mountain higher elevation area and broad-based lower elevation areas (FIG. 6a), and low elevation juniper, to include stands of juniper around the Santa Clara River (FIG. 6b). Ancillary location data included the Uinta Basin (FIG. 6c).
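The topographic classes described above amount to simple binning rules. The following Python is a minimal sketch of that binning, not the Erdas procedure actually used. The function names, the treatment of the 31st slope class as open-ended (60 degrees and steeper), and the 90 degree aspect sectors centered on north, east, south and west are all assumptions made for illustration.

```python
def elevation_class(elev_ft, interval_ft=100):
    """Index of the elevation band (0 = 0-99 ft, 1 = 100-199 ft, ...)."""
    return int(elev_ft // interval_ft)


def slope_class(slope_deg, interval_deg=2, n_classes=31):
    """Index of the slope band; the last class is open-ended (assumption)."""
    return min(int(slope_deg // interval_deg), n_classes - 1)


def aspect_class(aspect_deg):
    """Cardinal direction for an aspect in degrees clockwise from north.

    Assumption: each direction covers a 90 degree sector centered on it,
    so north spans 315 to 45 degrees.
    """
    sectors = ["N", "E", "S", "W"]
    return sectors[int(((aspect_deg % 360) + 45) // 90) % 4]
```

For example, a pixel at 5,540 feet with a 7 degree slope facing 200 degrees would fall in elevation band 55, slope band 3 and aspect class "S" under these assumed rules.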
The Basin & Range was modeled using five Landsat TM bands; bands 1 and 6 were excluded. Three spectral clusterings were run, with 125, 150 and 175 classes. Based on the standard deviation of the signature means, the 150 spectral cluster was selected as the optimal class partition. The level of cover-type discrimination within the 150 spectral classes was based on evaluation of correspondence between spectral classes and plot data.
A total of 573 points were used for training sites in the Basin & Range (FIG. 4). 490 points (86%) were GPS based, 57 points (10%) were ground/photo based and 26 (4%) were photo interpreted. Three primary ancillary data layers were incorporated in models to separate spectral confusion. Topographic ancillary data sets include elevation in feet at 100 foot intervals, slope in degrees in 31 classes at 2 degree intervals and aspect in four cardinal direction classes plus flat. Ancillary vegetation data sets include juniper and pinyon, oak and maple, greasewood and pickleweed. Ancillary location data include the Oquirrh and Raft River mountains.
Agricultural and urban areas in Utah were identified using two sources. The primary source was data collected by the Utah Division of Water Resources as part of an on-going 1985 to present program to map water-related land use for the entire state. Aerial photography is collected in 35mm slide form during high contrast times of urban and agricultural growth at low elevations. Water-related boundary lines are transferred from the slides to USFS 7 1/2 minute quad maps using a projector and then labeled according to the water use. Quad maps are then taken to the field to check boundaries and land-use data. Following field verification, quad maps are used as a digitizing base to convert polygons and labels to ARC/INFO.
Water Resources defined a variety of urban and agricultural classes. For gap analysis use we combined all the commercial, urban and residential codes into one urban category. We combined all irrigated cropland, row cropland, non-irrigated cropland, pasture and hayland classes into a single agricultural class. We obtained in 1991 all data completed as of 1990 (about 85% of the state). Two river-basin areas had not been completed yet: the Uinta Basin and the Western Colorado River. To complete the agricultural and urban coverage for the state, Landsat TM data and ancillary data were employed. Urban areas were initially identified from USGS 1:100,000 road DLG files using primary and secondary roads, intersected with digitized private and Indian land polygons (urban areas would not be expected on federal or state lands). This provided a file with potential urban lands where high density roads occurred on private or Indian lands. Potential urban area road zones were buffered within ARC/INFO at 250 meters to create urban polygons from overlapping road buffer zones. The potential urban polygon file was then further cleaned to remove arcs shorter than 400 m, which effectively removed small outlying non-urban areas. Potential urban areas were then hand edited and compared to maps as a final check. Final urban areas were buffered by 100 meters to allow for urban zones extending beyond roads.
Potential agricultural areas were defined by overlaying private and Indian land and TM data using bands 4, 3 and 2 in a false color composite. Potential agricultural areas were grossly defined using the Erdas-ARC/INFO live link by location of polygons and intensity of red (representing green vegetation) areas within land-ownership polygons. To further refine agricultural areas within the gross boundaries, specific spectral classes from a state-wide spectral classification were used. These classes were generated from an unsupervised 110 class classification of the entire state of Utah. Cluster signatures were generated using Isodata in Erdas to identify 110 spectral signatures. Erdas Maxclas was then used with a minimum distance algorithm to classify the 110 clusters into spectral classes. Based on field data, photo and image interpretation, 19 of the 110 classes best represented agricultural areas. These 19 classes were reselected out of the gross boundaries to define agricultural areas. Areas were then scanned with a 3x3 filter in Erdas Scan to smooth the file and remove salt and pepper pixels.
Before classification of natural land-cover types from TM data, agricultural and urban data files were overlaid to mask out these areas on the TM data. Hand editing and screen interpretation were used to further edit any small exclusions or errors in agriculture or urban polygons to complete the coverage.
Wetland areas in the state were identified using on-screen digitizing over screen-enhanced TM data. Because of their inherent diversity, wetland areas are difficult to model using a digital machine classification approach. However, major wetland areas in the state can be identified from TM data using a photo interpretative approach.
Several data sets were used to aid in digitizing the wetlands. Boundaries of federal and state wildlife refuges, roads and BLM 1:100,000 maps were used to aid in selecting major wetland areas. Conversations with Utah DWR (Wes Johnson pers. comm.) were used to identify some specific areas. The BLM 1:100,000 quad maps contained marsh land designations which were also helpful in some areas. Wetland areas were first identified using the various ancillary data sources, and TM imagery was then used to digitize the specific polygons. Imagery was displayed using bands 4, 3 and 2 in a false color composite, with marsh areas appearing in shades of red because of the strong vegetation component. Digitizing was done using the ERDAS Digscrn program. Imagery was collected in 1988 and 1989, during a very high water period for the Great Salt Lake, resulting in sharply reduced wetland areas around the lake available for mapping.
Riparian areas can be very difficult to map with TM data because of their size, shape and vegetative complexity. However, they are crucial to many wildlife species, which makes them important areas to map. Utah riparian areas could not be mapped using the standard TM land cover-type methodology because riparian areas are very narrow and complex, both spatially and spectrally, and create too much pixel confusion, which results in mixed classes.
To address this problem in Utah, ancillary data were employed to assist in the identification of riparian zones. USGS 1:100,000 hydrography DLG files were used to identify potential riparian zone areas. Three categories in these files were used: perennial streams, intermittent streams and shoreline areas. These ARC/INFO files were buffered into polygons according to the category of hydrography. Perennial streams were buffered 45 meters to a side (90 meters total), intermittent streams were buffered 30 meters to a side (60 meters total) and shorelines were buffered 60 meters to a side (120 meters total). Each of these total widths can be evenly divided by our TM pixel resolution (30 m). Buffering distances were determined by visually analyzing a variety of riparian areas on TM imagery.
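The relationship between the buffer distances and the TM pixel size can be checked with a few lines of arithmetic. The sketch below is only an illustration of that check; the dictionary and function names are hypothetical, and the distances are those given in the text.

```python
PIXEL_M = 30  # TM pixel resolution in meters (from the text)

# Buffer distance on each side of the line, in meters (from the text).
BUFFER_PER_SIDE_M = {
    "perennial": 45,
    "intermittent": 30,
    "shoreline": 60,
}


def total_width_pixels(category):
    """Total buffer width (both sides) expressed in whole TM pixels."""
    total_m = 2 * BUFFER_PER_SIDE_M[category]
    if total_m % PIXEL_M != 0:
        raise ValueError("total width is not a whole number of pixels")
    return total_m // PIXEL_M


widths = {name: total_width_pixels(name) for name in BUFFER_PER_SIDE_M}
```

Under these figures the perennial, intermittent and shoreline buffers are 3, 2 and 4 pixels wide in total.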
These buffered ARC/INFO polygons were then generalized (usually at less than 25 m intervals) to allow simplification for import into Erdas .dig files for clipping. Because riparian areas typically are so narrow, spatial error in placing the clip files needs to be minimized. To accomplish this, riparian Erdas clip files were overlaid on the raw TM data by 1:100,000 quad (the original subset piece) to check for spatial accuracy. If clip files were found to be spatially off, the header of the clip file was adjusted to provide the most accurate fit. 36 of 46 quads needed some spatial adjustment; however, only 8 of those 36 quads required an adjustment of more than 60 meters (2 TM pixels). These 8 quads were adjusted by 90 to 180 meters. Given the map resolution of the DLG data (1:100,000) and the spatial error in the TM data (1 pixel or 30 meters), even 180 meters is an acceptable adjustment level.
Once adjusted, Erdas clip files were used to clip out potential riparian areas from the Utah 110 prototype spectral classification (Ramsey). Each of the 110 spectral classes was evaluated using field data and image interpretation to determine which of the classes represented riparian vegetation. 18 of the 110 classes were chosen and reselected out of the potential clipped riparian file to create the actual riparian file. Once reselected, the riparian file was then scanned with a 3x3 filter to eliminate single pixel salt and pepper in the image.
Erdas Gismo modeling was done on the riparian file using elevation data to create two riparian classes, a high elevation file (mountain riparian) and a low elevation file (lowland riparian). 5500 feet was the elevation threshold chosen to divide the two types because it typically divided the low elevation cottonwood dominated riparian types and high mountain willow/alder types.
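The per-pixel logic of the riparian selection and elevation split can be sketched as follows. This Python is an illustration only and is not the Gismo model itself. The function name, the example class ids, and the choice to treat a pixel exactly at 5500 feet as mountain riparian are assumptions; the report does not list the 18 spectral classes that were chosen.

```python
ELEV_THRESHOLD_FT = 5500  # dividing elevation from the text


def riparian_type(spectral_class, in_buffer, elev_ft, riparian_classes):
    """Return 'mountain', 'lowland', or None for a single pixel."""
    # Only pixels inside the hydrography buffer and belonging to one of
    # the selected riparian spectral classes are kept.
    if not in_buffer or spectral_class not in riparian_classes:
        return None
    # Assumption: a pixel exactly at the threshold counts as mountain.
    return "mountain" if elev_ft >= ELEV_THRESHOLD_FT else "lowland"


# Hypothetical class ids, for illustration only.
example_classes = {12, 47, 63}
```

With these example ids, a buffered pixel of class 47 at 6,200 feet would be labeled mountain riparian, and the same pixel at 4,800 feet would be labeled lowland riparian.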
In the Wasatch-Uinta ecoregion 60,348,794 pixels (54,314 sq. km.) were classified into 28 wildland vegetation and landform cover-type classes and two land-use classes. The supervised classification classified 5.7% of the Wasatch-Uinta file, leaving the remaining 94.3% of the pixels to be classified using an unsupervised classification.
In the Colorado Plateau ecoregion 90,374,192 pixels (81,335 sq. km.) were classified into 25 wildland vegetation cover-type classes.
In the Basin & Range ecoregion 85,037,847 pixels (76,532 sq. km.) were classified into 27 wildland vegetation cover-type classes.
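The areas quoted with the pixel counts follow from the 30 m TM pixel size (900 sq. m. per pixel). The short sketch below shows the conversion; the function and variable names are hypothetical, and a nominal 30 m square pixel is assumed. Under that assumption the Wasatch-Uinta count converts to about 54,314 sq. km., matching the reported figure, and the other two ecoregions come out within roughly 2 sq. km. of the reported figures.

```python
PIXEL_SIZE_M = 30  # nominal TM pixel edge, in meters (assumption)


def pixels_to_sq_km(n_pixels, pixel_size_m=PIXEL_SIZE_M):
    """Area in square kilometers covered by n_pixels square pixels."""
    return n_pixels * pixel_size_m ** 2 / 1_000_000


# Pixel counts as reported for each ecoregion.
pixel_counts = {
    "Wasatch-Uinta": 60_348_794,
    "Colorado Plateau": 90_374_192,
    "Basin & Range": 85_037_847,
}
areas = {name: pixels_to_sq_km(n) for name, n in pixel_counts.items()}
```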
A total of 38 vegetation cover-type and land cover classes were identified for the Utah data set. Cover-type categories are listed by the principal species which define the cover-type. Landscape scale cover-type mapping includes many prevalent primary associated species which can occur substantially as part of the cover-type in localized areas. This is not intended to be a complete species list, but rather an overview of the most common species associated with each cover-type. General descriptions of each cover-type are in bold type. Cover-types are listed in numeric order of codes as found in the digital data set.
Each cover-type definition is a list of the most common localized species that would be expected to be found in that class. Because the nature of image processing at such landscape scales provides for some localized generalization, possible locally prominent species are included. For example, in Utah limber pine is not found in pure stands at large enough extents to warrant a separate class; however, in some localized conifer stands it can be prevalent as a co-dominant or associate to other conifers.
The state-wide classification file was created by stitching the three ecoregion vegetation coverages together. Each ecoregion file was first scanned with a 3x3 majority filter to reduce the "salt and pepper" of the file. Preference was given to the Basin & Range and Colorado Plateau when stitching in the overlapping zones of the files. Additional layers of agricultural, urban, wetland and riparian data were then overlaid on the state vegetation coverage and allowed to overwrite any existing file data. Because each ecoregion's output was scanned with the 3x3 filter after modeling, the area statistics for each cover-type change. Depending on the characteristics of each cover-type, such as typical shape (e.g. riparian), typical size (e.g. stand size) or prevalence of occurrence, the scan will affect each differently. However, a 3x3 filter at landscape mapping scales has minimal impact.
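A 3x3 majority filter of the kind described can be sketched in a few lines of Python. This is a minimal illustration and not the Erdas Scan implementation, whose edge and tie handling are not described in the text. Here edge pixels use only the neighbors that exist, and ties keep the center value where possible, otherwise the lowest class code; both rules are assumptions.

```python
from collections import Counter


def majority_filter_3x3(grid):
    """Return a new grid in which each cell takes the most common
    value found in its 3x3 window."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            # Collect the window, clipped at the grid edges.
            window = [
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            ]
            counts = Counter(window)
            top = max(counts.values())
            winners = [v for v, n in counts.items() if n == top]
            # Ties keep the center value if possible, else the lowest
            # class code (assumption).
            out[r][c] = grid[r][c] if grid[r][c] in winners else min(winners)
    return out


# A single isolated pixel of class 2 surrounded by class 1.
noisy = [
    [1, 1, 1],
    [1, 2, 1],
    [1, 1, 1],
]
smoothed = majority_filter_3x3(noisy)
```

In this small example the isolated class 2 pixel is replaced by class 1, which is the "salt and pepper" removal the scan is meant to achieve.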
Approximately 11% (58 sq. km. of 522 sq. km.) of the wetland class was modeled digitally in the Basin & Range, with the remainder done using the alternate wetland mapping methodology. Lowland riparian data was primarily modeled digitally (91%), with only 9% (88 sq. km.) done using the alternate riparian methodology. 57% of the mountain riparian data was modeled using the alternate riparian methodology (416 sq. km.), with 43% (311 sq. km.) modeled digitally.
National Gap Standards require GIS information to be represented in ARC/INFO polygon form with a minimum mapping unit of 100 ha. Because models were executed at the single pixel raster level and output in Erdas raster file format, polygon conversion was required. Erdas files were imported into ARC/INFO as grid files using the USGS 1:100,000 quads as a tiling base. The size and complexity of the files (even at the 1:100,000 quad size) restricted polygonization at the single pixel level. To reduce the complexity of the grid files, ARC/INFO Regiongroup was used to define pixel linkage to identical pixel class groups in the grid files. Pixel groups larger than 1 hectare (11 pixels) were then reselected out of the original grid, while pixel groups smaller than 1 ha were subsumed into larger surrounding groups using ARC/INFO Nibble, resulting in a coverage containing only 1 ha and larger pixel groups.
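The grouping and subsuming step can be illustrated with a small pure-Python sketch. It is not the ARC/INFO Regiongroup or Nibble code. It assumes four-neighbor connectivity, treats groups of 11 or more pixels as meeting the 1 ha threshold, and fills each dropped pixel by growing the kept groups outward one pixel at a time, which approximates assigning the value of the nearest kept pixel. The function name and these choices are assumptions.

```python
from collections import deque

NEIGHBORS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def remove_small_groups(grid, min_pixels=11):
    """Absorb same-class pixel groups smaller than min_pixels into
    the surrounding larger groups."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    keep = [[False] * cols for _ in range(rows)]

    # Step 1: find connected groups of identical class; keep large ones.
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            seen[r][c] = True
            group, queue = [(r, c)], deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in NEIGHBORS_4:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and grid[ny][nx] == grid[r][c]):
                        seen[ny][nx] = True
                        group.append((ny, nx))
                        queue.append((ny, nx))
            if len(group) >= min_pixels:
                for y, x in group:
                    keep[y][x] = True

    # Step 2: grow kept groups outward until dropped pixels are filled.
    # If no group is large enough, the grid is returned unchanged.
    out = [row[:] for row in grid]
    queue = deque((r, c) for r in range(rows)
                  for c in range(cols) if keep[r][c])
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBORS_4:
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols and not keep[ny][nx]:
                keep[ny][nx] = True
                out[ny][nx] = out[y][x]
                queue.append((ny, nx))
    return out
```

For instance, a 5x5 grid of class 1 with a single class 2 pixel in the center comes back as all class 1, because the one-pixel group falls below the 11 pixel threshold.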
The 1 ha minimum resolution grid file was polygonized into ARC/INFO at a precision of six decimal places. Some of the 1:100,000 quads had to be subset into smaller units due to ARC/INFO software limitations during the polygonization process. Following polygonization, coverages were generalized to the 100 ha MMU level. A buffer zone of 33.4 kilometers was created around each quad to account for edge polygons during the generalization process. A "smart eliminate" program written with a combination of Arc Macro and C language was used to subsume the smaller polygons into the larger polygons at various size intervals. Polygon areas were used in the program as the variable for targeting polygon elimination. Coverages were first eliminated to the 3 ha level, then the 5 ha level and then at 5 ha intervals until the 100 ha level was reached. It was found that eliminating at smaller area intervals resulted in a more accurate polygon boundary than eliminating at larger intervals. The program would sort the polygons by area and target those polygons which didn't meet the MMU requirements for the designated area interval. The program then analyzed the cover-type of the polygons surrounding the targeted polygons. The elimination of the targeted polygons into larger adjacent polygons was done by consulting a matrix which provided a weighting for all possible combinations of adjacent cover-types (APPENDIX H). Based on this weighting, the smaller polygon is subsumed into whichever adjacent polygon has the highest relative weighting. Using this approach, similar cover-types can be given a higher relative weighting than dissimilar types, allowing the subsuming process to "prefer" merging similar cover-types. If weighting values were equal, the program merged the polygon into the adjacent polygon with the longest shared boundary.
The program allowed for some cover-types to be left out of the elimination procedure at whichever interval the operator chose. In this way sensitive cover-types such as riparian and wetland areas were preserved in the data set at a 40 ha MMU.
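The decision rules of the elimination procedure described above can be sketched in Python. This is an illustration of the logic only, not the original Arc Macro and C program. All function names are hypothetical, the weights and cover-type names in the example are invented for illustration and do not reproduce APPENDIX H, and the exemption rule assumes a protected cover-type is still eliminated up to and including its own MMU level.

```python
def elimination_levels(max_ha=100):
    """Area thresholds in hectares: 3, 5, then every 5 ha up to max_ha."""
    return [3] + list(range(5, max_ha + 1, 5))


def choose_merge_target(cover_type, neighbors, weights):
    """Pick the adjacent polygon that a too-small polygon merges into.

    neighbors: list of (polygon_id, neighbor_cover_type, shared_boundary_m)
    weights:   dict mapping (cover_type, neighbor_cover_type) -> weighting
    The highest weighting wins; equal weightings are broken by the
    longest shared boundary.
    """
    best = max(
        neighbors,
        key=lambda n: (weights.get((cover_type, n[1]), 0), n[2]),
    )
    return best[0]


def is_exempt(cover_type, level_ha, protected_mmu):
    """True if this cover-type is left out of elimination at this level."""
    return (cover_type in protected_mmu
            and level_ha > protected_mmu[cover_type])


# Hypothetical weights and neighbors, for illustration only.
weights = {
    ("aspen", "spruce-fir"): 3,
    ("aspen", "oak"): 3,
    ("aspen", "sagebrush"): 1,
}
neighbors = [
    (101, "sagebrush", 900.0),
    (102, "spruce-fir", 300.0),
    (103, "oak", 450.0),
]
target = choose_merge_target("aspen", neighbors, weights)
```

In this example polygons 102 and 103 tie on weighting, so the polygon merges into 103, which shares the longer boundary. Calling `is_exempt("riparian", 45, {"riparian": 40, "wetland": 40})` returns True under the assumed rule, which is how a 40 ha MMU would be preserved for the sensitive cover-types.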