GAP Bulletin Number 5
June 1996

A Preliminary Comparison of MMU Aggregation Procedures for Raster Data

Many Gap Analysis projects are challenged by the need to aggregate their base resolution land cover data to the 40- to 100-hectare minimum mapping unit (MMU) land cover product. Two creative solutions to this challenge have been developed by the Utah and Montana Gap Analysis Projects. In November 1995, the Arkansas GAP Project decided to address this challenge by evaluating these methods, along with some locally developed procedures. Unfortunately, we encountered software problems with the Utah product that could not be corrected before our project's deadline, so attention was focused on evaluating the Montana method against locally derived procedures. It became clear that the assumptions underlying the Montana method paralleled the image processing procedures used by the Arkansas project, and the Montana method was ultimately selected for statewide use in Arkansas. We hope that a more comprehensive comparison that includes the Utah product can be made in the future, but the lessons learned to date may still be valuable to other GAP projects.

Before software problems were encountered in the Utah code, the Arkansas GAP team implemented a variety of testing procedures to evaluate both methods. We first tested the "rastelimqueen" program from the Utah Gap Analysis Project (UT-GAP). Rastelimqueen required an input ASCII raster file, a similarity matrix, and a minimum number of pixels in a group. The input and output products were processed using GRASS GIS software, so the data were exported from the GRASS binary rasters to ASCII form and provided as input to the module. The ASCII raster files, which must be stored in addition to the existing binary raster files, are very large and require substantial disk space. The test data were processed successfully by the module, and a conversion shell script then rewrote the header of the resulting ASCII output file in the GRASS format. The resulting file was read into GRASS with the "r.in.ascii" module. This import was regularly interrupted by an error message which noted that the "data conversion failed at row 1027, column 1878." Although the line with the error could be extracted, its extreme length prevented examination of column 1878, even using a variety of UNIX tools that allow processing of very long lines. Because the ASCII data could not be read back into GRASS, the rastelimqueen program could not be fully tested.
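
The following Python sketch is not part of the original test procedure. It shows one possible way to look at a single field on a very long line of an ASCII raster without loading the whole file. The function name field_at, the six-line header, and the reading of "row 1027" as a data row counted after the header are all assumptions made for illustration.

```python
import io


def field_at(lines, data_row, column, header_lines=6):
    """Return the whitespace-delimited field at a 1-based data row and column.

    `lines` is any iterable of text lines (for example an open file).
    `header_lines` is an assumed header length; adjust it to the actual file.
    Returns None when the row or column does not exist.
    """
    wanted_line = header_lines + data_row
    for line_number, line in enumerate(lines, start=1):
        if line_number == wanted_line:
            fields = line.split()
            if column > len(fields):
                return None
            return fields[column - 1]
    return None


# Small in-memory stand-in for an exported raster (hypothetical values).
sample = io.StringIO(
    "north: 3\nsouth: 0\neast: 3\nwest: 0\nrows: 2\ncols: 3\n"
    "1 2 3\n"
    "4 x7 6\n"
)
bad_field = field_at(sample, data_row=2, column=2)
```

With a real file, the same function could be called as field_at(open(path), 1027, 1878). Only one line is split at a time, so the length of the other rows does not matter.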

Concurrently, we tested the Montana method. An advantage of the Montana program was that it did not require ASCII import. Instead, a binary cell matrix was used for input. The amount of area that could be processed at one time was an important element of the Montana method and was influenced by the amount of available memory. The work was conducted on a multi-CPU Sparc system with 320 megabytes of random access memory (RAM), of which 100 megabytes were allocated to the process. The Montana program utilized four variables that affected memory requirements: (1) number of columns, (2) number of cells, (3) number of categories, and (4) number of output polygons from each aggregation pass. To overcome the memory limitations, the state pixel map was divided into seven subsections. Interfaces to the Montana program were written to derive similarity matrices for the seven subsections of the Arkansas map. These locally developed interfaces also reclassed only those categories that were present in a subsection (then restored the original category numbers at the end of the process), constructed GRASS supporting files, and performed other miscellaneous tasks. With these interfaces and 100 megabytes of available RAM, six of the seven subsections were processed in one day. Testing was necessary to ensure that the parameters chosen would not exceed the available memory.
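
The reclassing step described above can be illustrated with a short Python sketch. It is not the Arkansas interface code; the function names and the sample category numbers are hypothetical. The idea is that renumbering only the categories present in a subsection to a compact range keeps the number of categories, one of the four memory-related variables, as small as possible, and the saved list allows the original numbers to be restored afterward.

```python
def compact_categories(grid):
    """Renumber the categories present in `grid` to 0..k-1.

    Returns the renumbered grid and the sorted list of original category
    numbers, which is all that is needed to undo the renumbering.
    """
    present = sorted({value for row in grid for value in row})
    to_compact = {original: index for index, original in enumerate(present)}
    compact = [[to_compact[value] for value in row] for row in grid]
    return compact, present


def restore_categories(compact, present):
    """Map compact codes back to the original category numbers."""
    return [[present[code] for code in row] for row in compact]


# Hypothetical subsection containing only three of the statewide categories.
section = [
    [12, 12, 47],
    [12, 203, 47],
    [203, 203, 47],
]
compact, present = compact_categories(section)
restored = restore_categories(compact, present)
```

In this example the three categories 12, 47, and 203 become 0, 1, and 2 for processing, and restore_categories returns the grid to its original numbering.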

Aggregation levels were 2, 10, 40, and 100 hectares. On some of the wider subsections (those with more columns), additional aggregations at the 60- and 80-hectare levels were required to further reduce the number of polygons so that the available memory was not exceeded. With the available hardware, the Montana aggregating method was very fast (an estimated 25 lines per second). As expected, the program ran more slowly at each larger aggregation level than at the previous one. According to the Montana team, the program can be run with as little as 16 megabytes of RAM, but this would limit the area (or other parameters) considered in each run. Testing in such a situation would be necessary to determine the maximum allowable values of the four inputs that keep the run within 16 megabytes of RAM.
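
Because the aggregation levels are stated in hectares while a raster program works in pixels, each level has to be converted to a minimum pixel count. The Python sketch below shows the arithmetic. The 30-meter pixel size is an assumption made for illustration (the pixel size of the Arkansas data is not stated here), and the function name is hypothetical.

```python
import math

SQUARE_METERS_PER_HECTARE = 10_000


def min_pixels_for_mmu(mmu_hectares, pixel_size_m=30.0):
    """Smallest whole number of square pixels whose area reaches the MMU.

    `pixel_size_m` is the pixel edge length in meters; 30.0 is an assumed
    value, not one taken from the Arkansas project.
    """
    pixel_area_m2 = pixel_size_m ** 2
    return math.ceil(mmu_hectares * SQUARE_METERS_PER_HECTARE / pixel_area_m2)


# Pixel thresholds for the four aggregation levels, under the 30 m assumption.
thresholds = {level: min_pixels_for_mmu(level) for level in (2, 10, 40, 100)}
```

Under this assumption each pixel covers 900 square meters, so the 100-hectare level corresponds to 1,000,000 / 900, or about 1,111.1 pixels, which rounds up to 1,112.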

Both the Montana and Utah approaches used similarity indices for intelligent decision-based aggregation. Montana's matrix was formed on the basis of multispectral data. Utah's matrix was a user-defined map classification similarity index defining which mapped categories were most alike (ecologically). This methodological distinction is quite significant, though each matrix can provide acceptable results. Evaluating the actual results from these aggregation methods poses another difficult task. Remember that any clump of cells can be subsumed and its identity changed if it is not large enough to remain at the current aggregation level. For example, cells that are classified as "oak," if not large enough, could be aggregated with other cells into a larger polygon classed as "cedar."

In the Montana method, aggregation occurred on a similarity matrix derived from the underlying spectral values. Thus, pixel groups that did not meet the minimum size limit were aggregated with the adjacent cells that had the most similar spectral properties. In the Utah method, aggregation would occur on an assessment of "similarity" of botanical character. While at first blush the Utah method would seem superior, and it may very well be in some situations, it takes the assignment of spectral classes to information classes as a "given," so the accuracy of that assignment is central to the success of the aggregation. Both techniques permit the "reservation" of certain classes, so that they are not forced into adjacent classes. Water, for example, can be blocked from being aggregated with other classes.
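
The general idea shared by both approaches can be sketched in Python as follows. This is neither the Montana nor the Utah program. It is a minimal illustration that assumes 4-neighbor connectivity, a similarity table supplied as nested dictionaries, and reserved classes that are neither absorbed nor used as merge targets. All names and similarity values are hypothetical.

```python
from collections import deque

STEPS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # assumed 4-neighbor connectivity


def label_clumps(grid):
    """Return a list of (class_value, cells) for each connected clump."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    clumps = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            value, cells, queue = grid[r][c], [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in STEPS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            clumps.append((value, cells))
    return clumps


def neighbour_classes(grid, cells, value):
    """Classes other than `value` that touch the clump made of `cells`."""
    rows, cols = len(grid), len(grid[0])
    found = set()
    for y, x in cells:
        for dy, dx in STEPS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols and grid[ny][nx] != value:
                found.add(grid[ny][nx])
    return found


def aggregate(grid, similarity, min_pixels, reserved=frozenset()):
    """Merge undersized clumps into their most similar adjacent class."""
    grid = [list(row) for row in grid]  # work on a copy
    while True:
        small = sorted(
            (clump for clump in label_clumps(grid)
             if len(clump[1]) < min_pixels and clump[0] not in reserved),
            key=lambda clump: len(clump[1]),
        )
        for value, cells in small:
            candidates = neighbour_classes(grid, cells, value) - set(reserved)
            if not candidates:
                continue  # nothing eligible to merge into; leave as is
            target = max(sorted(candidates),
                         key=lambda other: similarity[value][other])
            for y, x in cells:
                grid[y][x] = target
            break  # clumps have changed, so relabel before continuing
        else:
            return grid


# Hypothetical classes: 1 = oak, 2 = cedar, 3 = pine, 9 = water (reserved).
land_cover = [
    [1, 1, 2, 3, 3],
    [1, 1, 2, 3, 3],
    [1, 1, 1, 3, 9],
]
similarity = {
    1: {2: 0.2, 3: 0.5},
    2: {1: 0.2, 3: 0.8},
    3: {1: 0.5, 2: 0.8},
}
result = aggregate(land_cover, similarity, min_pixels=3, reserved={9})
```

In this example the two-pixel cedar clump touches both oak and pine and is absorbed by pine, its most similar neighbor in the invented table, while the single water pixel is left untouched because it is reserved. Whether the similarity values come from spectral data or from an ecological judgment is exactly the difference between the two methods; the merging mechanics in the sketch are the same in either case.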

The two techniques reflect quite different underlying assumptions, and it is likely that each can yield successful results but in different mapping strategies. Utah's suite of aggregation algorithms, for example, also included a vector-based aggregation method which is based on the information class assignment and not the underlying spectral class. While the Arkansas team is satisfied with the results of the Montana method, we have not been able to perform a comprehensive, direct comparison of the two. It is clear that the mechanics of data aggregation are complex and depend on underlying image processing, GIS mapping strategies, and the assumptions that are made about similarities and classification. It is likely that there is no single best method, and what may be most appropriate in one situation may not be in another. More work is needed before these two methods (and perhaps others) can be said to have been compared fairly.

Richard Thompson, Robert Dzur, and W. Fredrick Limp
Center for Advanced Spatial Technologies
University of Arkansas, Fayetteville
