Oral Defense of Doctoral Dissertation – Computational Sciences and Informatics – Probabilistic Topic Modeling for Hyperspectral Image Classification – Thomas P. Boggs
Notice and Invitation
Oral Defense of Doctoral Dissertation
Doctor of Philosophy in Computational Sciences and Informatics
Department of Computational and Data Sciences
College of Science
George Mason University
Thomas P. Boggs
Master of Science, George Mason University, 2002
Master of Science, Virginia Tech, 1994
Bachelor of Arts, Virginia Tech, 1992
Bachelor of Science, Virginia Tech, 1991
Probabilistic Topic Modeling for Hyperspectral Image Classification
Monday, April 22, 2019, 3:00 p.m.
Exploratory Hall, Room 3301
All are invited to attend.
Jason Kinser, Chair
Probabilistic topic models are a family of mathematical models used primarily to identify latent topics in large collections of text documents. This research adapts the topic modeling approach to the unsupervised classification of hyperspectral images. Image pixels are treated as text documents, and the data for each spectral band are quantized to develop a spectral feature vocabulary. Applying Latent Dirichlet Allocation to the resulting hyperspectral image corpus yields learned topics that produce unsupervised classification results that often match ground truth better than those of the commonly used k-means algorithm. The topic modeling approach is shown to extend easily to the classification of image regions by aggregating spectral features over spatial windows. The region-based document models are shown to account for the spectral covariance and heterogeneity of ground-cover classes, resulting in similarity to land-use ground truth that increases monotonically with window size.
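The document construction described above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function names, the uniform quantization over a per-band range, and the choice of one (band, level) token per band are all assumptions made for illustration. The resulting bags of words are the kind of input a Latent Dirichlet Allocation implementation would consume.

```python
from collections import Counter

def quantize(value, lo, hi, levels):
    """Map a value in [lo, hi] to an integer level in 0..levels-1 (clipped)."""
    if hi <= lo:
        return 0
    idx = int((value - lo) / (hi - lo) * levels)
    return min(max(idx, 0), levels - 1)

def pixel_document(spectrum, band_ranges, levels):
    """Bag of words for one pixel: one (band, level) token per band.
    This encoding is an assumption, not taken from the abstract."""
    return Counter(
        (band, quantize(value, lo, hi, levels))
        for band, (value, (lo, hi)) in enumerate(zip(spectrum, band_ranges))
    )

def region_document(image, row, col, half, band_ranges, levels):
    """Aggregate pixel documents over a square window centered on (row, col).
    The window is (2 * half + 1) pixels wide and is clipped at image edges."""
    doc = Counter()
    for r in range(max(0, row - half), min(len(image), row + half + 1)):
        for c in range(max(0, col - half), min(len(image[0]), col + half + 1)):
            doc += pixel_document(image[r][c], band_ranges, levels)
    return doc
```

Here `image` is assumed to be a nested list indexed as `image[row][col][band]`, and `band_ranges` holds a hypothetical (minimum, maximum) reflectance pair for each band. Setting `half` to 0 reduces the region document to the single-pixel document.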
Multiresolution wavelet decompositions of pixel reflectance spectra are used to develop a novel feature vocabulary that aligns more naturally with material absorption and reflectance features, further improving classification results. The wavelet-based document modeling approach is evaluated on synthetic image data, on a small AVIRIS image with 16 ground truth classes, and finally on practical-sized, overlapping AVIRIS and Hyperion images to demonstrate the utility of the models. Multiple wavelet bases and numbers of quantization levels are considered. For the data sets evaluated, the Haar wavelet with 10 quantization levels yields the best performance while also producing easily interpretable topics. It is demonstrated that by omitting low-level wavelet coefficients, vocabulary size and model inference time can be significantly reduced without loss of accuracy.
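As a rough illustration of a wavelet-based vocabulary, the sketch below computes an orthonormal Haar decomposition of a spectrum and turns each detail coefficient into a quantized token. Several choices here are assumptions rather than details from the abstract: the power-of-two input length (real spectra would need padding or resampling), the uniform binning over a hypothetical range `[-max_abs, max_abs]`, the omission of the final approximation coefficient, and the `keep_levels` parameter, which stands in for whichever decomposition levels are retained.

```python
import math

def haar_decompose(signal):
    """Full orthonormal Haar decomposition of a sequence of length 2**k.
    Returns (approximation, details), with details[0] the finest scale."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(signal)
    details = []
    s = math.sqrt(2.0)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / s for a, b in pairs])
        approx = [(a + b) / s for a, b in pairs]
    return approx[0], details

def quantized_wavelet_words(signal, levels=10, max_abs=1.0, keep_levels=None):
    """Tokens (scale, position, bin) for the Haar detail coefficients.
    keep_levels, if given, is the set of scales to retain; others are dropped."""
    _, details = haar_decompose(signal)
    words = []
    for scale, coeffs in enumerate(details):
        if keep_levels is not None and scale not in keep_levels:
            continue
        for pos, c in enumerate(coeffs):
            # Uniform bins over [-max_abs, max_abs], clipped at both ends.
            b = int((c + max_abs) / (2 * max_abs) * levels)
            words.append((scale, pos, min(max(b, 0), levels - 1)))
    return words
```

Dropping scales through `keep_levels` shrinks both the vocabulary and each document, which is the mechanism behind the reduced inference time noted above.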
The wavelet-based approach is extended by replacing quantization levels with simple thresholds for positive and negative wavelet coefficients, reducing the vocabulary size to twice the number of wavelet coefficients. The thresholded wavelet model provides accuracy comparable to that of the quantized wavelet model while having significantly shorter inference time and supporting easily interpretable visualization of topics in the wavelet domain. By establishing appropriate model hyperparameters and omitting low-level wavelet coefficients, the thresholded wavelet model provides better unsupervised classification results than previously developed quantized band models, has shorter model parameter estimation time, and has an average document word count smaller by a factor of 5 and a vocabulary smaller by a factor of 10.
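The thresholded vocabulary can be sketched in a few lines of Python. This is an illustrative assumption about the encoding, not the dissertation's code: each coefficient index contributes at most one token, tagged "+" or "-", and the separate `pos_threshold` and `neg_threshold` parameters are hypothetical names. The sketch shows why the vocabulary has exactly two words per wavelet coefficient.

```python
def thresholded_words(coeffs, pos_threshold, neg_threshold=None):
    """Tokens (index, sign) for coefficients beyond the given thresholds.
    neg_threshold defaults to pos_threshold; both are magnitudes."""
    if neg_threshold is None:
        neg_threshold = pos_threshold
    words = []
    for i, c in enumerate(coeffs):
        if c > pos_threshold:
            words.append((i, "+"))
        elif c < -neg_threshold:
            words.append((i, "-"))
        # Coefficients between the thresholds contribute no word.
    return words

def vocabulary(num_coeffs):
    """All possible words: one '+' and one '-' token per coefficient."""
    return [(i, sign) for i in range(num_coeffs) for sign in "+-"]
```

Because small coefficients emit no token at all, documents built this way are shorter as well as drawn from a smaller vocabulary.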