A Java implementation of Generalized Matrix Learning Vector Quantization (GMLVQ), a prototype-based, supervised learning technique. The project is a plug-in for the WEKA machine learning framework.
Using the WEKA package manager:

- download the latest WEKA version
- download the GMLVQ plugin zip
- install and run the WEKA GUI
- choose `Tools` > `Package manager`
- in the new window, click the `File/URL` button and locate the previously downloaded GMLVQ package
- restart WEKA
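Alternatively, packages can be installed from the command line via WEKA's built-in package manager entry point; the path to the zip is a placeholder:

```
java weka.core.WekaPackageManager -install-package /path/to/GMLVQ.zip
```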
To run an analysis with GMLVQ, go to the Explorer, choose your data and, after selecting the `Classify` tab, choose `GMLVQ`, located in the `functions` folder.
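Besides the GUI, the classifier should be usable programmatically like any other WEKA classifier. The following is a sketch under the assumption that the plugin registers itself as `weka.classifiers.functions.GMLVQ` (derived from the `functions` folder mentioned above) and that an ARFF file `iris.arff` is at hand:

```java
import weka.classifiers.AbstractClassifier;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

import java.util.Random;

public class GmlvqExample {
    public static void main(String[] args) throws Exception {
        // load an ARFF file; the last attribute is assumed to be the class
        Instances data = DataSource.read("iris.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // the fully qualified class name is an assumption based on the
        // 'functions' folder mentioned above
        Classifier gmlvq = AbstractClassifier.forName(
                "weka.classifiers.functions.GMLVQ", new String[0]);

        // 10-fold cross-validation, as with any other WEKA classifier
        Evaluation evaluation = new Evaluation(data);
        evaluation.crossValidateModel(gmlvq, data, 10, new Random(42));
        System.out.println(evaluation.toSummaryString());
    }
}
```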
## Generalized Matrix Learning Vector Quantization
Conventional LVQ was enhanced by a linear mapping rule described by an `OmegaMatrix`, putting the M in GMLVQ. This matrix has a dimension of `dataDimension x omegaDimension`, where the omega dimension can be set to `2...dataDimension`. Depending on the chosen omega dimension, each data point and prototype is mapped (i.e. linearly transformed) to an embedded data space. Within this data space, distances between data points and prototypes are computed, and this information is used to compose the update for each learning epoch. Setting the omega dimension to values significantly smaller than the data dimension will drastically speed up the learning process.
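As a minimal sketch of this mapping and distance computation (illustrative code, not the plugin's internal API): a vector `x` is projected as `y = omega^T x`, and distances are taken in the resulting `omegaDimension`-dimensional space.

```java
/**
 * Minimal sketch of the embedded-space distance used by GMLVQ; all names
 * are illustrative, not the plugin's internal API.
 */
public class EmbeddedDistanceSketch {

    /** Maps a vector into the embedded space: y = omega^T * x. */
    static double[] mapToEmbeddedSpace(double[][] omega, double[] x) {
        int dataDimension = omega.length;     // rows
        int omegaDimension = omega[0].length; // columns, 2..dataDimension
        double[] y = new double[omegaDimension];
        for (int j = 0; j < omegaDimension; j++) {
            for (int i = 0; i < dataDimension; i++) {
                y[j] += omega[i][j] * x[i];
            }
        }
        return y;
    }

    /** Squared distance between a data point and a prototype, both mapped first. */
    static double embeddedDistance(double[][] omega, double[] dataPoint, double[] prototype) {
        double[] p = mapToEmbeddedSpace(omega, dataPoint);
        double[] w = mapToEmbeddedSpace(omega, prototype);
        double distance = 0.0;
        for (int j = 0; j < p.length; j++) {
            double diff = p[j] - w[j];
            distance += diff * diff;
        }
        return distance;
    }
}
```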
As mapping data points to the embedded space is still computationally expensive, we 'cache' these mappings. By invoking `DataPoint#getEmbeddedSpaceVector(OmegaMatrix)` one can retrieve the `EmbeddedSpaceVector` for this data point according to the specified mapping rule (provided by the `OmegaMatrix`). Results are directly linked to the data points, so they are only calculated when absolutely necessary, and previous results can be recalled at any point. Subsequently, by calling `EmbeddedSpaceVector#getWinningInformation(List)` one can access the `WinningInformation` linked to each embedded space vector. This information includes the distance to the closest prototype of the same class as the considered data point, as well as the distance to the closest prototype of a different class. This information is crucial in composing the update of each epoch as well as for the computation of `CostFunction`s.
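The caching idea can be illustrated with a self-contained sketch: each data point memoizes its embedded-space vector per `OmegaMatrix`, so repeated lookups are free. The class and method names mirror this README but are simplified assumptions, not the plugin's exact API.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-ins mirroring the README's names; not the plugin's exact API. */
class EmbeddedSpaceVector {
    final double[] values;
    EmbeddedSpaceVector(double[] values) { this.values = values; }
}

class OmegaMatrix {
    private final double[][] entries; // dataDimension x omegaDimension
    OmegaMatrix(double[][] entries) { this.entries = entries; }

    /** y = entries^T * x, as in the mapping sketch above. */
    EmbeddedSpaceVector map(double[] x) {
        double[] y = new double[entries[0].length];
        for (int j = 0; j < y.length; j++)
            for (int i = 0; i < entries.length; i++)
                y[j] += entries[i][j] * x[i];
        return new EmbeddedSpaceVector(y);
    }
}

class DataPoint {
    private final double[] values;
    // one cached embedding per mapping rule, computed lazily on first access
    private final Map<OmegaMatrix, EmbeddedSpaceVector> cache = new HashMap<>();

    DataPoint(double[] values) { this.values = values; }

    EmbeddedSpaceVector getEmbeddedSpaceVector(OmegaMatrix omega) {
        // computeIfAbsent yields the 'calculated only when necessary,
        // recallable at any point' behavior described above
        return cache.computeIfAbsent(omega, o -> o.map(values));
    }
}
```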
## Generalization
GMLVQ is also capable of generalization, putting the G in GMLVQ: various `CostFunction` rules can be used to guide the learning process. Most notably, it is possible to evaluate the success of each epoch by consulting the F-measure or precision-recall values, which is especially important for problems with unbalanced class distributions or for use cases where certain incorrect classifications (e.g. false negatives) could be critical.
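For context, the classical GLVQ cost function, which cost functions in this family build on, combines exactly the two winning distances described above; the notation here is standard background, not taken from the plugin's code:

$$E = \sum_i \Phi\left(\frac{d^{+}(x_i) - d^{-}(x_i)}{d^{+}(x_i) + d^{-}(x_i)}\right)$$

where $d^{+}(x_i)$ is the distance of data point $x_i$ to the closest prototype of its own class, $d^{-}(x_i)$ the distance to the closest prototype of any other class, and $\Phi$ a monotonically increasing function (e.g. the identity or a sigmoid). The summand lies in $[-1, 1]$ and is negative exactly when $x_i$ is classified correctly.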
## Visualization
Another key feature is the possibility of tracking which individual features of the input data contribute the most to the training process. This is realized by a lambda matrix (defined as `lambda = omega * omega'`, where `'` denotes the transpose). This matrix can be visualized: its main diagonal contains the influence of each feature on the classification, while the off-diagonal elements describe the correlation between the corresponding features.
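A short sketch of how the lambda matrix and the per-feature relevances on its diagonal can be computed (illustrative code; the plugin visualizes this matrix for you):

```java
/** Sketch: lambda = omega * omega^T; diagonal entries are feature relevances. */
public class LambdaMatrixSketch {

    /** Computes lambda for omega of size dataDimension x omegaDimension. */
    static double[][] computeLambda(double[][] omega) {
        int dataDimension = omega.length;
        double[][] lambda = new double[dataDimension][dataDimension];
        for (int i = 0; i < dataDimension; i++) {
            for (int j = 0; j < dataDimension; j++) {
                for (int k = 0; k < omega[0].length; k++) {
                    lambda[i][j] += omega[i][k] * omega[j][k];
                }
            }
        }
        return lambda;
    }

    /** The main diagonal holds each feature's influence on the classification. */
    static double[] featureRelevances(double[][] lambda) {
        double[] relevances = new double[lambda.length];
        for (int i = 0; i < lambda.length; i++) {
            relevances[i] = lambda[i][i];
        }
        return relevances;
    }
}
```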
When using the GMLVQ plugin, please cite:
- Bittrich, S., Kaden, M., Leberecht, C., Kaiser, F., Villmann, T., & Labudde, D. (2019). Application of an interpretable classification model on Early Folding Residues during protein folding. BioData Mining, 12(1), 1.
Additional references on GMLVQ:
- Kaden, M., Lange, M., Nebel, D., Riedel, M., Geweniger, T., & Villmann, T. (2014). Aspects in classification learning: Review of recent developments in Learning Vector Quantization. Foundations of Computing and Decision Sciences, 39(2), 79-105.
- Bunte, K., Schneider, P., Hammer, B., Schleif, F. M., Villmann, T., & Biehl, M. (2012). Limited rank matrix learning, discriminative dimension reduction and visualization. Neural Networks, 26, 159-173.
- Kästner, M., Hammer, B., Biehl, M., & Villmann, T. (2012). Functional relevance learning in generalized learning vector quantization. Neurocomputing, 90, 85-95.
- Hammer, B., & Villmann, T. (2002). Generalized relevance learning vector quantization. Neural Networks, 15(8-9), 1059-1068.