GML BOLT v. 1.1, November 11, 2009
GML BOLT v. 1.0, July 17, 2009
GML BOLT (Balanced On-line Learning Toolkit) is an open-source library that contains a set of on-line classifier interfaces and their implementations.
On-line learning (also called data stream mining) is the task of learning from streaming data. This means that a classifier should always be able to classify data, even if the learning process has not finished yet. Moreover, the time needed to process a single example should not grow significantly during the learning process. Some of the toolbox classifiers can also learn efficiently from class-unbalanced streams.
GML BOLT contains the following basic interfaces:
- IClassifier -- the interface for a learned classifier, introduced in A. Vezhnevets' GML AdaBoost Toolbox.
- IOnlineClassifier -- the interface for a learnable on-line classifier. It can be used for classification before the learning process has finished.
- IOnlineMulticlassClassifier -- the interface for a learnable on-line multiclass classifier.
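The split between a learned classifier and a learnable on-line classifier can be illustrated with a minimal sketch. All names below (IClassifierSketch, IOnlineClassifierSketch, RunningMeanClassifier, learn, classify) are hypothetical and chosen only for illustration; the actual GML BOLT declarations are given in the library documentation and may differ. The toy learner keeps one running mean per class, so the cost of handling an example stays constant as the stream grows.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; not the GML BOLT declarations.
using Sample = std::vector<double>;

// A learned classifier: it can only predict.
struct IClassifierSketch {
    virtual ~IClassifierSketch() = default;
    // Returns +1 or -1 for the given feature vector.
    virtual int classify(const Sample& x) const = 0;
};

// A learnable on-line classifier: prediction is available at any time.
struct IOnlineClassifierSketch : IClassifierSketch {
    // Consumes one labelled example (label is +1 or -1).
    virtual void learn(const Sample& x, int label) = 0;
};

// Toy learner: assigns the class whose running mean of the first
// feature is closer. Each update and each prediction takes O(1) time.
class RunningMeanClassifier : public IOnlineClassifierSketch {
public:
    void learn(const Sample& x, int label) override {
        assert(!x.empty() && (label == 1 || label == -1));
        ClassStats& s = (label == 1) ? positive_ : negative_;
        ++s.count;
        // Incremental mean update; no past examples are stored.
        s.mean += (x[0] - s.mean) / static_cast<double>(s.count);
    }

    int classify(const Sample& x) const override {
        assert(!x.empty());
        // Fall back to a fixed answer while a class is still unseen.
        if (positive_.count == 0) return -1;
        if (negative_.count == 0) return 1;
        const double toPositive = std::fabs(x[0] - positive_.mean);
        const double toNegative = std::fabs(x[0] - negative_.mean);
        return toPositive <= toNegative ? 1 : -1;
    }

private:
    struct ClassStats {
        std::size_t count = 0;
        double mean = 0.0;
    };
    ClassStats positive_;
    ClassStats negative_;
};
```

In this sketch, classify may be called between any two calls to learn, which mirrors the requirement that an on-line classifier is usable before learning has finished.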
These interfaces are implemented in the following classes:
- CVfdtClassifierAdapter -- a C++ wrapper for the VFDT classifier by Pedro Domingos and Geoff Hulten from the VFML library. VFDT (Very Fast Decision Tree) is an effective method of on-line decision tree induction [Domingos00]. We have also added VFDT serialization.
- COnlineBagging -- the On-line Bagging ensemble learning method by Nikunj C. Oza [Oza05]. It emulates bootstrap resampling by presenting each incoming example to each base classifier a Poisson-distributed number of times. See his papers for more details. Our implementation can also define and apply class costs automatically for the case of unbalanced classes.
- COnlineRandomForest -- our modification of On-line Bagging that is similar to Leo Breiman's Random Forest [Breimann01] but is able to learn in an on-line manner. It is essentially On-line Bagging with randomized weak classifiers.
- COneVsAllORF -- a multiclass classifier built upon On-line Random Forest according to the one-vs-all scheme.
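The Poisson resampling behind On-line Bagging can be sketched as follows. This is a minimal illustration of the basic scheme from [Oza05], assuming a Poisson rate of 1 and binary labels +1 and -1; it does not reproduce COnlineBagging or its class cost handling. The names OnlineBaggingSketch and MajorityLearner are hypothetical, and the base learner is a deliberately trivial placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical placeholder base learner: it ignores the features and
// predicts the label (+1 or -1) it has been shown most often so far.
class MajorityLearner {
public:
    void learn(const std::vector<double>& /*x*/, int label) { balance_ += label; }
    int classify(const std::vector<double>& /*x*/) const {
        return balance_ >= 0 ? 1 : -1;
    }

private:
    long balance_ = 0;
};

// Sketch of on-line bagging: every ensemble member sees each incoming
// example k times, where k is drawn from a Poisson(1) distribution.
template <class Base>
class OnlineBaggingSketch {
public:
    OnlineBaggingSketch(std::size_t ensembleSize, unsigned seed)
        : members_(ensembleSize), rng_(seed), poisson_(1.0) {}

    // Processes one labelled example; no past examples are stored.
    void learn(const std::vector<double>& x, int label) {
        for (Base& member : members_) {
            const int repeats = poisson_(rng_);
            for (int i = 0; i < repeats; ++i) {
                member.learn(x, label);
            }
        }
    }

    // Unweighted majority vote over the members; ties go to +1.
    int classify(const std::vector<double>& x) const {
        long votes = 0;
        for (const Base& member : members_) {
            votes += member.classify(x);
        }
        return votes >= 0 ? 1 : -1;
    }

private:
    std::vector<Base> members_;
    std::mt19937 rng_;
    std::poisson_distribution<int> poisson_;
};
```

Drawing the repeat count from Poisson(1) approximates the number of times an example would appear in a bootstrap sample of a large data set, which is why the full data set never has to be stored. A randomized base learner in place of MajorityLearner would move the sketch in the direction of the on-line random forest described above.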
For more detailed information, see the library documentation in the "/docs" directory of the toolbox archive.
The library was developed and tested on Microsoft Windows XP + Microsoft Visual Studio 2005.
Jul 17, 2009 v.1.0
- The first toolbox release
- Olga Barinova
- Dr. Alexander Velizhev
Please send all comments, suggestions, problem reports and contributions to:
- Olga Barinova (firstname.lastname@example.org)
- Alexander Velizhev (email@example.com)
- Roman Shapovalov (firstname.lastname@example.org)
[Domingos00] P. Domingos, G. Hulten. Mining high-speed data streams. In Proc. of ACM SIGKDD, 2000.
[Oza05] N. C. Oza. Online bagging and boosting. In Proc. of IEEE International Conference on Systems, Man and Cybernetics, 2005.
[Breimann01] L. Breiman. Random forests. Machine Learning, 2001.