GML Balanced On-line Learning Toolkit

Downloads

GML BOLT v. 1.1   November 17, 2009

GML BOLT v. 1.0   July 17, 2009

Introduction

The Balanced On-line Learning Toolkit (GML BOLT) is an open-source library that provides a set of on-line classifier interfaces together with their implementations.

On-line learning (also called data stream mining) is the task of learning from streaming data. This means that a classifier must always be able to classify new data, even if the learning process has not finished yet. Moreover, the time needed to process a single example should not grow significantly as learning proceeds. Some of the toolbox classifiers can also learn efficiently from streams with unbalanced classes.
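To make these two requirements concrete, here is a minimal C++ sketch of an on-line learner that can answer queries at any moment and whose per-example update cost does not depend on how many examples it has already seen. It uses a nearest-centroid rule purely for illustration: this is not one of the toolbox's algorithms, and the class `OnlineCentroidClassifier` with its methods `Learn` and `Classify` are hypothetical names, not the BOLT API.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// HYPOTHETICAL illustration, not part of GML BOLT: an on-line
// nearest-centroid classifier. Each class keeps a running mean of its examples.
class OnlineCentroidClassifier {
public:
    OnlineCentroidClassifier(std::size_t numClasses, std::size_t dim)
        : dim_(dim),
          counts_(numClasses, 0),
          means_(numClasses, std::vector<double>(dim, 0.0)) {}

    // Incorporate one labelled example. Cost is O(dim), independent of
    // how many examples have been seen so far.
    void Learn(const std::vector<double>& x, std::size_t label) {
        assert(x.size() == dim_ && label < counts_.size());
        std::size_t n = ++counts_[label];
        for (std::size_t j = 0; j < dim_; ++j)
            means_[label][j] += (x[j] - means_[label][j]) / static_cast<double>(n);
    }

    // May be called at any time, even while learning is still going on.
    // Returns the class with the closest running mean; classes without
    // examples are skipped, and class 0 is returned if nothing was seen yet.
    std::size_t Classify(const std::vector<double>& x) const {
        assert(x.size() == dim_);
        std::size_t best = 0;
        double bestDist = std::numeric_limits<double>::infinity();
        for (std::size_t c = 0; c < counts_.size(); ++c) {
            if (counts_[c] == 0) continue;
            double d = 0.0;
            for (std::size_t j = 0; j < dim_; ++j) {
                double diff = x[j] - means_[c][j];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }

private:
    std::size_t dim_;
    std::vector<std::size_t> counts_;
    std::vector<std::vector<double>> means_;
};
```

For example, after `Learn({0, 0}, 0)` and `Learn({10, 10}, 1)`, `Classify({1, 2})` returns class 0. Each update touches only the mean of the example's own class, so processing one example costs O(d) time no matter how long the stream has been running.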

Overview

GML BOLT contains the following basic interfaces:

  • IClassifier -- the interface for a trained classifier, originally introduced in A. Vezhnevets' GML AdaBoost Toolbox.
  • IOnlineClassifier -- the interface for a learnable on-line classifier. It can be used for classification before the learning process is finished.
  • IOnlineMulticlassClassifier -- the interface for a learnable on-line multiclass classifier.
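The relationship between these interfaces can be pictured with the following C++ sketch. The class names ending in `Sketch` and the method names (`Classify`, `LearnOnline`, `ClassifyMulti`, `LearnOnlineMulti`) are hypothetical, and it is an assumption that the on-line interface extends the plain classifier interface; the actual declarations are in the library documentation. A mistake-driven perceptron serves as a toy implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// HYPOTHETICAL signatures for illustration only; these are not BOLT's
// real IClassifier / IOnlineClassifier / IOnlineMulticlassClassifier.
using Sample = std::vector<double>;

// A trained classifier: it can only answer queries (labels are +1 / -1).
class IClassifierSketch {
public:
    virtual ~IClassifierSketch() = default;
    virtual int Classify(const Sample& x) const = 0;
};

// An on-line classifier: it keeps answering queries while it keeps learning.
class IOnlineClassifierSketch : public IClassifierSketch {
public:
    virtual void LearnOnline(const Sample& x, int label) = 0;
};

// A multiclass on-line classifier with labels 0..K-1.
class IOnlineMulticlassClassifierSketch {
public:
    virtual ~IOnlineMulticlassClassifierSketch() = default;
    virtual std::size_t ClassifyMulti(const Sample& x) const = 0;
    virtual void LearnOnlineMulti(const Sample& x, std::size_t label) = 0;
};

// Toy implementation: a perceptron that updates only when it errs.
class OnlinePerceptron : public IOnlineClassifierSketch {
public:
    explicit OnlinePerceptron(std::size_t dim) : w_(dim, 0.0), b_(0.0) {}

    int Classify(const Sample& x) const override {
        assert(x.size() == w_.size());
        double s = b_;
        for (std::size_t j = 0; j < w_.size(); ++j) s += w_[j] * x[j];
        return s >= 0.0 ? +1 : -1;
    }

    void LearnOnline(const Sample& x, int label) override {
        assert(label == 1 || label == -1);
        if (Classify(x) != label) {  // mistake-driven update
            for (std::size_t j = 0; j < w_.size(); ++j) w_[j] += label * x[j];
            b_ += label;
        }
    }

private:
    std::vector<double> w_;
    double b_;
};

// Code that only evaluates a model can accept the narrower interface.
int Predict(const IClassifierSketch& clf, const Sample& x) { return clf.Classify(x); }
```

Because `Predict` takes the narrower `IClassifierSketch`, the same evaluation code works both for a finished model and for one that is still learning from the stream.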

These interfaces are implemented in the following classes:

  • CVfdtClassifierAdapter -- a C++ wrapper for the VFDT classifier by Pedro Domingos and Geoff Hulten from the VFML library. VFDT (Very Fast Decision Tree) is an effective method for on-line decision tree induction [Domingos00]. We have also added serialization support for VFDT.
  • COnlineBagging -- the On-line Bagging ensemble learning method by Nikunj C. Oza [Oza05]. It emulates bootstrap resampling by presenting each incoming example to each ensemble member a Poisson-distributed number of times. See Oza's papers for more details. Our implementation can also determine class costs automatically and apply them when the classes are unbalanced.
  • COnlineRandomForest -- our modification of On-line Bagging that is similar to Leo Breiman's Random Forest [Breimann01] but is able to learn in an on-line manner. It is essentially On-line Bagging with randomized weak classifiers.
  • COneVsAllORF -- a multiclass classifier built on top of On-line Random Forest using the one-vs-all scheme.
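The core idea of On-line Bagging [Oza05] fits in a few lines: for every incoming example, each ensemble member is updated k times with k drawn from Poisson(1), which approximates bootstrap resampling when the stream is long, and predictions are made by majority vote. The C++ sketch below assumes binary labels in {-1, +1} and uses a simple perceptron as the base model; `OnlineBaggingSketch` and `Perceptron` are hypothetical names, and the sketch leaves out the automatic class-cost handling that the toolbox provides.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Sample = std::vector<double>;

// Simple mistake-driven perceptron used as the base model (an assumption
// made for brevity; it is not one of BOLT's weak learners).
class Perceptron {
public:
    explicit Perceptron(std::size_t dim) : w_(dim, 0.0) {}
    int Classify(const Sample& x) const {
        double s = b_;
        for (std::size_t j = 0; j < w_.size(); ++j) s += w_[j] * x[j];
        return s >= 0.0 ? +1 : -1;
    }
    void Update(const Sample& x, int label) {
        if (Classify(x) == label) return;
        for (std::size_t j = 0; j < w_.size(); ++j) w_[j] += label * x[j];
        b_ += label;
    }
private:
    std::vector<double> w_;
    double b_ = 0.0;
};

// On-line Bagging in the style of [Oza05]: every incoming example is shown
// to each member k ~ Poisson(1) times, which emulates drawing a bootstrap
// sample as the stream length goes to infinity.
class OnlineBaggingSketch {
public:
    OnlineBaggingSketch(std::size_t numModels, std::size_t dim, unsigned seed)
        : models_(numModels, Perceptron(dim)), rng_(seed), poisson_(1.0) {
        assert(numModels % 2 == 1);  // odd ensemble size avoids vote ties
    }
    void Learn(const Sample& x, int label) {
        assert(label == 1 || label == -1);
        for (Perceptron& m : models_) {
            int k = poisson_(rng_);  // k may be 0: the member skips this example
            for (int i = 0; i < k; ++i) m.Update(x, label);
        }
    }
    // Unweighted majority vote over the ensemble members.
    int Classify(const Sample& x) const {
        int votes = 0;
        for (const Perceptron& m : models_) votes += m.Classify(x);
        return votes >= 0 ? +1 : -1;
    }
private:
    std::vector<Perceptron> models_;
    std::mt19937 rng_;
    std::poisson_distribution<int> poisson_;
};
```

Replacing the base model with a randomized one (for example, a learner that sees only a random subset of the features) gives an ensemble in the spirit of COnlineRandomForest; how BOLT actually randomizes its weak classifiers is not shown here.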

For more detailed information, see the library documentation in the "/docs" archive.

The library was developed under Microsoft Windows XP with Microsoft Visual Studio 2005 and has also been tested on Gentoo Linux with GCC 4.1.2. It is distributed under the BSD license.

Changes log

November 17, 2009 v.1.1

  • The library has been ported to Linux. The license has been changed to a more liberal one.

July 17, 2009 v.1.0

  • The first release of the toolbox.

Project team

Principal researcher:

  • Dr. Anton Konushin

Lead researchers:

  • Olga Barinova
  • Dr. Alexander Velizhev

Researcher:

  • Roman Shapovalov

Contacts

Please send all comments, suggestions, problem reports and contributions to:

  • Olga Barinova (obarinova@graphics.cs.msu.ru)
  • Alexander Velizhev (avelizhev@graphics.cs.msu.ru)
  • Roman Shapovalov (shapovalov@graphics.cs.msu.ru)

References

[Domingos00] Pedro Domingos, Geoff Hulten. Mining high-speed data streams. In Proc. of ACM SIGKDD, 2000.
[Oza05] Nikunj C. Oza. Online bagging and boosting. In Proc. of IEEE International Conference on Systems, Man and Cybernetics, 2005.
[Breimann01] Leo Breiman. Random forests. Machine Learning, 45(1):5-32, 2001.