Software packages and datasets available for download.

ALPA

ALPA is a technique that converts any black-box model into a comprehensible white-box model. The generated models explain the classifications made by your black-box model and can also improve on traditional rule induction techniques.
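Since ALPA is described as working with any black-box model, the black box is presumably only queried for its predictions. That general idea (often called pedagogical rule extraction) can be illustrated with the minimal Python sketch below. It is not the ALPA algorithm. The opaque model `black_box`, the Gaussian perturbation used to generate extra instances, and the single-threshold rule learner are hypothetical stand-ins chosen to keep the example self-contained.

```python
import random


def black_box(x):
    """Hypothetical opaque model: predicts 1 when x0 + 0.5 * x1 exceeds 1."""
    return int(x[0] + 0.5 * x[1] > 1.0)


def generate_oracle_data(train, n_extra, sigma, rng):
    """Relabel the training inputs with the black box and add perturbed copies."""
    data = [(x, black_box(x)) for x in train]
    for _ in range(n_extra):
        base = rng.choice(train)
        x = [v + rng.gauss(0.0, sigma) for v in base]
        data.append((x, black_box(x)))  # the black box acts as the labelling oracle
    return data


def learn_threshold_rule(data):
    """Fit 'IF x[j] > t THEN 1 ELSE 0' with the highest fidelity to the oracle labels."""
    best = None
    n_features = len(data[0][0])
    for j in range(n_features):
        for x, _ in data:
            t = x[j]
            agree = sum((xi[j] > t) == bool(y) for xi, y in data)
            fidelity = agree / len(data)
            if best is None or fidelity > best[2]:
                best = (j, t, fidelity)
    return best


rng = random.Random(0)
train = [[rng.uniform(0, 2), rng.uniform(0, 2)] for _ in range(50)]
data = generate_oracle_data(train, n_extra=200, sigma=0.1, rng=rng)
feature, threshold, fidelity = learn_threshold_rule(data)
print(f"IF x{feature} > {threshold:.2f} THEN class 1  (fidelity {fidelity:.2f})")
```

Fidelity here measures agreement with the black box rather than with the true labels, which is what makes the extracted rule an explanation of the model.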

21/10/2013 Added a tutorial and a new revision.
15/10/2012 A WEKA module for regression rule extraction is now available!

This software package is developed and maintained by enric.

ALPAR.zip

AntMiner+

AntMiner+ is a classification technique based on the principles of Ant Colony Optimization (ACO). Its goal is to infer comprehensible rule-based classification models from a dataset.
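The core loop of an ant-colony rule learner can be sketched in Python: each ant builds a rule by choosing one value (or a wildcard) per attribute with probability proportional to pheromone, and the best rule of each iteration reinforces its path. This is a simplified illustration, not the AntMiner+ implementation described in Martens et al. (2007). The toy dataset, the pheromone update (evaporation plus reinforcement of the iteration-best rule), the omitted heuristic term and the rule quality (precision plus coverage of the target class) are all assumptions made for the example.

```python
import random

# Hypothetical toy dataset: (attribute values, class label).
DATA = [
    ({"outlook": "sunny", "wind": "weak"}, "yes"),
    ({"outlook": "rainy", "wind": "weak"}, "yes"),
    ({"outlook": "overcast", "wind": "weak"}, "yes"),
    ({"outlook": "sunny", "wind": "weak"}, "yes"),
    ({"outlook": "sunny", "wind": "strong"}, "no"),
    ({"outlook": "rainy", "wind": "strong"}, "no"),
    ({"outlook": "overcast", "wind": "strong"}, "no"),
    ({"outlook": "rainy", "wind": "strong"}, "no"),
]
TARGET = "yes"
ANY = "*"  # wildcard: the attribute is left out of the rule


def domains(data):
    """Possible values of every attribute, plus the wildcard."""
    doms = {}
    for x, _ in data:
        for attr, value in x.items():
            doms.setdefault(attr, {ANY}).add(value)
    return {attr: sorted(values) for attr, values in doms.items()}


def covers(rule, x):
    return all(v == ANY or x[a] == v for a, v in rule.items())


def quality(rule, data):
    """Assumed rule quality: precision plus coverage for the target class."""
    covered = [y for x, y in data if covers(rule, x)]
    hits = sum(y == TARGET for y in covered)
    n_target = sum(y == TARGET for _, y in data)
    precision = hits / len(covered) if covered else 0.0
    coverage = hits / n_target if n_target else 0.0
    return precision + coverage


def construct_rule(pheromone, rng):
    """One ant picks a value per attribute, proportionally to its pheromone level."""
    rule = {}
    for attr, trail in pheromone.items():
        values = list(trail)
        rule[attr] = rng.choices(values, weights=[trail[v] for v in values])[0]
    return rule


def ant_colony_rule(data, n_ants=10, n_iter=30, rho=0.1, seed=0):
    rng = random.Random(seed)
    pheromone = {a: {v: 1.0 for v in vals} for a, vals in domains(data).items()}
    best_rule, best_q = None, -1.0
    for _ in range(n_iter):
        ants = [construct_rule(pheromone, rng) for _ in range(n_ants)]
        it_best = max(ants, key=lambda r: quality(r, data))
        q = quality(it_best, data)
        if q > best_q:
            best_rule, best_q = it_best, q
        for attr, trail in pheromone.items():
            for v in trail:
                trail[v] *= 1.0 - rho  # evaporation on every value
            trail[it_best[attr]] += q  # reinforce the iteration-best rule
    return best_rule, best_q


rule, q = ant_colony_rule(DATA)
print(rule, q)
```

A full classifier would repeat this search, remove the instances covered by the accepted rule, and build an ordered rule list.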

The AntMiner+ implementation is based on the description in Martens et al. (2007), with a modification to the rule evaluation function; see Minnaert et al. (2012) for details. Please cite both papers and reference this website. On request, results of your experiments with AntMiner+ will also be added to the website. Installation and running instructions are given in the README.txt file.

This software package is developed and maintained by bart.

AntMiner+ v1.0.rar

Big Bayes

Big Bayes is a naive Bayes variant based on the Bernoulli event model, tailored to very large, highly sparse datasets. It was first introduced in this paper and can handle datasets with millions of instances and attributes. To access the software, please fill in the terms-of-agreement form below and send it to enric.
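A Bernoulli naive Bayes model scores class c as log P(c) + Σ_j [x_j log p_jc + (1 − x_j) log(1 − p_jc)], a sum over all attributes. On very sparse data a standard trick is to precompute the all-zero part once per class, so that scoring an instance only touches its active attributes. The Python sketch below shows that trick with assumed Laplace smoothing (`alpha`); it is an illustration, not the Big Bayes software, and the names `fit` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit(instances, labels, n_features, alpha=1.0):
    """Bernoulli naive Bayes for sparse binary data.

    instances: list of sets with the indices of the active (value 1) features.
    For each class we store the log prior, the log-likelihood of an instance
    with no active features ("base"), and per-feature corrections that are
    added when a feature is active.
    """
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # class -> feature index -> count
    for active, y in zip(instances, labels):
        feature_counts[y].update(active)
    n = len(labels)
    model = {}
    for c, n_c in class_counts.items():
        p0 = alpha / (n_c + 2 * alpha)  # smoothed p for features never seen in c
        base = n_features * math.log(1 - p0)
        delta = {}
        for j, count in feature_counts[c].items():
            p = (count + alpha) / (n_c + 2 * alpha)
            base += math.log(1 - p) - math.log(1 - p0)  # swap in feature j's own estimate
            delta[j] = math.log(p) - math.log(1 - p)
        default_delta = math.log(p0) - math.log(1 - p0)
        model[c] = (math.log(n_c / n), base, delta, default_delta)
    return model


def predict(model, active):
    """Most probable class; scoring touches only the active features."""
    def log_posterior(c):
        log_prior, base, delta, default_delta = model[c]
        return log_prior + base + sum(delta.get(j, default_delta) for j in active)
    return max(model, key=log_posterior)


X = [{0, 2}, {0}, {1, 3}, {3}]
y = ["a", "a", "b", "b"]
model = fit(X, y, n_features=1_000_000)
print(predict(model, {0, 7}))  # → a
```

Training cost grows with the number of non-zero entries rather than with instances × attributes, which is what makes this formulation suitable for millions of attributes.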

This software package is developed and maintained by enric.

AgreementForm.pdf

Data for Software Fault Prediction

Android datasets used for software fault prediction, extracted for the paper "Comprehensible Software Fault and Effort Prediction: a Data Mining Approach".

android.rar

EDC

Document classification has widespread applications, such as classifying web pages for advertising, emails for legal discovery, and blog entries for sentiment analysis, among many others. Previous approaches to gaining insight into black-box models do not cope well with high-dimensional data. With EDC, we define a new kind of explanation, tailored to the business needs of document classification and able to cope with the associated technical constraints.
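To make the idea concrete, assume for illustration that an explanation is a set of words whose removal changes the predicted class. A greedy search for such a set only needs to query the classifier's score, so it works for any black-box model. The Python sketch below is not the EDC implementation; the linear `score` function, its weights and the greedy strategy are hypothetical. For a linear model the greedy order yields a smallest possible set, while for non-linear classifiers it is only a heuristic.

```python
# Hypothetical linear classifier: a document is flagged as a "complaint"
# when its score is positive.
WEIGHTS = {"refund": 1.5, "complaint": 1.0, "angry": 0.8, "thanks": -0.7}
BIAS = -1.0


def score(words):
    return BIAS + sum(WEIGHTS.get(w, 0.0) for w in words)


def explain(document, score_fn, max_size=10):
    """Greedily remove the word whose removal lowers the score most until the class flips.

    Only score_fn is queried, so any classifier can be plugged in.
    Returns the removed words, or None if no flip occurs within max_size removals.
    """
    remaining = set(document)
    if score_fn(remaining) <= 0:
        raise ValueError("document is not classified as positive")
    removed = []
    while remaining and len(removed) < max_size:
        # sorted() makes tie-breaking deterministic
        best = min(sorted(remaining), key=lambda w: score_fn(remaining - {w}))
        remaining.remove(best)
        removed.append(best)
        if score_fn(remaining) <= 0:
            return removed
    return None


doc = ["angry", "refund", "complaint", "thanks", "order"]
print(explain(doc, score))  # → ['refund', 'complaint']
```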

This software package is developed and maintained by david.

EDCMatlab.zip

Faster ROC-AUC (Matlab)

This function calculates the area under the ROC curve (AUC) for a binary classification problem. Its main advantages over perfcurve are:

  • Speed: on a benchmark of 20 million instances, this function ran more than 100 times faster than perfcurve (Matlab Statistics Toolbox).
  • Independence: it works without the Statistics Toolbox installed.
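A common way to compute the AUC in O(n log n) time, without building the ROC curve point by point, is the rank-sum (Mann-Whitney) formulation: AUC = (R⁺ − n⁺(n⁺ + 1)/2) / (n⁺ n⁻), where R⁺ is the sum of the ranks of the n⁺ positive instances and n⁻ is the number of negatives. The Python sketch below implements that formula with average ranks for ties; it is an illustration of the technique, not the Matlab function itself, and the name `auc` is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank-sum statistic.

    scores: classifier outputs (higher means more likely positive).
    labels: 1 for positive, 0 for negative. Tied scores receive their average
    rank, so a tied positive/negative pair counts as half a correct ordering.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined without both classes")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # → 0.75
```

The single sort dominates the running time, which is why this formulation scales to tens of millions of instances.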

The package can be downloaded from the Matlab File Exchange.

This software package is developed and maintained by enric.

ICBD (Imbalanced Classification for Behaviour Data)

This toolbox provides Matlab implementations, results, and datasets accompanying the paper “Imbalanced classification in sparse and large behaviour datasets”. Behaviour data reflect the fine-grained behaviours of individuals or organisations and are characterized by sparsity and very high dimensionality. Traditional studies of imbalanced learning operate on low-dimensional, dense datasets, whose structure and properties differ from those of behaviour data. Imbalanced behaviour data occur naturally across a wide range of applications, for example online advertising, fraud detection, churn prediction, default prediction, and predictive policing.
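One of the simplest baselines in imbalanced learning is random undersampling of the majority class. The Python sketch below applies it to behaviour data stored sparsely, with each instance represented as the set of its active feature indices. It is a generic illustration rather than one of the ICBD implementations, and the function name `random_undersample` and its `ratio` parameter are hypothetical.

```python
import random


def random_undersample(instances, labels, ratio=1.0, minority=1, seed=0):
    """Keep every minority instance and a random subset of the majority class.

    ratio is the desired number of majority instances per minority instance.
    Sparse rows are passed through by reference, so no dense matrix is built.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    n_keep = min(len(majority_idx), round(ratio * len(minority_idx)))
    kept = sorted(minority_idx + rng.sample(majority_idx, n_keep))
    return [instances[i] for i in kept], [labels[i] for i in kept]


# Toy behaviour data: 2 positive (e.g. fraud) cases among 10 sparse instances.
X = [{3, 17}, {5}, {3, 8}, {42}, {1, 5, 9}, {17}, {8, 42}, {2}, {9}, {3, 5}]
y = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
X_bal, y_bal = random_undersample(X, y, ratio=1.0)
print(sum(y_bal), len(y_bal))  # → 2 4
```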

This software package is developed and maintained by jellis.

ICBD.zip