Emping User Guide

Author: Hans van Thiel, April 2007
email: hthiel.char@zonnet.nl

1. Overview

1.1. What

Emping is a utility that derives heuristic rules from nominal data. Nominal data are qualitative and unordered, as in:

Class is actually an ordinal attribute, but when the order is disregarded, it is nominal.

Heuristic rules consist of attribute values (predicates) that together imply another attribute value. For example:

Color:green and Proposition 1:True is Class:B

Heuristic rules are purely empirical, with no foundation in a theory or model. Emping automatically derives all such rules from a table of facts. The result is called a reduced normal form. It can be analyzed and simplified further, but this first version of emping only performs the reduction.
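To make the shape of a rule concrete, here is a minimal Python sketch of one way to represent it. This is an illustration only: the `Rule` class, its field names and the sample fact are assumptions made for this example, not emping's internal (Haskell) data structures.

```python
from dataclasses import dataclass

# A fact is one table row: attribute name -> nominal value (illustrative data).
Fact = dict[str, str]


@dataclass(frozen=True)
class Rule:
    """Hypothetical representation of a heuristic rule."""
    antecedent: tuple[tuple[str, str], ...]  # (attribute, value) predicates
    consequent: tuple[str, str]              # the implied (attribute, value)

    def applies_to(self, fact: Fact) -> bool:
        """True when every predicate of the antecedent matches the fact."""
        return all(fact.get(attr) == value for attr, value in self.antecedent)

    def __str__(self) -> str:
        left = " and ".join(f"{attr}:{value}" for attr, value in self.antecedent)
        return f"{left} is {self.consequent[0]}:{self.consequent[1]}"


rule = Rule((("Color", "green"), ("Proposition 1", "True")), ("Class", "B"))
fact = {"Color": "green", "Proposition 1": "True", "Class": "B"}

print(rule)                   # Color:green and Proposition 1:True is Class:B
print(rule.applies_to(fact))  # True
```

The antecedent is kept as an ordered tuple so that the printed form is stable. Any unordered collection of predicates would express the same rule.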

1.2. How

Emping reads a file in comma-separated values (.csv) format, as produced by the Open Office Calc spreadsheet, and returns the result as a .csv file that OO Calc can read.

You start the utility from a terminal and provide the file name as a command line parameter. For example:

$ ./emping QuinLanFacts.csv

Emping then asks you to supply the name of the attribute that is to be the consequent of the rules, and the result is saved as a file with that attribute as its name and the .csv extension.

This file can then be loaded into OO Calc.

1.3. More

More about the principles on which emping is based can be found in the white paper, Deriving Heuristic Rules from Facts, which is included in the distribution as a PDF.

2. Example

2.1. Step 1

Enter the data in Open Office Calc as shown:

As you can see, the table can have empty lines and does not have to start in the first column. But:

2.2. Step 2

Save the table in Text CSV format. Choose double quotes as the text delimiter (the default). Whole numbers will be stored without delimiters, and emping will accept them after checking that they consist only of digits (no negatives, no fractions).
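If you want to inspect a saved file before passing it to emping, the following Python sketch reads such a table with the standard `csv` module. It is not emping's own parser. The function names `read_table` and `is_whole_number`, the sample data, and the choice to drop columns that are empty in every row are all assumptions made for this illustration.

```python
import csv
import io


def read_table(text: str) -> tuple[list[str], list[list[str]]]:
    """Return (header, rows), skipping empty lines and all-empty columns."""
    reader = csv.reader(io.StringIO(text, newline=""), quotechar='"')
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    if not rows:
        return [], []
    width = max(len(row) for row in rows)
    # Keep only columns that hold a value in at least one row.
    keep = [i for i in range(width)
            if any(i < len(row) and row[i].strip() for row in rows)]
    table = [[row[i] if i < len(row) else "" for i in keep] for row in rows]
    return table[0], table[1:]


def is_whole_number(cell: str) -> bool:
    """Only ASCII digits: rejects negatives, fractions and empty cells."""
    return cell.isascii() and cell.isdigit()


# Illustrative export: blank first line, blank first column, one blank row.
sample = ',,,\n,"Color","Size","Class"\n,,,\n,"green",3,"B"\n,"red",12,"A"\n'
header, rows = read_table(sample)
print(header)
print(rows)
print([is_whole_number(c) for c in ("3", "-3", "3.5", "")])
```

The digit test mirrors the rule stated above: a cell such as -3 or 3.5 is not treated as a whole number.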

2.3. Step 3

Open the terminal and type emping, followed by the file name of the table (including the path). You may have to precede the command with the directory that contains the emping executable. For example, if it is in your working directory:

$ ./emping (followed by the file name)

2.4. Step 4

The program will now ask for the attribute which is to be predicted. This can be any one of the names in the table header. If you type ? you will see a list of the names.
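The dialogue described in this step can be sketched in Python as follows. The prompt text, the messages and the function name `choose_attribute` are hypothetical and emping's actual wording may differ. The `ask` parameter exists only so that the example can run with scripted answers instead of keyboard input.

```python
from typing import Callable


def choose_attribute(header: list[str],
                     ask: Callable[[str], str] = input) -> str:
    """Keep asking until the answer is one of the header names."""
    while True:
        answer = ask("Attribute to predict (? for a list): ").strip()
        if answer == "?":
            print(", ".join(header))          # show the available names
        elif answer in header:
            return answer
        else:
            print(f"Unknown attribute: {answer!r}")


# Scripted session: ask for the list, mistype once, then pick Class.
scripted = iter(["?", "Colour", "Class"])
chosen = choose_attribute(["Color", "Proposition 1", "Class"],
                          ask=lambda _prompt: next(scripted))
print(chosen + ".csv")  # the result file is named after the attribute
```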

2.5. Step 5

The program saves the result in a file named after the attribute, with the .csv extension. This file can then be loaded into OO Calc, as shown in the image.

3. Miscellaneous

The emping utility is written in Haskell, and has been developed and tested on the Fedora Core 6 Linux platform, using the Haskell tools that are available as FC6 packages.

(Potential) users will probably be somewhat wary, in particular if their data is critical. Keep in mind that emping derives the rules, which is the hard part. Checking the results for correctness is easy.
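As an illustration of how simple such a check can be, the Python sketch below looks for facts that contradict a rule. The function name `counterexamples` and the three sample facts are assumptions made for this example. To check real results, you would first read the rules and facts from your own files.

```python
def counterexamples(facts, antecedent, consequent):
    """Facts that satisfy the antecedent but contradict the consequent."""
    attr, value = consequent
    return [fact for fact in facts
            if all(fact.get(a) == v for a, v in antecedent.items())
            and fact.get(attr) != value]


# Illustrative table of facts (not taken from a real data set).
facts = [
    {"Color": "green", "Proposition 1": "True", "Class": "B"},
    {"Color": "green", "Proposition 1": "False", "Class": "A"},
    {"Color": "red", "Proposition 1": "True", "Class": "A"},
]

# The rule "Color:green and Proposition 1:True is Class:B" holds here.
ok = counterexamples(facts,
                     {"Color": "green", "Proposition 1": "True"},
                     ("Class", "B"))
# Dropping a predicate makes the rule too general: one fact contradicts it.
too_general = counterexamples(facts, {"Color": "green"}, ("Class", "B"))

print(len(ok), len(too_general))  # 0 1
```

A rule is consistent with the table when the returned list is empty.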

As stated above, this utility is by no means the end of the story (see the included white paper). Work will continue, and I'm very interested in comments, ideas and criticism from anyone. Thanks in advance.

Emping stands for empirical reasoning or the Indonesian snack with that name.