What Are Learning Classifier Systems And How Do They Work?

Machine learning isn't just neural networks.

It's also k-means, Principal Component Analysis, Support Vector Machines, Bayes, Decision Trees, Random Forests, Markov Models, ….

And there are Learning Classifier Systems (LCSs). Many articles don't even mention them.

LCSs are a system to create and improve IF … THEN … rules for a task.

Let's explore how they work.

Note for the LCS experts out there: in this post I'm solely talking about Michigan-Style LCSs. I may cover other LCS variants in future articles.

How does an LCS work?

Assume for a moment we have a working LCS, trained to tell us whether a number is even, and we want it to give us a result for the number 5. What happens?

We convert our input 5 into the format that our LCS understands. For simplicity, let's use binary: 101.
We give this input to the LCS.
The LCS has a set of IF … THEN … rules, called a population [P], which it learned in the training phase. It now checks all IF … conditions if they match 101. We call the rules that match our input the match set [M].
Since our match set [M] might contain rules with conflicting predictions, a voting takes place. Every rule has parameters that determine how many votes it has.
The LCS returns the prediction with the most votes.

Note: despite their name, Learning Classifier Systems are not restricted to classification problems. We can also use them for regression, function approximation, behavior modeling, and much more!

What is the Environment?

The environment is the entirety of all states (condition + result) the LCS can encounter.

As with all machine learning algorithms, it is crucial to develop a suitable model of the environment:

Consider how the inputs look like. How can you model the conditions of your LCS so you can easily describe them as configuration? Typical rules can be 001#1#0 ➡️ even for binary inputs, with the # being a wildcard. Or you have distinct descriptions, e.g. for your customers when building a recommendation system: you could use gender, age, shopping frequency, and category of the last bought item. A rule for this set could look like [#, 20-40, 2-5 per month, smartphones] ➡️ smartphone accessories.
Which results make sense in this case? Consider a game AI. Yes, the player can move their analog stick all 360°. But does your AI need this much freedom? Or is N, NE, E, SE, S, SW, W, NW enough? This step needs experience and trial-and-error.

What is an LCS population?

The rule set of an LCS is called its population. Every rule only covers a small part of the problem space.

Each rule has several properties:

The condition, typically consisting of a set of properties and some wildcards. This can be much more complex, depending on your environment. Some LCS even use neural networks as conditions!
The result. Depending on how you designed your outputs, this can be an action, a category, a function, ….
Its fitness. Often calculated from other parameters, like accuracy.
Its accuracy, which is the percentage of times the rule matched AND produced a correct result.
Its numerosity, describing the number of copies that exist. During the learning phase, a subsumption mechanism looks through the rules to see if one rule subsumes another, and in that case increases the numerosity and deletes the subsumed rule.
Other parameters like the age of the rule, its reward prediction, the accuracy of its reward prediction, and more, depending on the implementation details of the LCS.

Side note on accuracy: from an end-user perspective, we are only interested in the accuracy of the whole rule set, not individual rules. Individual rules have their accuracy and fitness parameters so the LCS can learn and improve its rule set.

How are LCS trained?

To train an LCS, we need several systems working together:

Covering, which creates a new rule when the population does not contain a matching rule for an input. The output of that new rule is the correct value, if available, or random otherwise.
Credit assignment, which updates the learning-related parameters of the rules in a match set [M] (its fitness and accuracy).
Subsumption, which checks the rule set for "duplicate rules": rules that have the same result, where the subsuming rule has the same or higher accuracy, and either (1) both rules have the same condition, or (2) where one rule has a broader condition that fully includes the other condition.
Rule discovery, a genetic algorithm that creates new rules from two rules from either [P] or [M]. It creates two new rules by randomly merging parts of the two rules (crossover), and then randomly mutating them. It returns parents and children to the population.
Deletion, which selects classifiers (rules) for deletion. It is commonly using roulette wheel selection where the probability of a rule being selected for deletion is inversely proportional to its fitness. Selecting a rule for deletion reduces its numerosity by one. If numerosity reaches zero, that rule is removed from the population.

The training stops when a certain condition is met, e.g. an iteration count.

The rule set now still contains some bad and inexperienced rules (e.g. rules that were recently created). We now apply rule compaction to remove them.

Conclusion

An LCS is simple. It's a set of IF … THEN … rules that is under pressure by a genetic algorithm. Nothing more.

The resulting rule set is much is easier to understand than the weighted graph of a neural network.

Of course, you can make LCS as complex as you want. But the core idea is beautifully simple.

In the upcoming articles in this series, we will further explore the applications of LCS, the available frameworks, and we'll implement our own LCS with Node.js.