Table of Contents
Introduction
A teacher of mine tasked me with the challenge of building a genetic algorithm to solve diophantine equations. Before I jump into it, I will first attempt to build a simple genetic algorithm which just evolves in a very stupid manner. I am using a certain tutorial1 for that.
The goal here is to have a minimal environment in which 32-bit numbers keep
evolving, until they reach twenty 1
's on their binary composition.
This GA is built using Common Lisp, due to my distaste for Java. If you're allergic to parenthesis, you've been warned.
ACHTUNG!
Since this post is the reproduction of following a tutorial and also my first attempt at producing a genetic algorithm, some or all of the things here might be inefficient or terribly wrong. I apologize in advance for the inconvenience; use this as didatic resource at your own risk.
This code is distributed under the MIT License. Source code for the original
file (org-mode
; better viewed inside Emacs) can be seen here, at my GitHub.
Initial population
Population is stored in a list. Unfortunately, this means that checking if a
chromosome is in the list is O(n)
speed at worst-case scenario (which is also a
recurrent thing). I'll leave optimization heuristics for later, though.
(defparameter *population* nil) (defparameter *fittest* nil) (defparameter *max-fitness* 20) (defparameter *rnd-state* (make-random-state))
Each individual on *population*
is a different chromosome. The genes
themselves will be the bits of each number. The two numbers of large fitness are
stored in *fittest*
. *max-fitness*
tells us the least number of 1
's we're
expecting. And then, we have an initial *rnd-state*
for randomization purposes.
Proto-chromosomes
Having a 32-bit number as chromosome also means that we need to look at the 16 least significant bits at crossover phase, but the 16 most significant bits can be randomly generated. For that, we use the global random state which will provide our random numbers.
(defun make-proto () (let ((chromosome 0)) (declare ((unsigned-byte 32) chromosome)) (dotimes (i 16) ;; Run d20 < 5 to see if bit should be set (setf chromosome (logior (ash chromosome 1) (if (< (random 20 *rnd-state*) 5) 1 0)))) (ash chromosome 16)))
I'm calling here a chromosome in a not-ready state a proto, since it is still
expecting a crossover phase. Each proto's significant genes have a chance of
3/10 (or 6/20) to be a 1
.
Printing any chromosome, be it a proto or not, is rather easy. We just instruct
the format
function to pad it with 32 positions, and fill the nonexistant
leading positions with zeroes. We can even build a function for that:
(defun format-chromosome (chromosome) (declare ((unsigned-byte 32) chromosome)) (format nil "~32,'0b" chromosome)) (defun print-chromosome (chromosome) (princ (format-chromosome chromosome)) (terpri))
Example: Checking zygote generation
Here is an example of what happens when you print an immediately created
proto. The first line is the actual value of the 32-bit integer, and the second
line is the binary representation of such integer, when you call
print-chromosome
using its value.
270663680 00010000001000100000000000000000
Generating initial chromosomes
We generate our first ten chromosomes in a very simple way. We just generate them from zygotes and put them on the list. We also use a helper macro which will only push a new chromosome onto the list if it is not already there.
(defparameter *distinction-threshold* (truncate (* (expt 2 32) 0.0001))) (defun recalculate-fittest (chromosome fitness) (cond ((or (null *fittest*) (< (length *fittest*) 2)) (push chromosome *fittest*)) ((> fitness (calc-fitness (car *fittest*))) (setf *fittest* (list chromosome (car *fittest*)))) ((> fitness (calc-fitness (cadr *fittest*))) (setf *fittest* (list (car *fittest*) chromosome))))) (defmacro with-distinct-chromosome (&body generator-thunk) (let ((chr-sym (gensym)) (repets (gensym))) `(let ((,chr-sym nil) (,repets 0)) (loop while (or (eq ,chr-sym nil) (member ,chr-sym *population*)) do (setf ,chr-sym (progn ,@generator-thunk) ,repets (1+ ,repets)) never (>= ,repets *distinction-threshold*)) (unless (>= ,repets *distinction-threshold*) (push ,chr-sym *population*) (recalculate-fittest ,chr-sym (calc-fitness ,chr-sym))))))
The macro with-distinct-chromosome
takes a generator thunk and performs it over
and over again, until the generated chromosome is not a member of the population
anymore. Since this behaviour may cause an infinite loop (or a seemingly
infinite loop in case we get stuck for a while), we define a distinction
threshold for this repetition.
In case we end up stuck in what could be an infinite loop, our macro halts the chromosome generation, and gives up on adding the new chromosome to the population. Since this macro is so useful on making sure we are not testing again any chromosomes which we've tested on the near past, we also use it later, on the crossover phase.
It is deduced that the chromosome generation is in an infinite loop if the repetition was performed for roughly a number of 0.01% of the distinct chromosomes we could generate (about 429496 numbers). We do not make distinctions between different numbers to count the amount of repetitions, though.
The function recalculate-fittest
compares the fitness of the newly-generated
number. We always need to have at least two fittest chromosomes. For that, when
we identify a chromosome which could be added to the list, we compare if its
fitness is greater than one of the two most-fit chromosomes. If that is true,
said chromosome will be replaced, and the newly generated chromosome will occupy
its proper space. Fittest chromosomes are arranged in such a way that the
fittest of the couple remains on top; which we'll discuss soon enough. This
represents a phase called selection; more on it will be said later.
(defun make-chromosome () (logior (make-proto) (ash 1 (random 16 *rnd-state*)))) (defun initialize-population () (setf *population* nil *fittest* nil) (dotimes (i 10) (with-distinct-chromosome (make-chromosome))))
The initialize
function makes good use of said macro, by generating ten distinct
chromosomes: it takes a proto, then sets a random flag at one of the 16 least
significant bits to 1, ensuring that we'll most likely have ten different
bitmasks at the beginning.
Computing fitness
Our fitness is calculated by the amount of 1
's in the binary representation of
our number. The more the amount, the better.
We define two functions: one is a predicate which, given the index of a bit,
returns t
if the bit is set to 1
. The other one loops through every possible bit
and counts how many of them are 1
's.
(defun bit-set-p (bitmask bit-index) (declare ((unsigned-byte 32) bitmask) ((unsigned-byte 8) bit-index)) (if (<= bit-index 31) (not (= (logand bitmask (ash 1 bit-index)) 0)) nil)) (defun calc-fitness (chromosome) (loop for x below 32 when (bit-set-p chromosome x) sum 1))
We also add a convenient function for printing the chromosome with its fitness.
(defun print-chromosome-with-fitness (chrm) (format t "~a => ~a~&" (format-chromosome chrm) (calc-fitness chrm)))
Example: Checking generated population
We can use mapc
along with print-chromosome-with-fitness
to beautifully print
all of our generated specimen, and each chromosome's fitness. Here is an
example.
;; Initialize a population first (initialize-population) (mapc #'print-chromosome-with-fitness *population*)
00000001010000010000000000000100 => 4 01000000100100010000000000100000 => 5 00000010110100000000000010000000 => 5 00000000000101010000100000000000 => 4 00010110001000010000000100000000 => 6 00001010001000010001000000000000 => 5 10010001100000100000100000000000 => 6 00000011110010000000001000000000 => 6 01011000110100000000000100000000 => 7 01010000001011110000000001000000 => 8
Selection
The selection phase is where the two fittest individuals are selected to pass on
their genes to the next generation. This phase is automagically done on the
with-distinct-chromosome
macro.
Example: Checking selected chromosomes
Using the same principle of printing the population, we can do the same with
*fittest*
, which is the variable storing the two fittest individuals. This time,
though, their fitness values are irrelevant, so passing print-chromosome
over
each one with the aid of mapc
should do the trick.
(mapc #'print-chromosome *fittest*)
01111111111101100001010010011111 01111101111101110001010010001011
Crossover (Breeding)
The crossover process produces two new chromosomes. We generate a new individual by making the parents exchange their least significant bits. Each new value is, then, added to the population, and the selection phase takes place.
(defun breed (parent1 parent2) (declare ((unsigned-byte 32) parent1 parent2)) (labels ((swap-bit (n) (let ((bit-p1 (bit-set-p parent1 n)) (bit-p2 (bit-set-p parent2 n))) (when (not (eq bit-p1 bit-p2)) (setf parent1 (logxor parent1 (ash 1 n)) parent2 (logxor parent2 (ash 1 n))))))) (dotimes (i 16) (swap-bit i))) (values parent1 parent2))
We also define a function to ensure our population never surpasses its fixed size. If it does, then the individuals of least fitness are removed.
(defun limit-population (max-num) (let* ((pop-len (length *population*)) (exceed (- pop-len max-num))) (when (> exceed 0) (let ((pop-fitness (mapcar #'calc-fitness *population*))) (labels ((remove-minimum () (let ((min-index (loop for i below pop-len for elt in pop-fitness with min = (cons 32 nil) when (< elt (car min)) do (setf min (cons elt i)) finally (return (cdr min))))) (when min-index (setf *population* (loop for elt in *population* for i from 0 unless (= i min-index) collect elt) pop-len (1- pop-len)))))) (dotimes (i exceed) (remove-minimum)))))))
Mutation
To ensure population diversity and remove the chances of early convergence, we mutate some genes at a low rate. By picking a maximum of 16 genes, regardless of significancy, at random, it is possible to roll the dice again (with little probability): should the odds be on favor, the specified gene will suffer a "flip".
(defun mutate (chromosome) (declare ((unsigned-byte 32) chromosome)) (let* ((num-mutations (random 16 *rnd-state*)) (mutating-indexes (remove-duplicates (loop for i below num-mutations collect (random 32 *rnd-state*))))) (labels ((attempt-mutation-at (n) ;; Throw d20 and expect < 8 (when (< (random 20 *rnd-state*) 7) (setf chromosome (logxor chromosome (ash 1 n)))))) (mapc #'attempt-mutation-at mutating-indexes))) chromosome)
Example: Testing the mutation algorithm
We can check whether this is working or not with a simple algorithm which generates a chromosome and its mutation; then, we just print them and see if they can be compared. By re-running it a couple of times, we should see a mutated gene here and there.
(let ((chromosome (make-chromosome))) (mapc #'print-chromosome (list chromosome (mutate chromosome))))
00110010000000000000001000000000 00110010000000001000000001000000
Crossover (Finalization)
Now we define our actual crossover function. The crossover function takes into consideration both the breeding process and the mutation.
(defun crossover () (multiple-value-bind (offspring1 offspring2) (apply #'breed *fittest*) (with-distinct-chromosome (mutate offspring1)) (with-distinct-chromosome (mutate offspring2))))
Convergence
We need a strategy to check if our population converged. For a genetic algorithm, a converged population usually means that we've reached a max fitness. Therefore, we just need to check if our fittest individual has a fitness greater or equal than the specified max fitness.
(defun population-converged-p () (>= (calc-fitness (car *fittest*)) *max-fitness*))
Debriefing and running the genetic algorithm
Now we can create a proper loop which will execute the steps of our algorithm until the population converges.
(defun debriefing () (princ "Population converged.") (terpri) (format t "Best fitness: ~a~%~%" (calc-fitness (car *fittest*))) (princ "Final population:") (terpri) (mapc #'print-chromosome-with-fitness *population*) (terpri)) (defun run-genetic-algorithm () (initialize-population) (loop until (population-converged-p) for i from 0 do (format t "Generation: ~a~%Fittest:~%" i) (mapc #'print-chromosome *fittest*) (terpri) (crossover) (limit-population 10)) (debriefing))
Conclusion
With everything all sorted and done, we can test our algorithm at once. Here's
one possible output when running the function run-genetic-algorithm
:
Generation: 0 Fittest: 00010010010011110001000000000000 01011010010100010000000000001000 Generation: 1 Fittest: 00010010010011110001000000000000 01011010010100010000000000001000 Generation: 2 Fittest: 01011110100100010001000101000000 00010010010011110001000000000000 Generation: 3 Fittest: 00011010011011110001000101000000 01011110100100010001000101000000 Generation: 4 Fittest: 00111010010111110000001101000000 00011010011011110001000101000000 Generation: 5 Fittest: 00111010010111110000001101000000 00011010011011110001000101000000 Generation: 6 Fittest: 00111010010111110000001101000000 00111010010111110001000101000000 Generation: 7 Fittest: 10111010010111110001100101000000 00111010010111111000001101100000 Generation: 8 Fittest: 10111010010111110001100101000000 00111010010111111000001101100000 Generation: 9 Fittest: 10111010010111110001100101000000 00111010010111111000001101100000 Generation: 10 Fittest: 10111010010111110001100101000000 00111010010111111000001101100000 Generation: 11 Fittest: 10110010010111111010101101101001 10111010010111110001100101000000 Population converged. Best fitness: 22 Final population: 10111110010111111010101101101011 => 22 10110010010111110001100101000000 => 14 10110010010111111010101101101001 => 19 00011010010111110001000101000100 => 13 00111010010111110100001001100000 => 14 00111010010111111000001101100000 => 15 10111010010111110001100101000000 => 15 10011000011011110000001101001000 => 13 00111010010111110001000101000000 => 13 00111010010111110000001101000000 => 13