SIMPLE GENETIC ALGORITHM IMPLEMENTATION IN VB.NET AND IT'S APPLICATION TO LINEAR LEAST SQUARES APPROXIMATION PROBLEM.

Genetic algorithm is widely used for complex optimization problems solving. It is based on idea of natural selection. I'll not spend time describing theory. Just follow links below:

When I became interested in GA (genetic algorithm) I spent some time looking for detailed description o GA. In presented here article I follow simple genetic algorithm given in Melanie Mitchell book "An Introduction to Genetic Algorithms". I present this algorithm implementation in VB.NET and widely known optimization problem solution using this implementation. Let me start from algorithm pseudocode:

Input:
N - size of initial population;
Pc - probability of crossover;
Pm - probability of mutation;

Pseudocode of Simple Genetic Algorithm

 population = GenerateInitialPopulation(N) Loop: Calculate fitness for each chromosome in population. Repeat until size(newpopulation) < N { Select a pair of parents from the current population according to their fitness (larger fitness - larger selection probability). Selected parents are not removed from the current population. With probability Pc apply crossover operation to form two offsprings. if no crossover take place - form two offsprings that are copies of their parents. Mutate the two offsprings at each locus (element of chromosome) with probability Pm. Place the resulting offsprings in the new population } If N is odd - remove random chromosome. Replace the current population with the new population. Goto Loop.

I've implemented this algorithm with one but really important change - the best object always survive (so called elitism).

Chromosome representation

We'll start with some abstract chromosome representation. All we basically need from chromosome - is ability:

 - to reproduce itself (make exact copy) - to perform crossover operation with other chromosome - to mutate at some point (locus)

One of possible way to declare such requirements is to use interface.

 ```Namespace GeneticAlgorithm Public Interface IChromosome Sub RandomInit() Sub CrossOver(ByVal mother As IChromosome, ByVal offspring1 As IChromosome, ByVal offspring2 As IChromosome) Sub Copy(ByVal dest As IChromosome) Sub MutateAt(ByVal locus As Integer) Function ValidLocus(ByVal locus As Integer) As Boolean End Interface End Namespace```

Fitness evaluation function intentionally is not included into IChromosome interface. The best explanation for this is presented below (in LSQ problem example). This is because in some cases chromosome fitness value depends not only on chromosome itself. So fitness evaluation function is included into BaseGeneticAlgorithm class. In cases when it's possible to evaluate fitness processing only chromosome object data we simply delegate this work to chromosome object itself.

The next step is abstract genetic algorithm implementation. We need this implementation to be flexible as possible (to be more exact - as we can do that). So all we need from GA implementation is:

 - to make all GA stuff - to be able to create chromosomes (factory) - to be able to compute chromosome fitness. (in some cases GA may delegate this function to chromosome itself but in certain cases it is impossible).
 ```' ' BaseGA.vb - Base class for simple genetic algorithm implementation. ' ' Project: Framework ' Author: S.Zabinskis ' December, 2006 ' Imports System.Math Imports System.Collections.Generic Namespace GeneticAlgorithm Public MustInherit Class BaseGeneticAlgorithm(Of T As IChromosome) #Region "Local Types" Public Class ValuedObject(Of X) Public _value As Double Public _rndval As Double Public _object As X Public Sub New(ByVal obj As X, ByVal value As Double) _object = obj _value = value End Sub End Class Private Class Mixer : Implements IComparer(Of ValuedObject(Of T)) Public Function Compare(ByVal x As ValuedObject(Of T), ByVal y As ValuedObject(Of T)) As Integer Implements System.Collections.Generic.IComparer(Of ValuedObject(Of T)).Compare If x._rndval = y._rndval Then Return 0 End If Return IIf(x._rndval > y._rndval, 1, -1) End Function End Class #End Region #Region "Local Variables" Protected Shared _random As Random = New Random((DateTime.Now.Hour * 3600 + DateTime.Now.Minute * 60 + DateTime.Now.Second) * 1000 + DateTime.Now.Millisecond) Protected _population As New List(Of ValuedObject(Of T)) Protected _iteration As Integer = 0 #End Region #Region "Abstract methods" Public MustOverride Function EvalFitness(ByVal obj As T) As Double Protected MustOverride Function CanStop(ByVal iteration As Integer, ByVal population As List(Of T)) As Boolean Protected MustOverride Function PerformMutate() As Boolean Protected MustOverride Function PerformCrossover() As Boolean Public MustOverride Function CreateChromosome() As T #End Region #Region "Methods" Protected Function GetFitness(ByVal obj As T) As Double For Each vo As ValuedObject(Of T) In _population If vo._object.Equals(obj) Then Return vo._value End If Next Return EvalFitness(obj) End Function Private Function NormalizePopulation(ByVal population As IEnumerable(Of T), ByVal reverse_flag As Boolean) As Boolean Dim minftn As Double = Double.MaxValue Dim maxftn As Double = Double.MinValue Dim sumftn As Double = 0 Dim fitness As Double = Double.NaN Dim N As Integer = 0 _population.Clear() For Each obj As T In population If reverse_flag Then fitness = -EvalFitness(obj) Else fitness = EvalFitness(obj) End If If maxftn < fitness Then maxftn = fitness End If If minftn > fitness Then minftn = fitness End If sumftn += fitness Dim vo As New ValuedObject(Of T)(obj, fitness) vo._rndval = BaseGeneticAlgorithm(Of T)._random.NextDouble() _population.Add(vo) N += 1 Next sumftn -= minftn * N Dim normf As Double = 1.0 / sumftn For Each vo As ValuedObject(Of T) In _population vo._value = (vo._value - minftn) * normf Next 'Debug.WriteLine("end of NormalizePopulation") Return N > 1 End Function Protected Overridable Sub ShufflePopulation() _population.Sort(New Mixer()) End Sub Private Function Pick(ByVal population As List(Of ValuedObject(Of T))) As T Dim p As Double = _random.NextDouble Dim sum As Double = 0 Dim o As T = Nothing For Each obj As ValuedObject(Of T) In population sum += obj._value If sum >= p Then Return obj._object End If o = obj._object Next Return o End Function Protected Sub Mutate(ByVal obj As T) Dim locus As Integer = 0 While obj.ValidLocus(locus) If PerformMutate() Then obj.MutateAt(locus) End If locus += 1 End While End Sub Private Function PickBest(ByVal population As List(Of ValuedObject(Of T))) As T Dim maxf As Double = Double.MinValue Dim o As T = Nothing For Each obj As ValuedObject(Of T) In population If obj._value > maxf Then maxf = obj._value o = obj._object End If Next Return o End Function Private Function NextPopulation(ByVal reverse_flag As Boolean) As List(Of T) Dim newpopulation As New List(Of T) Dim N As Integer = _population.Count Dim counter As Integer = 0 ShufflePopulation() ' the best survives always Dim bestObj As T = PickBest(_population) newpopulation.Add(bestObj) Dim offspring1 As T = Nothing Dim offspring2 As T = Nothing While counter < N Dim parent1 As T = Pick(_population) Dim parent2 As T = Pick(_population) While parent1.Equals(parent2) parent2 = Pick(_population) End While offspring1 = CreateChromosome() offspring2 = CreateChromosome() If PerformCrossover() Then parent1.CrossOver(parent2, offspring1, offspring2) Else parent1.Copy(offspring1) parent1.Copy(offspring2) End If Mutate(offspring1) Mutate(offspring2) newpopulation.Add(offspring1) newpopulation.Add(offspring2) counter += 2 While counter > N newpopulation.RemoveAt(counter - 1) counter -= 1 End While End While If NormalizePopulation(newpopulation, reverse_flag) Then Return newpopulation End If Return Nothing End Function Public Sub Run(ByVal initial_population As IEnumerable(Of T), Optional ByVal reverse_flag As Boolean = False) If NormalizePopulation(initial_population, reverse_flag) Then _iteration = 0 Dim population As List(Of T) = Nothing Do population = NextPopulation(reverse_flag) _iteration += 1 Loop Until CanStop(_iteration, population) End If End Sub #End Region End Class End Namespace```

As you can see, there are few "degrees of freedom" here:
- generic IChromosome-like parameter T
- virtual functions
Virtual functions are for:

 Public Function CreateChromosome() As T The most important function that may create chromosomes of type T Public Function EvalFitness(ByVal obj As T) As Double Also important function that evaluates fitness of chromosome Protected Function Function PerformMutate() As Boolean Decide if we are going to perform mutation Protected Function PerformCrossover() As Boolean Decide if we are going to perform crossover Protected Function CanStop(ByVal iteration As Integer, ByVal population As List(Of T)) As Boolean Decide whether it is time to stop

Please realize that it is the very first iteration to something like GA framework. Now it may be used only for learning purposes. And of course VB.NET really isn't right choice for intensive computations. Download source code of base classes here.

Application of GA to well known numerical math problem (Linear Least Squares approximation of set of points)

Lets generate set of points with polynomial y=x^2 - 2*x + 1 on interval [0,1]. We will try to use genetic algorithm to find least squares approximation of this point set with 2nd degree polynomial: a(0)*x^2 + a(1)*x + a(2). Ideally we should get a(2)=1.0, a(1)=-2.0, a(2)=1.0. First, let us assume that all approximating polynomial coefficients belong to interval [-4.0, 4.0]. Now, follow classical bit-string approach - represent approximating polynomial as one long bit string, containing 3 shorter bit strings for every coefficient. Each shorter bit string, representing polynomial coefficient is mapped to [-4.0, 4.0] range using expression:

value = -4.0 + (8.0 * b ) / (2^m-1)

where b is 'integer' value of bit-string and m is number of bits. b changes from 0 to 2^m-1, so corresponding coefficient will cover [-4.0, 4.0] range. For 8-bit representation we have for each coefficient:

value = - 4.0 + (8.0 * b) / 255

where b = 0,1,2, ...., 255.
I did very simple application to demonstrate how "bit-string" GA solves this problem. User may set :
- population size;
- max.number of iterations;
- probabilities of crossover and mutation;
- range for approximation polynomial coefficients:
- size of bit string.