Evolution is not the best force for change over time

We often overlook the importance of the word learning in the catch-phrase "machine learning". But what does it mean? Learning refers to a process of iterative improvement. While there are quite a few machine learning algorithms, there are really only a handful of common learning techniques. One of the most common, and the simplest, is gradient descent.
Computational approach

I will walk through the algorithm step by step. Start by defining a class:

class Learn:
    def __init__(self):
        self.weight = 1      # slope (m), starting guess
        self.intercept = 0   # intercept (b), starting guess
        self.functor = lambda x: (self.weight * x) + self.intercept
        self.r = 0.01        # learning rate

The important part is where I tell the algorithm how to change itself. For something such as linear regression, I can do it in just a few lines of code:

    def score(self, x, y):
        # One gradient descent step on the mean squared error.
        n = len(x)
        grad_m, grad_b = 0., 0.
        m_fxn = lambda x: -1. * x   # d/dm of (y - (m*x + b)) is -x
        b_fxn = lambda x: -1.       # d/db of (y - (m*x + b)) is -1
        for i in range(n):
            grad_base = self.functor(x[i]) - y[i]   # prediction error
            grad_m += m_fxn(x[i]) * grad_base
            grad_b += b_fxn(x[i]) * grad_base
        norm = -2. / n
        # Step each parameter against its own gradient, scaled by the learning rate.
        self.weight = self.weight - grad_m * norm * self.r
        self.intercept = self.intercept - grad_b * norm * self.r
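
For reference, the updates above come from differentiating the mean squared error with respect to each parameter, writing the model as f(x) = m*x + b:

    E(m, b) = (1/n) * sum_i (y_i - (m*x_i + b))**2
    dE/dm   = (-2/n) * sum_i x_i * (y_i - (m*x_i + b))
    dE/db   = (-2/n) * sum_i (y_i - (m*x_i + b))

The m_fxn and b_fxn lambdas supply the -x_i and -1 factors, and norm supplies the -2/n, so the signs in the code cancel out to an ordinary downhill step.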

I do not normally release code snippets, but I figured it would be all right to share something so central to machine learning. As you can see, I can implement this from scratch. The magic of the gradient descent algorithm is its ability to change its parameters based upon the gradient of the error function with respect to each parameter. To solve the problem, I can now just iterate until the solution converges. (If you try this technique and it diverges, you may need to shrink the learning rate, or solve a second set of equations to determine it adaptively.)

    def model(self, x, y, size=10**3):
        # Repeat the gradient step a fixed number of times.
        for i in range(size):
            self.score(x, y)
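
As a quick sanity check, here is how one might drive the class on synthetic data. The true slope of 4 and intercept of 2 below are arbitrary values chosen for illustration, not part of the original experiment:

import random

# Hypothetical test data: y = 4*x + 2 plus a little noise.
random.seed(0)
xs = [i / 10. for i in range(100)]
ys = [4. * x + 2. + random.uniform(-0.1, 0.1) for x in xs]

learner = Learn()
learner.model(xs, ys, size=10**4)
print(learner.weight, learner.intercept)   # should approach 4 and 2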

While gradient descent may be obvious to some well-versed readers, I think it is important to point out the advantage of this approach over other approaches to incremental improvement. For example, people often talk about evolution as the most powerful mechanism for change in biological organisms. This is simply not true.

Random variation is a powerful tool for solving problems. When nature uses it, we refer to it as evolution. The premise is that successful forms of variation tend to recur because the animals that benefit from them gain an advantage in their interactions with the environment. However, the limitation of evolution becomes evident when comparing it to other forms of improvement over time. For example, we now know that evolution is not the primary driver of change in more advanced organisms. To see what I mean, consider an evolutionary solution to the previous problem:

import random

class Evolve:
    def __init__(self):
        self.weight = 1
        self.intercept = 0
        # Search bounds for the random proposals (randrange requires integers).
        self.wmin, self.wmax = 0, 10**3
        self.imin, self.imax = 0, 10**3
        self.r = lambda mini, maxi: random.randrange(mini, maxi)
        self.functor = lambda x: (self.weight * x) + self.intercept
        self.error = float('inf')
        # m and b hold the best parameters found so far.
        self.m = self.weight
        self.b = self.intercept

    def score(self, x, y):
        # Remember the current best, then propose a random mutation.
        self.m = self.weight
        self.b = self.intercept
        self.weight = self.r(self.wmin, self.wmax)
        self.intercept = self.r(self.imin, self.imax)
        n = len(x)
        error = sum([(y[i] - self.functor(x[i]))**2 for i in range(n)])**.5
        if error < self.error:
            self.error = error        # keep the improvement
        else:
            self.weight = self.m      # revert to the previous best
            self.intercept = self.b

    def model(self,x,y,size=10**3):
        for i in range(size):
            self.score(x,y)
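
Driving it looks just the same; the xs and ys here are the same hypothetical synthetic data from the earlier Learn example:

evolver = Evolve()
evolver.model(xs, ys)
print(evolver.weight, evolver.intercept, evolver.error)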

If you were to run the above program, you would notice that it improves over time. However, it improves much more slowly, and it does not always converge on the same solution. That is the contrast between learning and evolution. Nature still has another trick for solving tough problems.

Animals that have a concept of self and of other are able to make selections based upon the perceived social or environmental abilities of another animal. This is a sexual selection process, and it is much more powerful than evolution alone. While it still takes quite a while for an optimal change to become embedded in the core of our genome, humans can markedly change the representation of certain traits within as few as a couple of generations. No longer prone to the whims of chaos, the process of improvement over time now has direction. This selection process causes the random variations of evolution to focus upon particular traits. However, it struggles to balance the rate at which each parameter converges. For this, nature requires a form of central direction.
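
The exact code behind the "Selection" numbers below is not reproduced here, but one hypothetical way to express directed variation in the same style as the classes above is to propose mutations near the current best rather than uniformly across the whole range, so the random search acquires a direction. The Select name, the spread parameter, and the shrink factor are illustrative assumptions:

class Select(Evolve):
    # Hypothetical sketch: random variation focused around the current best.
    def __init__(self, spread=100.):
        super().__init__()
        self.spread = spread   # assumed: how far a proposal may stray from the best

    def score(self, x, y):
        self.m = self.weight
        self.b = self.intercept
        # Propose near the current best instead of anywhere in the search range.
        self.weight = self.m + random.uniform(-self.spread, self.spread)
        self.intercept = self.b + random.uniform(-self.spread, self.spread)
        n = len(x)
        error = sum([(y[i] - self.functor(x[i]))**2 for i in range(n)])**.5
        if error < self.error:
            self.error = error
            self.spread *= 0.99       # focus the search as it improves
        else:
            self.weight = self.m      # revert to the previous best
            self.intercept = self.b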

Learning is far more potent than either evolution or selection. While selection focuses the search and converges faster than blind variation, it still cannot compute the direction of improvement; learning offers a technique to determine that direction instantly upon each iteration. To summarize, let me illustrate the success of each of the three approaches in solving a simple linear regression problem, recovering the weights m and b:

Evolving:
    Predicted m: 57                Actual m: 43.9652012955
    Predicted b: 4                 Actual b: 12.1165068006
Selection:
    Predicted m: 40.3281413905     Actual m: 43.9652012955
    Predicted b: 45.6543857325     Actual b: 12.1165068006
Learning:
    Predicted m: 43.9651654396     Actual m: 43.9652012955
    Predicted b: 12.1164853632     Actual b: 12.1165068006

While we can typically solve linear regression using ordinary least squares and matrix operations, I have shown that we can also solve it incrementally using gradient descent, and likewise through random permutation and through directed permutation. Clearly, a well-informed approach makes a huge difference in the amount of raw computational power we require to solve a problem. This is further evidence that perhaps we should spend at least as much time considering the science behind our approach as we do figuring out how to apply as much hardware as possible to an expensive solution.
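
For completeness, the closed-form route for this two-parameter case does not even need a matrix library; here is a minimal sketch that solves the normal equations directly, using the same hypothetical xs and ys from the earlier examples:

# Ordinary least squares for y = m*x + b, via the 2x2 normal equations.
n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - m * sx) / n
print(m, b)   # exact least-squares fit, no iteration required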
