Chapter 5 – The Shape of Evolution
All freshmen at Guilford College were required to take an interdisciplinary class called Being Human in the 20th Century. (It had recently been renamed from Man in the 20th Century and eventually became The First Year Experience.) JR Boyd, the head of the Mathematics Department, referred to them as “a bunch of damn circle classes.” That they were: we read a book, sat around in a circle discussing it and then wrote a paper on some aspect of it. We read works like They Thought They Were Free: The Germans 1933 - 1945 by Milton Mayer and Flannery O’Connor’s Everything that Rises Must Converge. I wrote a paper on the biological origin of the soul. As a college freshman, I couldn’t see that there was some miraculous event that caused the soul to spring into existence. I don’t have a copy of the paper anymore, but I remember it was not well received by anyone who read or heard about it, especially the teaching assistant who graded it. Now I better understand the hesitation. My paper led the reader to believe that human awareness and intelligence (what we call the soul) came into being through some unknown evolutionary process. Millions of words have been written about the evolutionary process, and still people (not creationists, but educated folks) believe something is missing. That something was emergence. Now we’re in a position to talk about how emergence and evolution work together to build the hierarchy of the universe.
Just to review, an emergent property is a new way of interacting that comes into existence through some as yet unknown process. Other terms can be substituted for it; force and selection criteria are the two most common. Every new emergent property leads to a new evolutionary process that “pushes” it into the environment, exploring every nook and cranny. You cannot have a little bit of life. Once life emerged, biological evolution took it to every corner of the earth. Once gravity emerged, gravitational evolution took it to every corner of the universe. The selection criteria determine how the universe picks winners and losers. In a very real sense, evolution is a game. (I suspect that is one of the reasons humans invent games: they are ways to simplify the universe into much easier-to-understand pieces.) The universe has a finite amount of energy and matter in it. That implies there will always be competition when the new selection criterion, through the evolutionary process, tries different combinations to see what works. There is no predetermined path; the changes (like the mutations in biological evolution) are random. The selection criteria act as the decision maker: some changes survive and some become extinct. In addition, once a change is selected for survival, the evolutionary process goes to work to change (and improve) it over time. One way to explore this continuous change and improvement is to look at what the process produces, taking snapshots along the way and looking for patterns. A common technique is to look at how the evolutionary process distributes the emergent property across the new landscape.
Distributions
People like to put things into groups. At an early age, we learn to sort toys by color or shape. Putting things into piles comes naturally to us, and I think that’s because that’s how the universe works. The evolutionary process makes small changes that are amplified over time. The new emergent property takes hold and spreads across the environment. Since some of the changes are small, grouping things into larger buckets can help us better see their effects. The most popular distribution, because of its predictability, is the normal distribution. Let’s look at an example of a characteristic that follows a normal distribution.
Height Distribution
If you were to measure the height of every male aged 45 to 54 in the United States and plot the results in a frequency chart, it would look like this.
This data came from the 2000 US Census. Along the bottom are the heights, and the vertical axis shows how many people were measured at each height. The average height is between 5’8” and 5’9”. Very few middle-aged men measure more than 6’4”, and few are shorter than 5’. This familiar “bell-shaped” curve, another name for the normal distribution, is used in many fields: medicine, economics and college entrance test analysis. Normal distributions work best where the spread between the highest and lowest values is small. In the case of height, the maximum difference between the tallest and shortest male is less than 4 feet. There are physical limits to how tall a human can grow; gravity puts an upper limit on human height, and there are no 12-foot humans. Whenever a characteristic has a fairly small range of values (height, weight, IQ), you’ll find the normal distribution. Statisticians work with normal distributions because they explain many phenomena and help in predicting outcomes. Most medical studies these days rely on normal distributions. When a study says something along the lines of “drinking a glass of red wine every day lowers your chances of cancer by 10%,” it depends on the normal distribution to make that pronouncement. (In a later chapter, we’ll talk about the limitations of these sorts of pronouncements. What exactly does a 10% reduction in the risk of getting a certain kind of cancer mean? I remember a comic who did a bit about the Masters and Johnson sex study that stated 60% of all married women engaged in extra-marital affairs. “Forget the statistics,” he said, “I want names and addresses!” Statistics are a necessity in modern clinical studies, but most folks fail to appreciate them; they want the names and addresses of the people who avoid cancer by drinking red wine.)
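To make the “small range” point concrete, here is a minimal Python sketch using the standard library’s `statistics.NormalDist`. The mean of 68.5 inches roughly matches the average above; the standard deviation of 3 inches is an assumption for illustration, not a census figure. The sketch estimates how much of the population falls near the middle and how rare the extremes are.

```python
from statistics import NormalDist

# Assumed parameters (illustrative, not census figures):
# mean 68.5 inches, standard deviation 3 inches.
heights = NormalDist(mu=68.5, sigma=3.0)

def share_between(dist, low, high):
    """Fraction of a normal population with values between low and high."""
    return dist.cdf(high) - dist.cdf(low)

within_one_sd = share_between(heights, heights.mean - heights.stdev,
                              heights.mean + heights.stdev)
taller_than_6ft4 = 1 - heights.cdf(76)   # 6'4" is 76 inches
shorter_than_5ft = heights.cdf(60)       # 5'0" is 60 inches

print(f"within one standard deviation: {within_one_sd:.1%}")
print(f"taller than 76 inches: {taller_than_6ft4:.2%}")
print(f"shorter than 60 inches: {shorter_than_5ft:.2%}")
```

Under these assumed parameters, well under 1 percent of men fall outside the 5’ to 6’4” range, which is what the thin tails of a bell curve look like in numbers.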
Population Density
Now let’s look at something that doesn’t have the limited range of values associated with height or weight. The US Census data includes information about population density by state. We take the population of a state and divide it by the state’s area in square miles to get the population density. While there is an upper limit on how high population density could go, for all practical purposes it is unbounded, since the upper limit is many thousands of times larger than the lower limit. (In comparison, the tallest human is not even three times taller than the shortest person.) Having a larger range of values to work with makes an enormous difference in how the distribution looks. Let’s start by looking at the raw data.
Along the bottom of the graph is an entry for each state (and the District of Columbia), and the vertical axis is the density. This is nothing like a normal distribution; it looks more like a hockey stick lying on the ground. The vertical scale runs from 1 person per square mile (Alaska) to over 8,000 people per square mile (the District of Columbia). When the difference between the lowest and highest values is this great, we can use some mathematics to make the picture easier to read. Logarithms are used to rescale the vertical axis so it is easier to see how the values relate to one another. Let’s look at a table of logarithms to see how they work.
Number | Logarithm (base 10) |
1 | 0 |
10 | 1 |
100 | 2 |
1,000 | 3 |
10,000 | 4 |
100,000 | 5 |
1,000,000 | 6 |
Base-10 logarithms “count” how many tens you multiply together to get a number: 1,000 = 10 × 10 × 10, so its logarithm is 3. While the numbers in the table go from 1 to a million, the logarithms go from 0 to 6. Large differences become small differences. Let’s look at the graph above using the logarithm of the population density instead of the actual population density.
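The table can be reproduced with Python’s standard `math.log10`. This is only a sketch; the helper name `log_table` is made up for illustration. It also shows that a number between two powers of ten, such as the 8,300 used for the District of Columbia below, gets a fractional logarithm.

```python
import math

def log_table(numbers):
    """Pair each number with its base-10 logarithm."""
    return [(n, math.log10(n)) for n in numbers]

powers_of_ten = [10 ** k for k in range(7)]   # 1 through 1,000,000
for number, log in log_table(powers_of_ten):
    print(f"{number:>9,} | {log:g}")

# Numbers between powers of ten get fractional logarithms.
print(round(math.log10(8_300), 2))  # 3.92, just under 4
```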
Now the vertical scale goes from 0 to almost 4. That’s because the logarithm of 1 (the population density of Alaska) is 0 and the logarithm of 8,300 (the population density of the District of Columbia) is almost 4. Don’t be misled, however: the raw data is the same, and the population density still spans 1 to almost 10,000. (We call that 4 orders of magnitude, where an order of magnitude is a factor of 10. Between 1 and 10,000 lie 4 orders of magnitude.) The logarithm makes it easier to see patterns in the data. Now, just as with the height data, we want to turn this data into a distribution by grouping states into buckets of population density. (The height data was already grouped into height buckets by the Census Bureau.) We’ll group the states by order of magnitude of population density (1 to 10, 10 to 100, and so on) to form our buckets. We’ll make one more change to show the pattern more clearly. Instead of plotting the number of states in each bucket, as we did for height, we’ll plot the logarithm of that number. Again, we use logarithms to turn large changes into smaller ones. The distribution of population density, graphed as the logarithm of the number of states in each bucket, is below.
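The bucketing step can be sketched in a few lines of Python: the whole-number part of a value’s base-10 logarithm tells you which order-of-magnitude bucket it belongs in. The list of densities here is hypothetical; only the endpoints (about 1 for Alaska and about 8,300 for the District of Columbia) come from the text, and `decade_buckets` is an illustrative name.

```python
import math
from collections import Counter

def decade_buckets(values):
    """Count values by order of magnitude: bucket k holds 10**k <= v < 10**(k + 1)."""
    return Counter(math.floor(math.log10(v)) for v in values)

# Hypothetical densities in people per square mile. Only the endpoints
# (about 1 for Alaska, about 8,300 for D.C.) come from the text.
densities = [1, 2, 3, 5, 7, 8, 9, 12, 30, 45, 70, 150, 600, 8_300]

buckets = decade_buckets(densities)
for k in sorted(buckets):
    count = buckets[k]
    print(f"{10**k:,} to {10**(k + 1):,}: {count} states, "
          f"log10(count) = {math.log10(count):.2f}")
```

Plotting `log10(count)` against the bucket index gives the kind of doubly logarithmic picture described next.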
Here we see a line sloping downward from upper left to lower right. Because we are using logarithms in both the vertical and horizontal directions, each step down and to the right means 10 times fewer states with 10 times more population density. This shape shows up so often that it has its own name: a power curve. There are a relatively large number of states with low population densities and only one with a really large population density. As you move down and to the right on a power curve, you get 10 times fewer things (in this case states), but each one has 10 times more (in this case density). Power curves show up in all sorts of places, and I’d like to show a few examples.
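A straight line on log-log axes is the signature of a power curve, and its slope says how fast the counts fall off. The sketch below fits that slope by ordinary least squares on the logarithms. The data is an idealized power curve invented for illustration (the constant 10,000 is arbitrary); its slope of -1 is exactly the “10 times fewer, 10 times more” pattern.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den

# Idealized power curve: count = 10,000 / size, so each tenfold step
# in size comes with a tenfold drop in count.
sizes = [1, 10, 100, 1_000, 10_000]
counts = [10_000 / s for s in sizes]
print(loglog_slope(sizes, counts))  # -1.0, up to floating-point rounding
```

A steeper slope (say -2) would mean counts drop a hundredfold for every tenfold increase in size; the straightness of the line, not the particular slope, is what marks a power curve.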
Web Links – There are tens of millions of websites with very few links; small-business and personal web pages fall into this realm. There are thousands of websites (large businesses and government sites) with millions of links. There are hundreds of websites, such as Google and Microsoft, with hundreds of millions of links.
Net Worth – There are billions of poor people in the world – defined as income under $100 a year. There are fewer than 100 people in the world with net worth over 10 billion dollars.
Actors’ Annual Income – Only a dozen or so actors earn $20 million per picture. They don’t have to audition for any part and “live the dream.” There are tens of thousands of actors who make less than minimum wage and tramp from audition to audition hoping for a break. Again, there are a lot of people making very little money and a few making tens of thousands of times more.
Professional Athletes – See above.
In a normal distribution, values cluster around the average at the middle hump of the distribution (5’8” in the height distribution). In a power curve, the largest group is at the far left of the curve, and as you move toward the right, the number drops off dramatically while the impact goes up dramatically. In a normal distribution, even the extreme values are relatively close to the average: a very short person, say 3 feet tall, is still less than 3 feet from the average height. In a power curve, the impact grows by orders of magnitude, so values at either end, such as the net worth of the poorest and the richest, are nowhere near the average. That makes for an incredible draw for struggling actors, or athletes, entrepreneurs and gamblers. They have to believe that if they can just hang in there and keep plugging along, they’ll make it into the upper echelons of their profession. That very few make it is of little consequence, because just knowing that someone can make it gives hope. The fact that every success story is different (you should be thinking about chaos and sensitivity to initial conditions here) means there is no one way to the top.
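One way to see the difference is to compare how far the largest value sits from the average in each kind of distribution. This Python sketch draws random samples from a normal distribution (assumed mean 68.5, standard deviation 3, standing in for height) and from a Pareto distribution, a standard power-law distribution, with an assumed shape of 1.5 and minimum of 1 standing in for something like income. The parameters and the helper `max_to_mean` are illustrative choices, not data.

```python
import random
import statistics

def max_to_mean(sample):
    """How many times larger the biggest value is than the average."""
    return max(sample) / statistics.fmean(sample)

rng = random.Random(42)   # fixed seed so runs are repeatable
n = 100_000
# Normal: assumed mean 68.5 inches, standard deviation 3 inches.
heights = [rng.gauss(68.5, 3.0) for _ in range(n)]
# Power curve: Pareto with assumed shape alpha = 1.5 and minimum value 1.
incomes = [rng.paretovariate(1.5) for _ in range(n)]

for name, sample in (("normal", heights), ("power curve", incomes)):
    print(f"{name:12} mean {statistics.fmean(sample):8.2f}  "
          f"median {statistics.median(sample):6.2f}  "
          f"max/mean {max_to_mean(sample):8.1f}")
```

In the normal sample the largest value is only a little above the average; in the power-law sample it is typically hundreds of times the average, and the median sits well below the mean.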
So far, we’ve focused on static distributions: height for a given year, or population density for a given year. A distribution shows you how an attribute looks at a given point in time. The distributions shown above came from the 2000 US Census, and the graphs would be different if we used data from previous censuses. In the early days of the United States, there were many fewer people and population density was not nearly so spread out. I haven’t graphed the data that far back, but I suspect that early in our country’s history, population density followed something closer to a normal distribution. The evolutionary process “stretches” out the attribute and gives a competitive advantage to the outliers at the top end; the rich get richer and the poor get poorer, as the saying goes. Jared Diamond’s “Guns, Germs, and Steel” gives a number of historical accounts in which a selection criterion emerged and led to a new evolutionary process that benefited some groups more than others.
We should try to better understand how the power curve works over time.
Evolution and the Power Curve
Any evolutionary process leads toward a power curve, because as various mutations compete, some small set gains a competitive advantage over the rest and starts accumulating more than its fair share of whatever resource is the source of competition and selection. A December 17, 2010 New York Times Magazine article, “A Physicist Solves the City,” discusses Geoffrey West’s work on population density. West has collected data that he says shows higher density leads to greater interaction, which leads to more productivity. Although West’s work was done on cities, the same ideas apply to states. (We’ve already discussed how interactions are the basis of change, so the idea that more density leads to more change, which leads to even more change in a positive feedback loop, should seem familiar.) As a state increases its population density, perhaps because its weather is better, it experiences an increase in productivity, which leads to more jobs and better pay, which gives it a greater competitive advantage over other states. Of course, being a small state to begin with helps, but that isn’t the driving force. Once a mutation gets selected, the feedback loop starts, and when the tipping point is reached, the competition quickly falls behind. Malcolm Gladwell’s book “Outliers” discusses this phenomenon, and his story of junior hockey players is illustrative. The competitive advantage was how old a young player was relative to his peers. Since players are grouped into age brackets that span an entire year, a player could be as much as a year older than his teammates. At the age of 10, that’s a big difference, significant enough to give him a big leg up on becoming an elite junior hockey player. A December 25, 2010 excerpt from Eduardo Porter’s book “The Price of Everything” discusses the capitalistic evolutionary process and its effect on top pay for everyone from professional athletes to musical artists to bankers.
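The feedback loop described above can be illustrated with a toy “rich get richer” simulation in Python. This is a Simon-style preferential-attachment model, a standard textbook mechanism, not the model West, Gladwell or Porter use: each new unit of a resource either founds a new holder or goes to an existing holder chosen in proportion to what that holder already has. The function name, the 10% chance of a new entrant and the random seed are all arbitrary choices for the sketch.

```python
import random

def rich_get_richer(steps, new_player_chance=0.1, seed=1):
    """Toy Simon-style growth model. Each new unit of a resource founds a
    new holder with probability new_player_chance; otherwise it goes to an
    existing holder chosen in proportion to that holder's current holdings."""
    rng = random.Random(seed)
    holdings = [1]   # start with a single holder owning one unit
    owners = [0]     # one entry per unit, naming the holder that owns it
    for _ in range(steps):
        if rng.random() < new_player_chance:
            holdings.append(1)
            owners.append(len(holdings) - 1)
        else:
            # Picking a random unit picks its owner in proportion to holdings.
            winner = rng.choice(owners)
            holdings[winner] += 1
            owners.append(winner)
    return holdings

holdings = sorted(rich_get_richer(50_000), reverse=True)
total = sum(holdings)
median = holdings[len(holdings) // 2]
top_share = sum(holdings[: len(holdings) // 100]) / total
print(f"{len(holdings)} holders; the top 1% hold {top_share:.0%} of the resource")
print("largest:", holdings[0], "median:", median)
```

Everyone starts from the same rule, yet early or lucky winners keep compounding their advantage, so a handful of holders end up with a large share while the typical holder has one or two units.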
All of these writers are describing the same effect: an evolutionary process tied to an emergent selection criterion leads to a power curve. When there is a limiting factor (like gravity in the case of height or weight), you get a normal distribution instead. The large outliers we would see at the ends of the power curve (the large number of very small values at the upper left and the small number of very large values at the lower right) get cut off and folded back into the middle of the distribution. This is described as regression toward the mean and is an integral part of attributes that follow the normal distribution. Absent any external limitations, these outliers will not regress toward the mean; they will spread out under evolutionary pressure and produce a power curve. In fact, some distributions can be seen to start out normal and then, as the evolutionary process plays out, spread until more of the resource controlled by the process is concentrated in the hands of the elite. Porter shows examples from the banking industry: before the Great Depression, limited regulation led to a wider spread in income, and post-Depression regulation led to a narrower income spread throughout the industry. We’ll look at what happened in the 2000s, when regulations were relaxed and the normal distribution was allowed to spread, via economic evolution, into a power curve.
To be complete, there are other types of interactions that can lead to power curves, so just because you see a power curve does not mean an evolutionary process is directly behind it. Let’s look at one example where power curves are not tied directly to an evolutionary process.
Earthquakes – The familiar Richter scale for earthquakes is logarithmic (each whole-number step represents a tenfold increase in the measured amplitude of the shaking), so we’re already halfway to a power curve. If you look at earthquake data over any period of time, you’ll see a lot of little earthquakes and very few powerful ones. They follow a power curve not because of evolution, but because they are the outcome of two closely matched but powerful forces working against each other. Earthquakes occur as tectonic plates move past each other. There are many different ways plates can slide past each other, but let’s focus on a strike-slip fault, where the two plates slide alongside each other horizontally, like the infamous San Andreas Fault in California. The two plates, the Pacific Plate on the west and the North American Plate on the east, slide past each other slowly. These plates are massive, so the forces between them are incredibly large. Most of the time, the boundary between the plates acts like sandpaper, so the plates slip past each other with only a few little hitches. These small hitches are small earthquakes. Every so often, large outcroppings in one or both of the plates catch on each other, like in the figure below, and the plates get hung up. The rest of each plate continues to move, and the pressure between the plates builds until it is finally large enough to overcome the resistance of the outcroppings. The two plates explode past each other as the part of the fault that was “hung up” catches up with the rest of the plate. The distance covered as the plates realign is directly related to the severity of the earthquake. During the 1906 San Francisco earthquake, the two plates moved approximately 21 feet, and the quake’s magnitude was estimated at 8.25 on the Richter scale. This is one of the largest earthquakes measured along the San Andreas Fault. A good question to ask is: why are there so few large earthquakes?
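The arithmetic behind “already halfway to a power curve” can be sketched in Python. A difference of d magnitude units means a 10**d ratio in measured amplitude. The frequency side is usually summarized by the empirical Gutenberg-Richter relation, log10(N) = a - b*M, where N is the number of quakes at or above magnitude M; b is often close to 1, while a depends on the region and time window. The values a = 5 and b = 1 below are assumptions for illustration, not fitted to any catalog.

```python
def amplitude_ratio(m_big, m_small):
    """Richter magnitude is a base-10 logarithm of wave amplitude, so a
    difference of d magnitude units is a 10**d ratio in amplitude."""
    return 10 ** (m_big - m_small)

def gutenberg_richter_count(magnitude, a=5.0, b=1.0):
    """Expected number of quakes at or above `magnitude` under the
    Gutenberg-Richter relation log10(N) = a - b*M (a and b are assumed)."""
    return 10 ** (a - b * magnitude)

print(amplitude_ratio(8.25, 6.25))  # 100.0: two units means 100x the amplitude
for m in (3, 4, 5, 6, 7, 8):
    # With b = 1, each step up in magnitude is ten times rarer.
    print(m, gutenberg_richter_count(m))
```

With b = 1, the pattern is the same “10 times fewer, 10 times bigger” straight line seen in the population density graph.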
It is directly related to the distribution of large outcroppings in the Pacific and North American plates, and here we see where an evolutionary process was involved. As the earth cooled from its initial state, magma solidified into rocks. If we plotted rock distribution by size, we’d find the familiar power curve as a result of the evolutionary process: a lot of little rocks and a few large rocks embedded in the tectonic plates. Why? Large rocks get chewed up by the grinding action of the plates, turning them into smaller rocks. Initially there were probably more large rocks as the magma cooled, but over millions of years of grinding, the large rocks were turned into small rocks. Without a way to generate more large rocks, over time we see few large rocks and numerous smaller ones. So even though earthquake distributions are not directly driven by an evolutionary process, an evolutionary process is at work as a secondary effect.
Prior to 2008, we were in a pattern governed by a strange attractor dominated by the fact that average home prices increased every year. In its February 2009 issue, Wired magazine published an article, “The Formula That Killed Wall Street,” which discusses the formula banks used to model risk more “accurately”; it is based on normal distributions. In the years leading up to 2008, the assumption of rising home prices across most of the United States held true. Sure, there were times and places (Texas in the 1980s and San Diego in the 1990s) where home prices dropped, but that didn’t affect the rest of the US. In a very real sense, the decade preceding 2008 became a Ponzi scheme based on real estate. Mortgage originators got paid fees to sign people up for mortgages, banks got fees for closing costs, ratings agencies got fees for rating pools of mortgages and investment banks got fees for selling pools of mortgages (all of them with AAA ratings, imagine that!) to various investment groups looking for higher interest at low risk. Ponzi schemes require dupes on the front end to bring in money and dupes on the back end to tell their friends what a great investment they found. You’ll notice a feedback loop similar to the one we described in Chapter 2: interactions that feed on themselves. As the pool of creditworthy people dried up, the mortgage originators started lending to riskier and riskier borrowers. The government was telling everyone that home ownership was a national right and that it was imperative to do everything we could to create more home owners. The loan originators were more than willing to sign up more borrowers, and since the ratings agencies published their rating models (which assumed housing prices would continue to rise), investment banks could keep assembling AAA-rated pools of what turned out to be toxic mortgages and selling them as investment-grade instruments.
Some people noticed that this was a self-fulfilling prophecy: assuming housing prices would keep rising across the US led to people who were poor credit risks getting mortgages they could never pay back. These toxic mortgages were “magically” turned into investment-grade instruments (of mass destruction) and sold to the masses. The folks who began doubting the models started placing bets that the mortgage market would fall. They believed the market would eventually move from the current strange attractor (housing prices rising) to a new strange attractor (housing prices falling). The fact that some of those groups (like Goldman Sachs) were set to make money on both sides, collecting fees for selling the toxic investments and insurance payouts when they failed, is painful. In 2010, Goldman Sachs settled with the SEC for $550 million over just one of the many mortgage deals it sold (without admitting fault; it’s good not to admit fault, as it keeps one’s ass out of jail in a criminal prosecution). Even more fascinating, Goldman thought there was a good chance that the people who were given these mortgages wouldn’t be able to pay them, so it took out insurance with AIG to protect itself in the (likely) event the economy went to hell in a hand basket. When AIG didn’t have enough money to pay the claims, the US government stepped in and paid Goldman Sachs (and others) 100 cents on the dollar (via an $80 billion bailout of AIG). It’s good to be king. [Aside: Goldman and some other banks deemed too big to fail were able to keep the money they made on the run-up and were kept in business to make more money in the aftermath. In a strictly evolutionary system, these leeches would have been allowed to fail and would all have gone bankrupt. A purely capitalistic society is too much for us to stomach, so we have introduced various flavors of socialism to make the system more humane.]
[Aside #2: The housing bubble was not an exclusively US event but I limited this discussion to the US situation.]
In the summer of 2008, Ben Bernanke stated that the foreclosure problems were under control. That is because the models said things were under control. Every model used by the banks and mortgage risk analysts assumed a normal (also known as Gaussian) distribution. However, the economy is driven by capitalism, which is the selection criterion for the US economy. By definition, then, the US economy is chaotic and doesn’t follow a normal distribution. It was just a matter of time before the models got it wrong, and boy did they get it wrong. The financial meltdown of 2008 started in just 35 counties, spread mostly across California, Florida and Nevada, plus my home county of Fulton in Georgia. As we found out, in a chaotic system (all evolutionary systems are chaotic) a small change can lead to large changes. When gasoline went to $4.50 a gallon, people in those 35 counties couldn’t afford both the drive to work and their mortgage payments, especially as their introductory interest rates reset and their payments doubled or tripled. That was the tipping point. These 35 counties started showing an increase in foreclosures, and in a short amount of time the entire economy was frozen. The transition from housing prices always rising to falling took a breathtakingly short time: roughly 3 months after Ben Bernanke made his infamous statement, Lehman Brothers declared bankruptcy. A strange attractor (rising housing prices) that had held for many decades turned around in a few months. That’s the way these transitions go. They happen very quickly, and no one (even an intelligent designer) can predict when they will turn.
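To see why a normal-distribution assumption can badly understate risk, this Python sketch compares the chance of landing k standard deviations above the mean under a normal distribution and under a Pareto (power-law) distribution with the same kind of measuring stick. It is an illustration only: it does not reproduce the formula the Wired article describes, and the Pareto shape of 3 is an arbitrary choice (the shape must exceed 2 for a standard deviation to exist).

```python
import math
from statistics import NormalDist

def normal_tail(k):
    """Probability of landing more than k standard deviations above the
    mean of a normal (Gaussian) distribution."""
    return NormalDist().cdf(-k)

def pareto_tail(k, alpha=3.0, x_min=1.0):
    """Same question for a Pareto (power-law) distribution. alpha must exceed
    2 for the standard deviation to exist; alpha=3 is an arbitrary choice."""
    mean = alpha * x_min / (alpha - 1)
    sd = x_min * math.sqrt(alpha / ((alpha - 1) ** 2 * (alpha - 2)))
    threshold = mean + k * sd
    return (x_min / threshold) ** alpha   # P(X > threshold), threshold >= x_min

for k in (2, 3, 5):
    ratio = pareto_tail(k) / normal_tail(k)
    print(f"{k} sd: normal {normal_tail(k):.2e}  "
          f"power law {pareto_tail(k):.2e}  ratio {ratio:,.0f}x")
```

At two standard deviations the two distributions give similar answers, which is why a normal model can look fine in ordinary times. Five standard deviations out, the normal model calls the event essentially impossible, while the power-law model says it happens thousands of times more often.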
That’s why people can’t model the transition between strange attractors. It is not only too hard, but I suspect there is no pattern to model. That transition is almost purely chaotic behavior, and any small change can lead, very quickly, to large variations. If we could run the clock back and look at the financial meltdown of 2008, the number of interactions is so large that changing any one of them would have led to another pathway, and the timing would have been different. But the end result, a recession, would have been the same. We’ll explore the relationship between science, strange attractors, chaos and predictions in a later chapter.