Unexpected events (by flaw of language, I must resume Lojban) and events you have no reason to expect

When Nassim Taleb describes a black swan as an event that is "unexpected," it doesn't necessarily mean that nobody saw it coming. It rather means nobody had any reason to think it would occur. 

The class of events that we have no reason to expect has infinite cardinality [1] and usually we spend very little time thinking about its members anyway. I propose that this last bit is important because it's a sufficient generator of surprise: if you started to enumerate things that you had little or no reason to expect happening (and also perhaps the likelihood of a little bit of research increasing your expectation of it happening), you'd probably despair and give up the experiment once something unexpected happened that you hadn't gotten to.

So far this has been the kind of rationality-based thinking that I distrust. But here's an example of a situation where this tendency to ignore things that we have no reason to expect can be bad. Dr Stearns is discussing phylogenetics, i.e., reconstruction of the historic tree of life:

organisms don't come with a barcode on their foreheads telling us who they are related to. We have to try to figure out who they're related to, and when we understand the relationships, then we know the history, because the relationships define the history.

So we work with hypotheses about history, and we test these hypotheses against each other and try to come up with the one that's most consistent with the data that we've got.

If you restrict yourself to the hypotheses for which you have reasons they might work, you are leaving out the large set of hypotheses that you have no reason to believe to be viable. Possibly in some domains the set of hypothesis "basis vectors" needed is small and you know them all, and possibly phylogenetics is one of those domains. But I wonder if you're leaving yourself open to large surprises if your methods can't easily consider all the unreasonable hypotheses as well---and open to lawsuits if your misinformed prescriptions led people to do harmful instead of beneficial things.

[1] The cardinality of a class being the number of elements in it.

Non-ergodicity from this simple definition of microevolution and macroevolution?

Perhaps my more learned friends in stochastic processes can assist me in surveying these two qualitative descriptions of microevolution and macroevolution, and answer this question: under what conditions could one conclude the overall process of evolution to be non-ergodic?

Biological evolution has two big ideas. One of them has to do with how the process occurs, and that's called microevolution. It's evolution going on right now. Evolution is going on in your body right now. You've got about 10^13th bacteria in each gram of your feces, and they have enough mutations in them to cover the entire bacterial genome. Every time you flush the toilet, you flush an entire new set of information on bacterial genomes down the toilets. It's going on all the time.

Now, the other major theme is macroevolution. This process of microevolution has created a history, and the history also constrains the process. The process has been going on for 3.8 billion years. It has created a history that had unique events in it, and things happened in that history that now constrain further microevolution going on today.

(Stearns, Open Yale)


If the microevolutionary search is random, constrained by natural selection yet emancipated by neutral selection (allowing nature to massively parallelize the search for serendipitous discoveries), it appears to me that the total process would be non-ergodic.

This could be a very important contribution to biology, as I'm not sure that the contentions raised by Stephen Jay Gould, contentions like contingency, have ever been fully addressed by mathematical biologists. Non-ergodicity is what makes the same experiment, repeated multiple times, yield totally different results, as Cohen humorously illustrates in his brief 1976 paper, "Irreproducible Results and the Breeding of Pigs (Or Nondegenerate Limit Random Variables in Biology)". You "rerun the tape of life" and different chemical factories are discovered first, different niches get occupied first, and after 3.8 billion years, the constraints of macroevolution are totally different than they are today.
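To make "same experiment, different results" concrete, here is a minimal sketch using a Pólya urn. The urn is my own choice of toy model, not something taken from Stearns, and it is only meant to illustrate the loose sense of non-ergodicity used above. Start with one red and one blue ball; draw a ball at random and return it with one extra ball of the same colour. Each replay settles down to a stable fraction of red, but for this urn that limiting fraction is itself random (uniform between 0 and 1), so early accidents get locked in. The function names and parameters are hypothetical.

```python
import random

def polya_urn_fraction(steps, seed):
    """Run one Polya urn and return the final fraction of red balls.

    Start with 1 red and 1 blue ball. At each step draw a ball at random
    and put it back together with one extra ball of the same colour.
    """
    rng = random.Random(seed)
    red, blue = 1, 1
    for _ in range(steps):
        if rng.random() < red / (red + blue):
            red += 1   # drew red: reinforce red
        else:
            blue += 1  # drew blue: reinforce blue
    return red / (red + blue)

def rerun_the_tape(n_runs=10, steps=20000):
    """Replay the same experiment n_runs times, one seed per replay."""
    return [polya_urn_fraction(steps, seed) for seed in range(n_runs)]

if __name__ == "__main__":
    for seed, frac in enumerate(rerun_the_tape()):
        print(f"replay {seed}: long-run fraction of red = {frac:.3f}")
```

Whether real evolution behaves anything like this urn is exactly the open question; the sketch only shows what a process with a non-degenerate limit looks like when you replay it.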

A brief aside on the "tape of life": this means our kind of life---it is a fact of incredible amazement to me that all the living beings on earth seem to be descended from the same common ancestor (except viruses, for whom the empirical jury is still out, according to Stearns); it would be most comforting if another lineage had survived, or evidence for it could be found.

There are possibly more intellectually appealing questions in this line of reasoning but this one about non-ergodicity I think may be the most fruitful.

Trying to replace HN/NYT-type noise with Nature. On "Climate science: No solar fix : Article : Nature"

Georg Feulner and Stefan Rahmstorf of the Potsdam Institute for Climate Impact Research used a global climate model to examine the effect of a Maunder-type minimum on global mean temperature by 2100. The model reproduced the cooling of past solar minima, but when simulating the future the authors found that the solar effect was overwhelmed by the much larger temperature increase due to greenhouse-gas emissions.

This article is a 2-paragraph snippet of the cited work that tries to assess the impact of a minimum in solar activity on global cooling, as happened in the 1600s (the Maunder minimum).

This snippet just tells me that a global climate model backtests the Maunder minimum of the 1600s and makes some prediction about the future---how is that at all a useful piece of information? We know that there are countless models, many of which make perfect sense, that backtest the past well but have nothing useful to say about the future!
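Here is a toy sketch of what I mean. It has nothing to do with the actual climate model or with the paper; the made-up "history", the function name lagrange_predict, and the choice of polynomial interpolation as the "model" are all my own assumptions for illustration. The true signal is flat at zero, the ten past observations carry errors of plus or minus 0.1, and the model is the degree-9 polynomial through all ten points. It backtests perfectly and then forecasts about -102.3 one step past the data.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate at x the unique polynomial passing through every (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total

# Hypothetical "history": the true signal is flat at zero, and each of the
# ten observations carries a small measurement error of +0.1 or -0.1.
past_x = list(range(10))
past_y = [0.1 if i % 2 == 0 else -0.1 for i in past_x]

# Backtest: the model reproduces every past observation.
backtest_error = max(
    abs(lagrange_predict(past_x, past_y, x) - y)
    for x, y in zip(past_x, past_y)
)

# Forecast one step past the data, where the true signal is still zero.
forecast = lagrange_predict(past_x, past_y, 10)

print(f"worst backtest error: {backtest_error:.2e}")
print(f"forecast at x = 10:   {forecast:.1f}")
```

A perfect backtest here is a symptom of the model having swallowed the noise, which is why a good backtest on its own tells me so little.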

(I don't know about the actual paper, beyond my a priori tepidity on climate prediction in light of how little those researchers have done to allay my fears of fat-tailed errors).

Thanking Science News for "Odds Are, It's Wrong" (2010)

Even when “significance” is properly defined and P values are carefully calculated, statistical inference is plagued by many other problems. Chief among them is the “multiplicity” issue — the testing of many hypotheses simultaneously. When several drugs are tested at once, or a single drug is tested on several groups, chances of getting a statistically significant but false result rise rapidly. Experiments on altered gene activity in diseases may test 20,000 genes at once, for instance. Using a P value of .05, such studies could find 1,000 genes that appear to differ even if none are actually involved in the disease. Setting a higher threshold of statistical significance will eliminate some of those flukes, but only at the cost of eliminating truly changed genes from the list. In metabolic diseases such as diabetes, for example, many genes truly differ in activity, but the changes are so small that statistical tests will dismiss most as mere fluctuations. Of hundreds of genes that misbehave, standard stats might identify only one or two. Altering the threshold to nab 80 percent of the true culprits might produce a list of 13,000 genes — of which over 12,000 are actually innocent.
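The 1,000 figure is just 20,000 x 0.05, and it is easy to check with a small simulation. This is my own sketch, not anything from the article: it assumes that no gene is truly involved and that each test's P value is then uniform on [0, 1], so it draws P values directly rather than simulating expression data. The function name count_flukes and its parameters are hypothetical. The last line uses a much stricter per-test threshold (alpha divided by the number of tests, the Bonferroni correction) as one example of the "higher threshold" the article mentions.

```python
import random

def count_flukes(n_genes=20_000, alpha=0.05, seed=0):
    """Count 'significant' genes when no gene is truly involved.

    Assumption: under the null hypothesis each test's P value is uniform
    on [0, 1], so we draw P values directly instead of simulating data.
    """
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(n_genes)]
    return sum(p < alpha for p in p_values)

if __name__ == "__main__":
    n_genes, alpha = 20_000, 0.05
    print("expected flukes: ", n_genes * alpha)
    print("simulated flukes:", count_flukes(n_genes, alpha))
    # A much stricter per-test threshold removes nearly all flukes in this
    # all-null setting, at the cost described in the article.
    print("flukes at alpha / n_genes:", count_flukes(n_genes, alpha / n_genes))
```

The sketch only covers the all-null half of the story; the other half, that a strict threshold also throws away genes with small but real changes, would need an assumed effect size and is left out.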

As someone not trained by statisticians but who works with noisy engineering systems (radar, communication links and networks, etc.), I found this article a really valuable compendium of the most common ways that scientific studies fail and can be safely discarded, especially in the social and behavioral sciences but also in medicine and genetics.

Non-practitioners (non-noise-engineers and non-statisticians) may not realize it but statistics is actually a deeply philosophical human endeavor, and this makes a grave side-point: mathematics is really good at describing things, but whether those things exist in real life can only be answered by a human who is interested.

Anyway, the article discusses the arbitrariness and uselessness of P values, especially in today's era of large-scale data-gathering systems; mistaking statistical significance for practical significance; not weighing the probability of missed detections versus false alarms (ROC curves have huge importance in electrical engineering [1]); the fact that when you run thousands of trials, at least some will, purely by chance, end up with groups that are not properly randomized; and the importance of replication and how rarely it's done in the social sciences.

All these points were very valuable to my understanding of this important topic. John Ioannidis published a 2005 paper, "Why most published research findings are false" [2] indicting many sub-fields of public health and medical research, that I am not mathematically mature enough to understand, and I'm hoping that the Science News article covers a lot of Ioannidis' topics in "simple" English.

All of this is very important to people who want to de-fragilize our society and reduce the impact of catastrophic errors of understanding and decrease the power that nerds hold over our lives---nerds being unimaginative, unreflective people who take facts handed to them by some higher authority as gospel and cannot imagine a world where they are untrue or inapplicable.

[1] ROC curve: http://en.wikipedia.org/wiki/Receiver_operating_characteristic
[2] Ioannidis (2005): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182327/

[more on rejecting noise & fashionable-thoughts catalogs]

It saddens me to realize that for every thoughtful person, atheist or religious, there are ten of these unintrospective, ignorant, and usually journalistic types, who cannot imagine a world outside the fashion (idea) catalogs they take so much smug satisfaction in worshipping. From a discussion of the unmatched Samuel Butler:

"The discovery of Mendel’s Laws, and then DNA, finally put paid to the Lamarckian theory of evolution. It seemed the idea of a creature that could ‘will’ its own evolutionary direction was quite untenable. The genetic blue-print we pass on is the one we are born with and it operates quite independently of any use we make of it or any plans we may have for it." (http://www.threemonkeysonline.com/als/_samuel_butler_sociobiology.html)

This writer, so unfamiliar with the skeptical and ephemeral nature of knowledge (and even science), would no doubt just add epigenetics, and the willful expression of one of many myriad genetic capabilities, to their stupid worthless idea-catalog. 

Perhaps this is a good introduction to a very empirical effort to understand evolution and epigenetics and the lot: http://www.marksdailyapple.com/epigenetics/ --- but I would encourage the meta-goal to quit thinking out of a catalog altogether.

More on the topic...

"The Iliad and Odyssey have been used as text-books for education during at least two thousand five hundred years, and yet it is only during the last forty or fifty that people have begun to see that they are by different authors. Can there be any more scathing satire upon the value of literary criticism?" (Samuel Butler, 1892, The Humour of Homer)

Butler, who is awesome and is a writer, is making fun of critics, i.e., those who can't do.

Literary types, the kind that can't do (and there are so many, from nutters like Marx to modern-day journalistic nuts musing on quantum physics or molecular evolution or fat-tailed risks), ought to steer clear of ordinary people. Ought to, but often don't, because ordinary people love paying for their claptrap. The interesting question has been, what do ordinary people get out of over-simplified and incorrect analyses?