About thirty years ago there was much talk that geologists ought only to observe and not theorize; and I well remember someone saying that at this rate a man might as well go into a gravel-pit and count the pebbles and describe the colours. How odd it is that anyone should not see that all observation must be for or against some view to be of any service.
The hypothesis is the pivot point around which science turns.
This is one reason why “No one has ever collected this data before” is, by itself, a weak rationale for a dissertation. It is impossible to fund your research by promising only to collect cool data. It is the hypotheses that make the data cool, or, at the very least, show why it's cool.
So what is an hypothesis? It is a series of assumptions, tied together by logic, that generates novel predictions. Collectively, it is an explanation that answers a scientific question. Let’s break that down.
A series of assumptions.
Assumptions are premises that “such and such” are true. All hypotheses have them, by definition. Natural selection, famously, is built upon the assumptions that
A1: Individuals vary in traits.
A2: These traits differ in their ability to confer survival and reproduction in a given environment.
A3: These traits are passed down from parent to offspring.
Tied together by logic.
The logic bit is, in part, why successful Ph.D.s are called Doctors of “Philosophy”. Connecting the assumptions and determining their consequences is best done with mathematics. But good logic ultimately comes down to “If A, B, and C are true, then D and E must follow”.
To generate novel predictions.
Predictions arise from the frisson of your assumption set. (Yes, I just wrote that. It’s Friday.) Predictions are not a restatement of the assumptions, but the only possible logical consequence of them. Empiricists love long lists of predictions. Two key predictions of Natural selection:
P1: As an environment changes, the set of traits in the descendants of a population will differ predictably from that of their parental population.
P2: In an unchanging environment, the set of traits of populations will tend to reach an equilibrium.
Good hypotheses are gorgeous things. The best hypotheses make plenty of predictions, and those predictions have numbers (predicting, not just that “Y will increase with X”, but “Y will increase as X^0.5”). Such hypotheses sweep through a scientific field like a cyclone, sucking up the funding resources and rearranging the intellectual landscape. But in a *good* way.
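To make the “predictions have numbers” point concrete, here is a minimal sketch of checking a predicted exponent against data. Everything in it is an assumption for illustration: the data are made up (generated from Y = 2 * X^0.5 with some noise), and `fit_exponent` is a hypothetical helper that fits a straight line to log(Y) versus log(X), which is one common way to estimate a power-law exponent, not the only one.

```python
import math
import random


def fit_exponent(xs, ys):
    """Least-squares slope of log(y) on log(x): the estimated b in y = a * x**b."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    sxx = sum((u - mx) ** 2 for u in lx)
    return sxy / sxx


# Made-up data (illustration only): y = 2 * x**0.5 with multiplicative noise.
rng = random.Random(1)  # fixed seed so the sketch is repeatable
xs = [float(x) for x in range(1, 201)]
ys = [2.0 * x ** 0.5 * math.exp(rng.gauss(0.0, 0.05)) for x in xs]

b_hat = fit_exponent(xs, ys)
print(f"estimated exponent: {b_hat:.3f} (the hypothesis predicts 0.5)")
```

The design point: “Y will increase with X” survives any positive slope, while “Y will increase as X^0.5” can be embarrassed by an estimate near 1.0. That extra exposure to being wrong is what makes the quantitative version more informative.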
That said, hypotheses are often misunderstood, misstated, and frequently obscured in scientific work. To see how, keep reading.
Here are a few reasons why a good hypothesis is hard to find.
1) Scientists and statisticians often use different definitions of hypothesis, and scientists often use statistics.
To a statistician, an hypothesis is a predicted deviation from some expectation of randomness in a dataset (e.g., thirty coin flips all yielding heads would lead you to reject the null hypothesis of a fair coin and support the hypothesis of a biased coin). This conflation of pattern detection (what statistics does) and search for cause (what scientists do) leads to the commonly heard statement in a scientific seminar:
“I hypothesized that when I increased X, Y would increase too.”
That statement is, at best, a prediction. It constitutes a statistical hypothesis that is an alternative to another statistical hypothesis, that of “no correlation”. But as the statement says nothing about cause, it is not a scientific hypothesis, and contains far less information than a scientific hypothesis.
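The coin-flip example can be sketched in a few lines. This is a minimal illustration, assuming a fair-coin null hypothesis and an exact two-sided binomial test; the function name `binomial_two_sided_p` is made up for this sketch and uses only the Python standard library.

```python
from math import comb


def binomial_two_sided_p(heads, flips, p=0.5):
    """Exact two-sided p-value under the null that P(heads) = p.

    Sums the probability of every outcome that is no more likely
    than the one observed.
    """
    pmf = [comb(flips, k) * p**k * (1 - p) ** (flips - k) for k in range(flips + 1)]
    observed = pmf[heads]
    # Small tolerance so outcomes tied with the observed one are included.
    return sum(q for q in pmf if q <= observed * (1 + 1e-9))


# Thirty flips, thirty heads.
p_value = binomial_two_sided_p(30, 30)
print(f"p-value under a fair coin: {p_value:.2e}")
```

A p-value this small is a good reason to reject “fair coin”. Notice what the calculation does not tell you: why the coin is biased. That is the gap between detecting a pattern and proposing a cause.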
2) Hypotheses are often buried in a scientific paper, or worse, implied, or…still worse, entirely absent.
It would be a sweet, sweet world if a paper’s Introduction included the statement of a question, then laid out the hypotheses being tested, including key assumptions, and concluded with “Here we test the following assumptions and predictions”. But often the best you can hope for in a PDF you just downloaded is a prediction or two, sprinkled randomly throughout the paper (sometimes in the last paragraph of the Discussion!). Hopefully there will be citations that lead you to a fuller exegesis of the hypothesis in question. But not always. This sad state of affairs is particularly challenging for folks new to a field. At the same time, this is why new blood is so critical: it is those folks, naive as to the dogma of the discipline, who see through the poor logic, faulty assumptions, or unrecognized predictions. Their papers often become famous.
3) Hypotheses are tested by evaluating their assumptions, logic, and/or predictions.
A paper that convincingly refutes a key (often unrecognized) assumption of an hypothesis kills it just as dead as a test of one of its predictions. That’s progress, folks.
4) A good hypothesis attracts a lot of attention.
Empiricists love good hypotheses because they love to collect data. And a good hypothesis, with lots of novel predictions, especially those that allow the reinterpretation of existing data, or the easy collection of new data, will be cited. A lot.
5) Good hypotheses demand good data.
A good hypothesis will be tested with crappy data. Just as flawed hypotheses regularly make it into the literature, flawed tests “falsifying” a good hypothesis inevitably get published, often by someone with a reputation as an “iconoclast” who collects datasets of variable quality, treats them all as equal, and publishes a meta-analysis. Making matters worse, the poorer the dataset, the easier it is to reject really good quantitative hypotheses. So the best potential hypotheses get the worst treatment from sloppy data.
[Note to reader: one tell for a really good, ax-grinding rant is the run-on sentence.]
6) Grant proposals demand good hypotheses.
With low funding rates, and the demand for transformative science, NSF reviewers are on the lookout for provocative questions for which strong hypotheses are tested with data collected using a variety of methods. Spending the time to really think out your hypothesis, and all the ways you can test it (verifying its assumptions and validating its predictions), is time well spent.
Soon we’ll get to the notion of testing multiple alternative hypotheses. But I need to get a new ax. This one is worn down to the handle.
So waddya think?