Showing posts with label research. Show all posts

Sunday, 29 May 2016

Ten serendipitous findings in psychology

The Thatcher Illusion (see below)
I'm a great fan of pre-registration of studies. It is, to my mind, the most effective safeguard against p-hacking and publication bias, the twin scourges that have led to the literature being awash with false positive findings. When combined with a more formal process, as in Registered Reports, it also allows researchers to benefit from reviewer expertise before they do the study, and to take control of the publication timeline.

But one salient objection to pre-registration comes up time and time again: if we pre-register our studies it will destroy the creative side of doing science, and turn it instead into a dull, robotic, cheerless process. We will have to anticipate what we might find, and close our eyes to what the data tell us.

Now this is both silly and untrue. For a start, there's nobody stopping anyone from doing fairly unstructured exploration, which may be the only sensible approach when entering a completely new area. The main thing in that case is to just be clear that this is what it is, and not to start applying statistical tests to the findings. If a finding has emerged from observing the data, testing it with p-values is statistically illiterate.

Nor is there any prohibition on reporting unexpected findings that emerge in the course of a study. Suppose you do a study with a pre-registered hypothesis and analysis plan, which you adhere to. Meanwhile, a most exciting, unanticipated phenomenon is observed in your experiment. If you are going down the kind of registered reports pathway used in Cortex, you report the planned experiment, and then describe the novel finding in a separate section. Hypothesis-testing and exploration are clearly delineated and no p-values are used for the latter.

In fact, with any new exciting observation, any reputable scientist would take steps to check its repeatability, to explore the conditions under which it emerges, and to attempt to develop a theory that can account for it. In effect, all that has happened is that the 'data have spoken' and suggested a new hypothesis, which could potentially be registered and evaluated in the usual way.

But would there be instances of important findings that would have been lost to history if we started using pre-registration years ago? Because I wanted examples of serendipitous findings to test this point, I asked Twitter, and lo, Twitter delivered some cracking examples. All of these predate by many years the notion of pre-registration, but note that, in all cases, having made the initial unexpected observation – either from unstructured exploratory research, or in the course of investigating something else – the researchers went on to shore up the findings with further, hypothesis-driven experiments. What they did not do is to report just the initial observation, embellished with statistics, and then move on, as if the presence of a low p-value guaranteed the truth of the result.

Here are ten phenomena well-known to psychologists that show how the combination of chance and the prepared mind can lead to important discoveries*. Where I could find one, I cite a primary source, but readers should feel free to contribute further background information.

1. Classical conditioning, Pavlov, 1902. 
The conventional account of Pavlov's discovery goes like this: He was a physiologist interested in processes of digestion and was studying the tendency of dogs to salivate when presented with food. He noted that over time, the dogs would salivate when the lab assistant entered the room, even before the food was presented, thus discovering the 'conditioned response': a response that is learned by association. A recent account is here. I was not able to find any confirmation of the serendipitous event in either Pavlov's Nobel speech, or in his Royal Society obituary, so it would be interesting to know if this is described anywhere in his own writings or those of his contemporaries.

One thing that I did (serendipitously) discover from the latter source, was this intriguing detail, which makes it clear that Pavlov would never have had any truck with p-values, even if they had been in use in 1902: "He never employed mathematics even in its elementary form. He frequently said that mathematics is all very well but it confuses clear thinking almost to the same extent as statistics."

Suggested by @speech_woman @smomara1 @AglobeAgog 

2. Psychotropic drugs, 1950s 
Chance appears to have played an important role in the discovery of many psychotropic drugs in the early days of psychopharmacology. For instance, tricyclics were initially used to treat tuberculosis, when it was noticed that there was an unanticipated beneficial effect on mood. Even more striking is Hofmann's first-hand account of discovering the psychotropic effects of LSD, which he had developed as a potential circulatory stimulant. After experiencing strange sensations during a laboratory session, Hofmann returned to test the substances he had been working with, including LSD. "Even the first minimum dose of one quarter of a milligram induced a state of intoxication with very severe psychic disturbances, and this persisted for about 12 hours… This first planned experiment with LSD was a particularly terrifying experience because at the time, I had no means of knowing if I should ever return to everyday reality and be restored to a normal state of consciousness. It was only when I became aware of the gradual reinstatement of the old familiar world of reality that I was able to enjoy this greatly enhanced visionary experience".

Suggested by @ollirobinson @kealyj @neuroraf 

3. Orientation-sensitive receptive fields in visual cortex, 1959 
In his Nobel speech, David Hubel recounts how he and Torsten Wiesel were trying to plot receptive fields of visual cortex neurons using dots of light projected onto a screen, with only scant success, when they observed a cell that gave a massive response as a slide was inserted, creating a faint but sharp shadow on the retina. As he memorably put it, "over the audiomonitor, the cell went off like a machine gun". This initial observation led to a rich vein of research, but, again to quote from Hubel, "It took us months to convince ourselves that we weren’t at the mercy of some optical artefact".

 Suggested by: @jpeelle @Anth_McGregor @J_Greenwood @theExtendedLuke @nikuss @sophiescott, @robustgar 

4. Right ear advantage in dichotic listening, 1961 
Doreen Kimura reported that when groups of digits were played to the two ears simultaneously, more were reported back from the right than the left ear (review here). This method was subsequently used for assessing cerebral lateralisation in neuropsychological patients, and a theory was developed that linked the right ear advantage to cerebral dominance for language. I have not been able to access a published account of the early work, but I recall being told during a visit to the Montreal Neurological Institute that it had taken time for the right ear advantage to be recognised as a real phenomenon and not a consequence of unbalanced headphones. The method of dichotic listening dated back to Broadbent or earlier, but it had originally been used to assess selective attention rather than cerebral lateralisation.

5. Phonological similarity effect in STM, 1964 
Conrad and Hull (1964) described what they termed 'acoustic confusions' when people were recalling short sequences of visually-presented letters, i.e. errors tended to involve letters that rhymed with the target letter, such as P, D, or G. In preparation for an article celebrating his 100th birthday, I recently listened to a recording of Conrad describing this early work, and explaining that when such errors were observed with auditory presentation, it was assumed they were due to mishearings. Only after further experiments did it become clear that the phenomenon arose in the course of phonological recoding in short-term memory. 

6. Hippocampal place cells, 1971 
In his 2014 Nobel lecture,  John O'Keefe describes a nice example of unconstrained exploratory research: "… we decided to record from electrodes … as the animal performed simple memory tasks and otherwise went about its daily business. I have to say that at this stage we were very catholic in our approach and expectations and were prepared to see that the cells fire to all types of situations and all types of memories. What we found instead was unexpected and very exciting. Over the course of several months of watching the animals behave while simultaneously listening to and monitoring hippocampal cell activity it became clear that there were two types of cells, the first similar to the one I had originally seen which had as its major correlate some non-specific higher-order aspect of movements, and the second a much more silent type which only sprang into activity at irregular intervals and whose correlate was much more difficult to identify. Looking back at the notes from this period it is clear that there were hints that the animal’s location was important but it was only on a particular day when we were recording from a very clear well isolated cell with a clear correlate that it dawned on me that these cells weren’t particularly interested in what the animal was doing or why it was doing it but rather they were interested in where it was in the environment at the time. The cells were coding for the animal’s location!" Needless to say, once the hypothesis of place cells had been formulated, O'Keefe and colleagues went on to test and develop it in a series of rigorous experiments.

7. McGurk effect, 1976 
In a famous paper, McGurk and MacDonald reported a dramatic illusion: when watching a talking head, in which repeated utterances of the syllable [ba] are dubbed on to lip movements for [ga], normal adults report hearing [da]. Those who recommended this example to me mentioned that the mismatching of lips and voices arose through a dubbing error, and there was even the idea that a technician was disciplined for mixing up the tapes, but I've not found a source for that story. I noted with interest that the Nature paper reporting the findings does not contain a single p-value.
 
Suggested by: @criener @neuroconscience @DrMattDavis 

8. Thatcher illusion, 1980 
Peter Thompson kindly sent me an account of his discovery of the Thatcher Illusion (downloadable from here, p. 921). His goal had been to illustrate how spatial frequency information is used in vision, entailing that viewing the same image close up and at a distance will give very different percepts if low spatial frequencies are manipulated. He decided to illustrate this with pictures of Margaret Thatcher, one of which he doctored to invert the eyes and mouth, creating an impressively hideous image. He went to get sellotape to fix the material in place, but noticed that when he returned, approaching the table from the other side, the doctored images were no longer hideous when inverted. Had he had sellotape to hand, we might never have discovered this wonderful illusion.

Suggested by @J_Greenwood 

9. Repetition blindness, 1987 
Repetition blindness, described here by Nancy Kanwisher, is the phenomenon whereby people have difficulty detecting repeated words that are presented using rapid serial visual presentation (RSVP) - even when the two occurrences are nonconsecutive and differ in case. I could not find a clear account of the history of the discovery, but it seems that researchers investigating a different problem thought that some stimuli were failing to appear, and then realised these were the repeated ones.

Suggested by @PaulEDux 

10. Mirror neurons, 1992 
Giacomo Rizzolatti and colleagues were recording from cells in the macaque premotor cortex that responded when the animal reached for food, or bit a peanut. To their surprise, they noticed when testing the animals that the same cell that responded when the monkey picked up a peanut also responded when the experimenter did so (see here for summary). Ultimately, they dubbed these cells 'mirror neurons' because they responded both to the animal's own actions and when the animal observed another performing a similar action. The story that mirror neurons were first identified when they started responding during a coffee break as Rizzolatti picked up his espresso appears to be apocryphal.

Suggested by: @brain_apps @neuroraf @ArranReader @seriousstats @jameskilner @RRocheNeuro 

 *I picked ones that I deemed the clearest and best-known examples. Many thanks to all the people who suggested others.

Tuesday, 22 March 2016

Better control of the publication time-line: A further benefit of Registered Reports


I’ve blogged previously about waste in science. There are numerous studies that are completed but never see the light of day. When I wrote about this previously, I focused on issues such as reluctance of journals to publish null results, and the problem of writing up a study while applying for the next new grant. But here I want to focus on another factor: the protracted and unpredictable process of peer review that can lead researchers to just give up on a paper.

Sample Gantt chart. Source: http://www.crp.kk.usm.my/pages/jepem.htm
The sample Gantt chart above nicely illustrates a typical scenario.  Let's suppose we have a postdoc with 30 months’ funding. Amazingly, she is not held up by patient recruitment issues, or ethics approvals, and everything goes according to plan, so 24 months in, she writes up the study and submits it to a journal. At the same time, she may be applying for further funding or positions. She may plan to start a family at the end of her fellowship. Depending on her area of study it may take anything from two weeks to six months to hear back from the journal*. The decision is likely to be revise and resubmit. If she’s lucky, she’ll be able to do the revisions and get the paper accepted to coincide with the end of her fellowship.  All too often, though, the reviewers suggest revisions. If she's very unlucky they may demand additional experiments, which she has no funding for.  If they just want changes to the text, that's usually do-able, but often they will suggest further analyses that take time, and she may only get to the point of resubmitting the manuscript when her money runs out. Then the odds are that the paper will go back to the reviewers – or even to new reviewers – who now have further ideas of how the paper can be improved. But now our researcher might have started a new job, have just given birth, or be unemployed and desperately applying for further funds.

The thing about this scenario, which will be all too familiar to seasoned researchers (see a nice example here), is that it is totally unpredictable. Your paper may be accepted quickly, or it may get endlessly delayed. The demands of the reviewers may involve another six months’ work on the paper, at a point when the researcher just doesn’t have the time. I’ve seen dedicated, hardworking, enthusiastic young researchers completely ground down by this situation, faced by the choice of either abandoning a project that has consumed a huge amount of energy and money, or somehow creating time out of thin air. It’s particularly harsh on those who are naturally careful and obsessive, who will be unhappy at the idea of doing a quick and dirty fix to just get the paper out. That paper, which started out as their pride and joy, representing their best efforts over a period of years, is now reduced to a millstone around the neck.

But there is an alternative. I’ve recently, with a graduate student, Hannah Hobson, put my toe in the waters of Registered Reports, with a paper submitted to Cortex looking at an electrophysiological phenomenon known as mu suppression. The key difference from the normal publication route is that the paper is reviewed before the study is conducted, on the basis of an introduction and protocol detailing the methods and analysis plan. This, of course, takes time – reviewing always does. But if and when the paper is approved by reviewers, it is provisionally accepted for publication, provided the researchers do what they said they would.

One advantage of this process is that, after you have provisional acceptance of the submission, the timing is largely under your own control. Before the study is done, the introduction and methods are already written up, and so once the study is done, you just add the results and discussion. You are not prohibited from doing additional analyses that weren’t pre-registered, but they are clearly identified as such. Once the study is written up, the paper goes back to reviewers. They may make further suggestions for improving the paper, but what they can’t do is to require you to do a whole load of new analyses or experiments. Obviously, if a reviewer spots a fatal error in the paper, that is another matter. But reviewers can’t at this point start dictating that the authors do further analyses or experiments that may be interesting but not essential.

We found that the reviewer comments on our completed study were helpful: they advised on how to present the data and made suggestions about how to frame the discussion. One reviewer suggested additional analyses that would have been nice to include but were not critical; as Hannah was working to tight deadlines for thesis completion and starting a new job, we realised it would not be possible to do these, but because we have deposited the data for this paper (another requirement for a Registered Report), the door is left open for others to do further analysis.

I always liked the idea of Registered Reports, but this experience has made me even more enthusiastic for the approach. I can imagine how different the process would have been had we gone down the conventional publishing route. Hannah would have started her data collection much sooner, as we wouldn’t have had to wait for reviewer comments. So the paper might have been submitted many months earlier. But then we would have started along the long uncertain road to publication. No doubt reviewers would have asked why we didn’t include different control conditions, why we didn’t use current source density analysis, why we weren’t looking at a different frequency band, and whether our exclusionary criteria for participants were adequate. They may have argued that our null results arose because the study was underpowered. (In the pre-registered route, these were all issues that were raised in the reviews of our protocol, so had been incorporated in the study). We would have been at risk of an outright rejection at worst, or requirement for major revisions at best. We could then have spent many months responding to reviewer recommendations and then resubmitting, only to be asked for yet more analyses.  Instead, we had a pretty clear idea of the timeline for publication, and could be confident it would not be enormously protracted.

This is not a rant against peer reviewers. The role of the reviewer is to look at someone else’s work and see how it could be improved. My own papers have been massively helped by reviewer suggestions, and I am on record as defending the peer review system against attacks. It is more a rant against the way in which things are ordered in our current publication system. The uncertainty inherent in the peer review process generates an enormous amount of waste, as publications, and sometimes careers, are abandoned. There is another way, via Registered Reports, and I hope that more journals will start to offer this option.

*Less than two weeks suggests a problem! See here for an example.

Tuesday, 26 January 2016

The Amazing Significo: why researchers need to understand poker

©www.savagechickens.com
Suppose I tell you that I know of a magician, The Amazing Significo, with extraordinary powers. He can undertake to deal you a five-card poker hand which has three cards with the same number.

You open a fresh pack of cards, shuffle the pack and watch him carefully. The Amazing Significo deals you five cards and you find that you do indeed have three of a kind.

According to Wikipedia, the probability of this happening by chance when dealing from an unbiased deck of cards is around 2 per cent - so you are likely to be impressed. You may go public to endorse The Amazing Significo's claim to have supernatural abilities.

But then I tell you that The Amazing Significo has actually dealt five cards to 49 other people that morning, and you are the first one to get three of a kind. Your excitement immediately evaporates: in the context of all the hands he dealt, your result is unsurprising.
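To put rough numbers on this, here is a minimal sketch in Python. It assumes a standard 52-card deck and treats the 50 deals as independent, each from a freshly shuffled pack; the function names are my own and purely illustrative.

```python
from math import comb


def three_of_a_kind_probability() -> float:
    """Probability that a random five-card hand is exactly three of a kind."""
    total_hands = comb(52, 5)
    # Choose the rank of the triple and its three suits, then two other
    # distinct ranks (so no pair or fourth card), each in any of four suits.
    matching_hands = 13 * comb(4, 3) * comb(12, 2) * 4 ** 2
    return matching_hands / total_hands


def at_least_one_success(p: float, n: int) -> float:
    """Probability of one or more successes in n independent attempts."""
    return 1 - (1 - p) ** n


p_single = three_of_a_kind_probability()
p_any_of_50 = at_least_one_success(p_single, 50)

print(f"One hand dealt:   {p_single:.4f}")
print(f"Fifty hands dealt: {p_any_of_50:.4f}")
```

Under these assumptions a single hand comes up three of a kind about 2 per cent of the time, but the chance that at least one of 50 people gets one is roughly two in three.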

Let's take it a step further and suppose that The Amazing Significo was less precise: he just promised to give you a good poker hand without specifying the kind of cards you would  get. You regard your hand as evidence of his powers, but you would have been equally happy with two pairs, a flush, or a full house. The probability of getting any one of those good hands goes up to 7 per cent, so in his sample of 50 people, we'd expect three or four to be very happy with his performance.
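The 7 per cent figure can be checked by counting hands directly. This is a sketch under the assumption of a standard 52-card deck, and it follows the usual convention of not counting straight flushes as ordinary flushes; the variable names are illustrative only.

```python
from math import comb

TOTAL_HANDS = comb(52, 5)

# Number of distinct five-card hands in each 'good' category.
GOOD_HAND_COUNTS = {
    "three of a kind": 13 * comb(4, 3) * comb(12, 2) * 4 ** 2,
    "two pairs": comb(13, 2) * comb(4, 2) ** 2 * 11 * 4,
    "flush": 4 * comb(13, 5) - 40,  # subtract the 40 straight flushes
    "full house": 13 * comb(4, 3) * 12 * comb(4, 2),
}

p_good = sum(GOOD_HAND_COUNTS.values()) / TOTAL_HANDS
expected_happy = 50 * p_good  # expected number of impressed people out of 50

for name, count in GOOD_HAND_COUNTS.items():
    print(f"{name:16s} {count / TOTAL_HANDS:.4%}")
print(f"any of these     {p_good:.4%}")
print(f"expected among 50 people: {expected_happy:.2f}")
```

The categories are mutually exclusive, so their probabilities can simply be added.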

So context is everything. If The Amazing Significo had dealt a hand to just one person and got a three-of-a-kind hand, that would indeed be amazing. If he had dealt hands to 50 people, and predicted in advance which of them would get a good hand, that would also be amazing. But if he dealt hands to 50 people and just claimed that one or two of them would get a good hand without prespecifying which ones it would be - well, he'd be rightly booed off the stage.

When researchers work with probabilities, they tend to see p-values as measures of the size and importance of a finding. However, as The Amazing Significo demonstrates, p-values can only be interpreted in the context of a whole experiment: unless you know about all the comparisons that have been made (corresponding to all the people who were dealt a hand) they are highly misleading.

In recent years, there has been growing interest in the phenomenon of p-hacking - selecting experimental data after doing the statistics to ensure a p-value below the conventional cutoff of .05. It is recognised as one reason for poor reproducibility of scientific findings, and it can take many forms.

I've become interested in one kind of p-hacking, use of what we term 'ghost variables' - variables that are included in a study but not reported unless they give a significant result. In a recent paper (preprint available here), Paul Thompson and I simulated the situation when a researcher has a set of dependent variables, but reports only those with p-values below .05. This would be like The Amazing Significo making a film of his performances in which he cut out all the cases where he dealt a poor hand**. It is easy to get impressive results if you are selective about what you tell people. If you have two groups of people who are equivalent to one another, and you compare them on just one variable, then the chance that you will get a spurious 'significant' difference (p < .05) is 1 in 20. But with eight variables, the chance of a false positive 'significant' difference on at least one variable is 1-.95^8, i.e. 1 in 3. (If variables are correlated these figures change: see our paper for more details).
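The arithmetic is easy to reproduce. The sketch below is not the simulation from our paper: it is a simplified illustration that assumes the variables are independent and that, when there is no true difference, each p-value is uniformly distributed between 0 and 1. The function names and the seed are arbitrary choices for the example.

```python
import random


def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one p < alpha across n independent null tests."""
    return 1 - (1 - alpha) ** n_tests


def simulate_any_significant(n_variables: int = 8, n_studies: int = 20000,
                             alpha: float = 0.05, seed: int = 1) -> float:
    """Proportion of simulated null studies with one or more 'significant' results.

    Each null p-value is drawn as a uniform random number (an assumption
    that holds for a continuous test statistic when there is no true effect).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = [rng.random() for _ in range(n_variables)]
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


for k in (1, 2, 4, 8, 20):
    print(f"{k:2d} variables: {familywise_error_rate(k):.3f}")

simulated = simulate_any_significant()
print(f"Simulated, 8 variables: {simulated:.3f}")
```

The simulated proportion should land close to the analytic value for eight variables, which is the '1 in 3' quoted above.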

Quite simply p-values are only interpretable if you have the full context: if you pull out the 'significant' variables and pretend you did not test the others, you will be fooling yourself - and other people - by mistaking chance fluctuations for genuine effects. As we showed with our simulations, it can be extremely difficult to detect this kind of p-hacking, even using statistical methods such as p-curve analysis, which were designed for this purpose. This is why it is so important to either specify statistical tests in advance (akin to predicting which people will get three of a kind), or else adjust p-values for the number of comparisons in exploratory studies*.
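As an illustration of what adjusting for the number of comparisons can look like, here is a minimal sketch of two standard corrections, Bonferroni and Holm. These are my choice for the example rather than methods prescribed above, and the eight p-values are made up for illustration, not taken from any study.

```python
def bonferroni(p_values: list[float]) -> list[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values: list[float]) -> list[float]:
    """Holm step-down adjustment: less conservative than Bonferroni."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values for eight dependent variables (illustrative only).
raw = [0.004, 0.03, 0.20, 0.45, 0.04, 0.60, 0.75, 0.90]
bonf = bonferroni(raw)
holm_adj = holm(raw)

for p, b, h in zip(raw, bonf, holm_adj):
    print(f"raw {p:.3f}  Bonferroni {b:.3f}  Holm {h:.3f}")
```

In this made-up example three variables have raw p-values below .05, but only one survives either correction.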

Unfortunately, there are many trained scientists who just don't understand this. They see a 'significant' p-value in a set of data and think it has to be meaningful. Anyone who suggests that they need to correct p-values to take into account the number of statistical tests - be they correlations in a correlation matrix, coefficients in a regression equation, or factors and interactions in Analysis of Variance - is seen as a pedantic killjoy (see also Cramer et al, 2015). The p-value is seen as a property of the variable it is attached to, and the idea that it might change completely if the experiment were repeated is hard for them to grasp.

This mass delusion can even extend to journal editors, as was illustrated recently by the COMPare project, the brainchild of Ben Goldacre and colleagues. This involves checking whether the variables reported in medical studies correspond to the ones that the researchers had specified before the study was done and informing journal editors when this was not the case. There's a great account of the project by Tom Chivers in this Buzzfeed article, which I'll let you read for yourself. The bottom line is that the editors of the Annals of Internal Medicine appear to be people who would be unduly impressed by The Amazing Significo because they don't understand what Geoff Cumming has called 'the dance of the p-values'.



*I am ignoring Bayesian approaches here, which no doubt will annoy the Bayesians


**PS. 27th Jan 2016. Marcus Munafo has drawn my attention to a film by Derren Brown called 'the System' which pretty much did exactly this! http://www.secrets-explained.com/derren-brown/the-system