
Sunday, 10 March 2013

High-impact journals: where newsworthiness trumps methodology

Here’s a paradox: Most scientists would give their eye teeth to get a paper in a high impact journal, such as Nature, Science, or Proceedings of the National Academy of Sciences. Yet these journals have had a bad press lately, with claims that the papers they publish are more likely to be retracted than papers in journals with more moderate impact factors. It’s been suggested that this is because the high impact journals treat newsworthiness as an important criterion for accepting a paper. Newsworthiness is high when a finding is both of general interest and surprising, but surprising findings have a nasty habit of being wrong.

A new slant on this topic comes from Tressoldi et al (2013), who compared the statistical standards of papers in high impact journals with those of three respectable but lower-impact journals. It’s often assumed that high impact journals have a very high rejection rate because they adopt particularly rigorous standards, but this appears not to be the case. Tressoldi et al focused specifically on whether papers reported effect sizes, confidence intervals, power analysis or model-fitting. Medical journals fared much better than the others, but Science and Nature did poorly on these criteria. My own experience certainly squares with the conclusions of Tressoldi et al (2013), as I described in the discussion of an earlier blogpost.

Last week a paper appeared in Current Biology (impact factor = 9.65) with the confident title: “Action video games make dyslexic children read better.” It's a classic example of a paper that is on the one hand highly newsworthy, but on the other, methodologically weak. I’m not usually a betting person, but I’d be prepared to put money on the main effect failing to replicate if the study were repeated with improved methodology. In saying this, I’m not suggesting that the authors are in any way dishonest. I have no doubt that they got the results they reported and that they genuinely believe they have discovered an important intervention for dyslexia. Furthermore, I’d be absolutely delighted to be proved wrong: There could be no better news for children with dyslexia than to find that they can overcome their difficulties by playing enjoyable computer games rather than slogging away with books. But there are good reasons to believe this is unlikely to be the case.

An interesting way to evaluate any study is to read just the Introduction and Methods, without looking at Results and Discussion. This allows you to judge whether the authors have identified an interesting question and adopted an appropriate methodology to evaluate it, without being swayed by the sexiness of the results. For the Current Biology paper, it’s not so easy to do this, because the Methods section has to be downloaded separately as Supplementary Material. (This in itself speaks volumes about the attitude of Current Biology editors to the papers they publish: Methods are seen as much less important than Results). On the basis of just Introduction and Methods, we can ask whether the paper would be publishable in a reputable journal regardless of the outcome of the study.

By that criterion, I would argue that the Current Biology paper is problematic purely on grounds of sample size. There were 10 Italian children aged 7 to 13 years in each of two groups: one group played ‘action’ computer games and the other was a control group playing non-action games (all games from Wii's Rayman Raving Rabbids - see here for examples). Children were trained in nine daily sessions of 80 minutes each, spread over two weeks. Unfortunately, the study was seriously underpowered. In plain language, with a sample this small, even if there is a big effect of intervention, it would be hard to detect it. Most interventions for dyslexia have small-to-moderate effects, i.e. they improve performance in the treated group by .2 to .5 standard deviations. With 10 children per group, the power is less than .2, i.e. there’s a less than one in five chance of detecting a true effect of this magnitude. In clinical trials, it is generally recommended that the sample size be set to achieve power of around .8. With a total sample of 20 children, that is only possible if the true effect of intervention is enormous – around 1.2 SD, meaning there would be little overlap between the two groups’ reading scores after intervention. Before doing this study there was no reason to anticipate such a massive effect of this intervention, so a sample of only 10 participants per group was inadequate. Indeed, in the context of clinical trials, such a study would be rejected by many ethics committees (IRBs), because it would be deemed unethical to recruit participants for a study with such a small chance of detecting a true effect (Halpern, Karlawish, & Berlin, 2002).
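To see how stark the power problem is, here is a minimal sketch of the calculation in Python using statsmodels. The group size of 10 and the effect sizes of .2 to .5 SD come from the argument above; the two-sided alpha of .05 is my assumption about how the data would be analysed, not a figure taken from the paper.

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of a two-group comparison with 10 children per group, two-sided alpha = .05,
# for small-to-moderate effect sizes (Cohen's d of 0.2 and 0.5)
for d in (0.2, 0.5):
    power = analysis.power(effect_size=d, nobs1=10, alpha=0.05, ratio=1.0)
    print(f"d = {d}: power = {power:.2f}")   # roughly 0.07 and 0.18

# Effect size needed to reach the conventional 80% power with only 10 children per group
d_needed = analysis.solve_power(nobs1=10, alpha=0.05, power=0.8, ratio=1.0)
print(f"d needed for 80% power: {d_needed:.2f}")   # roughly 1.3

In other words, with 10 children per group only an implausibly large effect, well over one standard deviation, would have a reasonable chance of being detected.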

But, I hear you saying, this study did find a significant effect of intervention, despite being underpowered. So isn’t that all the more convincing? Sadly, the answer is no. As Christley (2010) has demonstrated, positive findings in underpowered studies are particularly likely to be false positives when they are surprising – i.e., when we have no good reason to suppose that there will be a true effect of intervention. This seems particularly pertinent in the case of the Current Biology study – if playing active computer games really does massively enhance children’s reading, we might have expected to see a dramatic improvement in reading levels in the general population in the years since such games became widely available.
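Christley’s argument can be illustrated with a short back-of-envelope calculation of the chance that a ‘significant’ result reflects a real effect. The power value below is taken from the calculation above; the prior probability of .1 is purely an assumption I have made for illustration.

# Post-study probability that a 'significant' result is a true positive,
# following the logic of Christley (2010). The power and prior values below
# are illustrative assumptions, not figures reported in any of the papers cited.
alpha = 0.05   # conventional two-sided significance threshold
power = 0.18   # approximate power of a 10-per-group design for a 0.5 SD effect (see above)
prior = 0.10   # assumed prior probability that the intervention genuinely works

ppv = (power * prior) / (power * prior + alpha * (1 - prior))
print(f"Probability that a significant finding is a true positive: {ppv:.2f}")   # about 0.29

On these assumptions, a significant result is more likely to be a false positive than a true one, and the less plausible the effect and the lower the power, the worse the odds become – which is precisely Christley’s point.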

The small sample size is not the only problem with the Current Biology study. There are other ways in which it departs from the usual methodological requirements of a clinical trial: it is not clear how children were assigned to treatments, or whether assessment was blind to treatment status; no data were provided on drop-outs; on some measures there were substantial differences in the variances of the two groups; no adjustment appears to have been made for the non-normality of some outcome measures; and a follow-up analysis was confined to six children in the intervention group. Finally, neither group showed significant improvement in reading accuracy, where scores remained 2 to 3 SD below the population mean (Tables S1 and S3): the group differences were seen only for measures of reading speed.

Will any damage be done? Probably not much – some false hopes may be raised, but the stakes are not nearly as high as in medical trials, where wrong results can lead to serious harm or even death. Quite apart from the implications for families of children with reading problems, however, there is another issue here, about the publication policies of high-impact journals. These journals wield immense power. It is not overstating the case to say that a person’s career may depend on having a publication in a journal like Current Biology (see this account by Lawrence (2007) – published, as it happens, in Current Biology!). But, as the dyslexia example illustrates, a home in a high-impact journal is no guarantee of methodological quality. Perhaps this should not surprise us: I looked at the published criteria for papers on the websites of Nature, Science, PNAS and Current Biology. None of them mentioned the need for strong methodology or replicability; all of them emphasised the “importance” of the findings.

Methods are not a boring detail to be consigned to a supplement: they are crucial in evaluating research. My fear is that the primary goal of some journals is media coverage, and consequently science is being reduced to journalism, and is suffering as a consequence.

References

Brembs, B., & Munafò, M. R. (2013). Deep impact: Unintended consequences of journal rank. arXiv:1301.3748.

Christley, R. M. (2010). Power and error: increased risk of false positive results in underpowered studies. The Open Epidemiology Journal, 3, 16-19.

Halpern, S. D., Karlawish, J. T., & Berlin, J. A. (2002). The continuing unethical conduct of underpowered clinical trials. Journal of the American Medical Association, 288(3), 358-362. doi:10.1001/jama.288.3.358

Lawrence, P. A. (2007). The mismeasurement of science. Current Biology, 17(15), R583-R585. doi: 10.1016/j.cub.2007.06.014

Tressoldi, P., Giofré, D., Sella, F., & Cumming, G. (2013). High impact = high statistical standards? Not necessarily so. PLoS ONE, 8(2), e56180. doi:10.1371/journal.pone.0056180

Saturday, 19 January 2013

Journal Impact Factors and REF 2014

In 2014, British institutions of Higher Education are to be evaluated in the Research Excellence Framework (REF), an important exercise on which their future funding depends. Academics are currently undergoing scrutiny by their institutions to determine whether their research outputs are good enough to be entered in the REF. Outputs are to be assessed in terms of "‘originality, significance and rigour’, with reference to international research quality standards."
Here's what the REF2014 guidelines say about journal impact factors:

"No sub-panel will make any use of journal impact factors, rankings, lists or the perceived standing of publishers in assessing the quality of research outputs."

Here are a few sources that explain why it is a bad idea to use impact factors to evaluate individual research outputs:
Stephen Curry's blog
David Colquhoun letter to Nature
Manuscript by Brembs & Munafo on "Unintended consequences of journal rank"
Editage tutorial

Here is some evidence that the REF2014 statement on impact factors is being widely ignored:

Jenny Rohn Guardian blogpost

And here's a letter about this issue that I wrote yesterday to the representatives of RCUK who act as observers on REF panels. I'll let you know if I get a reply.

18th January 2013

To: Ms Anne-Marie Coriat: Medical Research Council   
Dr Alf Game: Biotechnology and Biological Sciences Research Council   
Dr Alison Wall: Engineering and Physical Sciences Research Council   
Ms Michelle Wickendon: Natural Environment Research Council   
Ms Victoria Wright: Science and Technology Facilities Council   
Dr Fiona Armstrong: The Economic and Social Research Council    
Mr Gary Grubb: Arts and Humanities Research Council    


Dear REF2014 Observers,

I am contacting you because a growing number of academics are expressing concerns that, contrary to what is stated in the REF guidelines, journal impact factors are being used by some Universities to rate research outputs. Jennifer Rohn raised this issue here in a piece on the Guardian website last November:
http://www.guardian.co.uk/science/occams-corner/2012/nov/30/1


I have not been able to find any official route whereby such concerns can be raised, and I have evidence that some of those involved in the REF, including senior university figures and REF panel members, regard it as inevitable and appropriate that journal impact factors will be factored into ratings - albeit as just one factor among others. Many, perhaps most, of the academics involved in panels and REF preparations grew up in a climate where publication in a high impact journal was regarded as the acme of achievement. Insofar as there are problems with the use of impact factors, they seem to think the only difficulty is the lack of comparability across sub-disciplines, which can be adjusted for. Indeed, I have been told that it is naïve to imagine that this statement should be taken literally: "No sub-panel will make any use of journal impact factors, rankings, lists or the perceived standing of publishers in assessing the quality of research outputs."


Institutions seem to vary in how strictly they are interpreting this statement and this could lead to serious problems further down the line. An institution that played by the rules and submitted papers based only on perceived scientific quality might challenge the REF outcome if they found the panel had been basing ratings on journal impact factor. The evidence for such behaviour could be reconstructed from an analysis of outputs submitted for the REF.


I think it is vital that RCUK responds to the concerns raised by Dr Rohn to clarify the position on journal impact factors and explain the reasoning behind the guidelines on this. Although the statement seems unambiguous, there is a widespread view that the intention is only to avoid slavish use of impact factors as a sole criterion, not to ban their use altogether. If that is the case, then this needs to be made explicit. If not, then it would be helpful to have some mechanism whereby academics could report institutions that flout this rule.

Yours sincerely

(Professor) Dorothy Bishop


Reference
Colquhoun, D. (2003). Challenging the tyranny of impact factors. Nature, 423(6939), 479. doi:10.1038/423479a

P.S. 21/1/13
This post has provoked some excellent debate in the Comments, and also on Twitter. I have collated the tweets on Storify here, and the Comments are below. They confirm that there are very divergent views out there about whether REF panels are likely to, or should, use journal impact factor in any shape or form. They also indicate that this issue is engendering high levels of anxiety in many sections of academia.

P.P.S. 30/1/13
REPLY FROM HEFCE
I now have a response from Graeme Rosenberg, REF Manager at HEFCE, who kindly agreed that I could post relevant content from his email here. This briefly explains why impact factors are disallowed for REF panels, but notes that institutions are free to flout this rule in their submissions, at their own risk. The text follows:

I think your letter raises two sets of issues, which I will respond to in turn. 

The REF panel criteria state clearly that panels will not use journal impact factors in the assessment. These criteria were developed by the panels themselves and we have no reason to doubt they will be applied correctly. The four main panels will oversee the work of the sub-panels throughout the assessment process, and it is part of the main panels' remit to ensure that all sub-panels apply the published criteria. If there happen to be some individual panel members at this stage who are unsure about the potential use of impact factors in the panels' assessments, the issue will be clarified by the panel chairs when the assessment starts. The published criteria are very clear and do not leave any room for ambiguity on this point. 

The question of institutions using journal impact factors in preparing their submissions is a separate issue. We have stated clearly what the panels will and will not be using to inform their judgements. But institutions are autonomous and ultimately it is their decision as to what forms of evidence they use to inform their selection decisions. If they choose to use journal impact factors as part of the evidence, then the evidence for their decisions will differ from that used by panels. This would no doubt increase the risk to the institution of reaching different conclusions to the REF panels. Institutions would also do well to consider why the REF panels will not use journal impact factors - at the level of individual outputs they are a poor proxy for quality. Nevertheless, it remains the institution's choice.