
Thursday, 18 December 2014

Dividing up the pie in relation to REF2014

OK, I've only had an hour to look at REF results, so this will be brief, but I'm far less interested in league tables than in the question of how the REF results will translate into funding for different departments in my subject area, psychology.

I should start by thanking HEFCE, who are a model of efficiency and transparency: I was able to download a complete table of REF outcomes from their website here.

What I did was to create a table with just the Overall results for Unit of Assessment 4, which is Psychology, Psychiatry and Neuroscience (i.e. a bigger and more diverse grouping than for the previous RAE). These Overall results combine information from Outputs (65%), Impact (20%) and Environment (15%). I excluded institutions in Scotland, Wales and Northern Ireland.

Most of the commentary on the REF focuses on the so-called 'quality' rankings. These represent the average rating for an institution on a 4-point scale. Funding, however, will depend on the 'power' - i.e. the quality rankings multiplied by the number of 'full-time equivalent' staff entered in the REF. Not surprisingly, bigger departments get more money. The key things we don't yet know are (a) how much funding there will be, and (b) what formula will be used to translate the star ratings into funding.

With regard to (b), in the previous exercise, the RAE, you got one point for 2*, three points for 3* and seven points for 4*. It is anticipated that this time there will be no credit for 2* and little or no credit for 3*. I've simply computed the sums according to two scenarios: the original RAE formula, and a formula in which only 4* counts. From these scores one can readily compute what percentage of available funding will go to each institution. The figures are below. Readers may find it of interest to look at this table in relation to my earlier blogpost on The Matthew Effect and REF2014.
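As a rough sketch of the arithmetic, here is how the two scenarios translate star profiles into funding shares. The departments and their quality profiles below are made up for illustration; they are not real REF data.

```python
# Compare two funding formulas: RAE2008-style weights (7/3/1 for 4*/3*/2*)
# vs a formula in which only 4* outputs attract any money.

def power(fte, pct4, pct3, pct2, weights):
    """'Power' = quality profile weighted by (w4, w3, w2), scaled by FTE."""
    w4, w3, w2 = weights
    return fte * (w4 * pct4 + w3 * pct3 + w2 * pct2) / 100.0

# Hypothetical departments: (name, FTE entered, %4*, %3*, %2*)
depts = [("Dept A", 80, 40, 45, 15),
         ("Dept B", 30, 50, 40, 10),
         ("Dept C", 15, 20, 50, 30)]

for weights, label in [((7, 3, 1), "RAE-style"), ((1, 0, 0), "4*-only")]:
    scores = {name: power(fte, p4, p3, p2, weights)
              for name, fte, p4, p3, p2 in depts}
    total = sum(scores.values())
    shares = {name: 100 * s / total for name, s in scores.items()}
    print(label, {k: round(v, 1) for k, v in shares.items()})
```

Note how the large department dominates under either formula: its share is driven mainly by FTE, with the choice of weights shifting the margins.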

Unit of Assessment 4:
Table showing % of subject funding for each institution depending on funding formula

Institution / % funding under RAE formula / % funding under 4*-only formula
University College London 16.1 18.9
King's College London 13.3 14.5
University of Oxford 6.6 8.5
University of Cambridge 4.7 5.7
University of Bristol 3.6 3.8
University of Manchester 3.5 3.7
Newcastle University 3.0 3.4
University of Nottingham 2.7 2.6
Imperial College London 2.6 2.9
University of Birmingham 2.4 2.7
University of Sussex 2.3 2.4
University of Leeds 2.0 1.5
University of Reading 1.8 1.6
Birkbeck College 1.8 2.2
University of Sheffield 1.7 1.7
University of Southampton 1.7 1.8
University of Exeter 1.6 1.6
University of Liverpool 1.6 1.6
University of York 1.5 1.6
University of Leicester 1.5 1.0
Goldsmiths' College 1.4 1.0
Royal Holloway 1.4 1.5
University of Kent 1.4 1.0
University of Plymouth 1.3 0.8
University of Essex 1.1 1.1
University of Durham 1.1 0.9
University of Warwick 1.1 1.0
Lancaster University 1.0 0.8
City University London 0.9 0.5
Nottingham Trent University 0.9 0.7
Brunel University London 0.8 0.6
University of Hull 0.8 0.4
University of Surrey 0.8 0.5
University of Portsmouth 0.7 0.5
University of Northumbria 0.7 0.5
University of East Anglia 0.6 0.5
University of East London 0.6 0.5
University of Central Lancs 0.5 0.3
Roehampton University 0.5 0.3
Coventry University 0.5 0.3
Oxford Brookes University 0.4 0.2
Keele University 0.4 0.2
University of Westminster 0.4 0.1
Bournemouth University 0.4 0.1
Middlesex University 0.4 0.1
Anglia Ruskin University 0.4 0.1
Edge Hill University 0.3 0.2
University of Derby 0.3 0.2
University of Hertfordshire 0.3 0.1
Staffordshire University 0.3 0.2
University of Lincoln 0.3 0.2
University of Chester 0.3 0.2
Liverpool John Moores 0.3 0.1
University of Greenwich 0.3 0.1
Leeds Beckett University 0.2 0.0
Kingston University 0.2 0.1
London South Bank 0.2 0.1
University of Worcester 0.2 0.0
Liverpool Hope University 0.2 0.0
York St John University 0.1 0.1
University of Winchester 0.1 0.0
University of Chichester 0.1 0.0
University of Bolton 0.1 0.0
University of Northampton 0.0 0.0
Newman University 0.0 0.0


P.S. 11.20 a.m. For those who have excitedly tweeted from UCL and KCL about how they are top of the league, please note that, as I have argued previously, the principal determinant of the % projected funding is the number of FTE staff entered. In this case the correlation is .995.

Tuesday, 15 October 2013

The Matthew effect and REF2014


For unto every one that hath shall be given, and he shall have abundance: but from him that hath not shall be taken away even that which he hath. Matthew 25:29



So you’ve slaved over your departmental submission for REF2014, and shortly will be handing it in. A nervous few months await before the results are announced. You’ve sweated blood over deciding whether staff publications or impact statements will be graded as 1*, 2*, 3* or 4*, but it’s not possible to predict how the committee will judge them, nor, more importantly, how these ratings will translate into funding. In the last round of evaluation, in 2008, a weighted formula was used, such that a submission earned 1 point for every 2* output, 3 points for every 3* output, and 7 points for every 4* output. Rumour has it that this year there may be no money for 2* outputs and an even heavier weighting for 4*. It will be more complicated than this, because funding allocations will also take into account ratings of ‘impact statements’ and the ‘environment’.

I’ve blogged previously about concerns I have with the inefficiency of the REF2014 as a method for allocating funds. Today I want to look at a different issue: the extent to which the REF increases disparities between universities over time. To examine this, I created a simulation which made a few simple assumptions. We start with a sample of 100 universities, each of which is submitting 50 staff in a Unit of Assessment. At the outset, we start with all universities equal in terms of the research quality of their staff: they are selected at random from a pool of possible staff whose research quality is normally distributed. Funding is then allocated according to the formula used in RAE2008. The key feature of the simulation is that over every assessment period there is turnover of staff (estimated at 10% in simulation shown here), and universities with higher funding levels are able to recruit replacement staff with higher scores on the research quality scale. These new staff are then the basis for computing funding allocations in the next cycle – and so on, through as many cycles as one wishes. This simulation shows that funding starts out fairly normally distributed, but as we progress through each cycle, it becomes increasingly skewed, with the top-performers moving steadily away from the rest (Figure A). In the graphs, funding is shown over time for universities grouped in deciles, i.e., bands of 10 universities after ranking by funding level.
Simulation: Mean income for universities in each of 10 deciles over 6 funding cycles
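The mechanics of the simulation can be sketched in a few lines of Python. This is a minimal re-implementation under the assumptions stated above (100 universities, 50 staff each, 10% turnover per cycle, RAE2008-style 1/3/7 weights); the cut-points that map staff quality onto star bands are arbitrary choices of mine, not taken from the original model.

```python
import random
import statistics

random.seed(1)
N_UNIS, N_STAFF, TURNOVER, CYCLES = 100, 50, 0.10, 6

# All universities start equal: staff quality drawn from one normal pool.
unis = [[random.gauss(0, 1) for _ in range(N_STAFF)] for _ in range(N_UNIS)]

def funding(staff):
    """RAE2008-style score: 1 point per 2*, 3 per 3*, 7 per 4* output.
    The quality cut-points defining the star bands are arbitrary."""
    score = 0
    for q in staff:
        if q > 1.0:
            score += 7        # 4*
        elif q > 0.3:
            score += 3        # 3*
        elif q > -0.5:
            score += 1        # 2*
    return score

initial_spread = statistics.stdev([funding(s) for s in unis])

for cycle in range(CYCLES):
    ranks = sorted(range(N_UNIS), key=lambda i: funding(unis[i]))
    # Better-funded universities recruit from higher up the quality scale:
    # replacement quality is centred on a shift that grows with funding rank.
    for rank, i in enumerate(ranks):
        shift = rank / N_UNIS - 0.5      # -0.5 (poorest) .. +0.5 (richest)
        for _ in range(int(TURNOVER * N_STAFF)):
            unis[i].pop(random.randrange(len(unis[i])))
            unis[i].append(random.gauss(shift, 1))

final_spread = statistics.stdev([funding(s) for s in unis])
print("spread across universities:", round(initial_spread, 1),
      "->", round(final_spread, 1))
```

Running this, the spread of funding across universities grows cycle by cycle: the positive feedback loop alone is enough to turn an initially equal field into haves and have-nots.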

Depending on specific settings of parameters in the model, we may even see a bimodal distribution developing over time: a large pool of ‘have-nots’ vs an elite group of ‘haves’. Despite the over-simplifications of the model, I would argue that it captures an essential feature of the current funding framework: funding goes to those who are successful, allowing them to enter a positive feedback loop whereby they can recruit more high-calibre researchers and become even more successful – and hence gain even more funds in the next round. For those who are unsuccessful, it can be hard to break out of a downward spiral into research inactivity.

We could do things differently. Figure B shows how tweaking the funding model could avoid opening up such a wide gulf between the richest and poorest, and retain a solid core of middle-ranking universities.
Simulation using linear weighting of * levels. Each line is average for institutions in a given decile
Figure C, on the other hand, shows the effect of a formula that predominantly rewards 4* outputs (weighting of 1 for 3* and 7 for 4*, rumoured to be a possible formula for REF2014). This would dramatically increase the gulf between the elite and other institutions.
Simulation where 4* outputs get favoured. Each line is average for institutions in a given decile
I’m sure people will have very different views about whether or not the consequences illustrated here are desirable. One argument is that it is best to concentrate our research strength in a few elite institutions. That way the UK will be able to compete with the rest of the world in University league tables. Furthermore, by pooling the brightest brains in places where they have the best resources to do research, we have a chance of making serious breakthroughs. We could even use biblical precedent to justify such an approach: the Matthew effect refers to the biblical parable of the talents, in which servants are entrusted different sums of money by their master, and those who have most make the best use of it. There is no sympathy for those with few resources: they fail to make good use of what they do have and end up cast out into outer darkness, where there is weeping and gnashing of teeth. This robust attitude characterises those who argue that only internationally outstanding research should receive serious funding.

However, given that finances are always limited, there will be a cost to the focus on an elite; the middle-ranking universities will get less funding, and be correspondingly less able to attract high-calibre researchers. And it could be argued that we don’t just need an elite: we need a reasonable number of institutions in which there is a strong research environment, where more senior researchers feel valued and their graduate students and postdocs are encouraged to aim high. Our best strategy for retaining international competitiveness might be by fostering those who are doing well but have potential to do even better. In any case, much research funding is awarded through competition for grants, and most of this goes to people in elite institutions, so these places will not be starved of income if we were to adopt a more balanced system of awarding central funds.

What worries me most is that I haven’t been able to find any discussion of this issue – namely, whether the goal of a funding formula should be to focus on elite institutions or distribute funds more widely. The nearest thing I’ve found so far is a paper analysing a parallel issue in grant awards (Fortin & Currie, 2013) – which comes to the conclusion that broader distribution of smaller grants is more effective than narrowly distributed large grants. Very soon, somebody somewhere is going to decide on the funding formula, and if rumours are to be believed, it will widen the gap between the haves and have-nots even further. I'm concerned that if we continue to concentrate funding only in those institutions with a high proportion of research superstars, we may be creating an imbalance in our system of funding that will be bad for UK research in the long run.

Reference  

Fortin, J.-M., & Currie, D. J. (2013). Big Science vs. Little Science: How scientific impact scales with funding. PLoS ONE, 8(6). PMID: 23840323

Thursday, 12 September 2013

Evaluate, evaluate, evaluate


© www.CartoonStock.com
When I was starting out on a doctorate, I’d look at the senior people in my field and wonder if I’d ever be like them. It must be great, I thought, to reach the advanced age of 40. By then you’d have learned everything you needed to know to do great science, and you could just focus on doing it. I suspect today’s crop of grad students are a bit more savvy than I was, but all the same, I wonder if they realise just how wrong that picture is – for two reasons.

First, you never stop learning. The field moves on. Instead of getting easier, it gets harder. I remember when techniques such as functional brain imaging first came along. The most competent people in that area were either those who had developed the methods, or young people who learned them as grad students. If you were of the generation above, you had three choices: ignore the methods, spend time learning them, or hire junior people who knew what they were doing. As the methods evolve, they get ever more complex, and meanwhile, your own brain starts to shrink. So if you are anticipating making it to a tenured post and then settling down in your armchair, think again.

Second, the more senior you get, the more of your time is spent, not on doing your own research, but on evaluation. You learn that an email entitled ‘invitation’ should not make your spirits rise: it’s just a desperate attempt to put a positive spin on a request for you to do more work for no reward. You get regular ‘invitations’ to review papers and grants, write job references, appraise promotion bids, sit on interview panels and examine theses. If you are involved in teaching, you’ll also be engaged in numerous other forms of appraisal.

I was prompted to think about this when someone asked on an electronic forum what was a reasonable number of doctoral theses to examine each year. The general consensus was two: though it will obviously depend on what other commitments someone has. It also varies from country to country. There are some jolly places in Europe where a PhD viva is just an excuse for a boozy party with a lot of dressing up in funny gowns and hats. In UK psychology, the whole thing is no fun at all: you have to read a document of 50,000-70,000 words reporting a body of work based on a series of experimental studies. You then write a report on it and see the candidate for a face-to-face viva, which is typically 2 to 3 hours long. Although failure is uncommon, it is not assumed that the candidate will pass (unlike in the viva-as-party countries), and weeping or catatonic candidates are not unheard of. Taking into account travel, etc., if you are going to do a proper job, you are probably talking about three days’ work. For this you get paid around the minimum wage – the fee for examining is typically somewhere between £120 and £200.

So why do we do it? The major reason is because the entire academic enterprise depends on reciprocity: we want people to examine our students and review our papers and grants. In addition, it’s important to maintain standards, and to ensure that degrees, promotions, publications and grants go to those who merit them. But the demands keep growing. In the 37 weeks of this year I’ve been asked to review 76 papers and six grants. I agreed to review 16 papers and three of the grants. This, of course, is nothing compared with being a journal editor or serving on a grants board, something that most of us will do at some point.

Clearly, if I agreed to do everything I was asked, I’d have no time for anything else. Of course, one learns to say no. But awareness of these pressures has made me look with rather a critical eye at how we use evaluation. There is, for instance, research suggesting that job interviews aren’t very useful at identifying good candidates:  we tend to be seduced by immediate impressions, which may not be a good indicator of a person’s suitability. Like most people, I’d be reluctant to take on an employee I hadn’t interviewed, but if Daniel Kahneman is to be believed, this is just because I am a victim of the Illusion of Validity.

I’m a supporter of the peer review system used by journals, and here I feel I’m on more solid ground, because I can point to instances where my papers have been improved by input from reviewers. Nevertheless, where reviewing is used simply to accept or reject papers or grant proposals, and where fine-grained decisions have to be made between many high-quality submissions, agreement between experts may be little better than chance (e.g. Fogelholm et al, 2012). Even so, we stick with it, because it’s hard to know what to put in its place.

I’ve written a fair bit about that expensive and time-consuming evaluation process that UK academics engage in, the REF. It requires experts to make judgements of whether, for instance, papers are of 3* or 4* quality, a distinction based on whether the research is “world leading” or “internationally excellent…. but falls short of the highest standards of excellence.” The reliability of such judgements has not, to my knowledge, been evaluated, yet large amounts of funding depend on them. Those on REF committees are in the same situation as Pavlov’s poor dogs, having to make distinctions that are on the one hand impossible (discriminating circles and ellipses that become increasingly similar) and on the other hand very important (get it wrong and you get a shock).

There is one good thing about doing so much evaluation. You have the opportunity to see what others are doing – you may be the first person to read an important new paper, or examine a ground-breaking thesis. You may be forced to engage with different ways of thinking, and confronted with new topics and ideas. You may be able to provide useful input to authors. And since you yourself will be evaluated, it can be useful to see life from the other side of the table, as the person doing the evaluating. But all too often, even these advantages fail to compensate for the fact that as a senior academic you will spend more and more time on evaluation of others and less and less doing your own research.

Reference
Fogelholm, M., Leppinen, S., Auvinen, A., Raitanen, J., Nuutinen, A., & Väänänen, K. (2012). Panel discussion does not improve reliability of peer review for medical research grant proposals. Journal of Clinical Epidemiology, 65(1), 47-52. DOI: 10.1016/j.jclinepi.2011.05.001

Saturday, 26 January 2013

An alternative to REF2014?


After blogging last week about use of journal impact factors in REF2014, many people have asked me what alternative I'd recommend. Clearly, we need a transparent, fair and cost-effective method for distributing funding to universities to support research. Those designing the REF have tried hard over the years to devise such a method, and have explored various alternatives, but the current system leaves much to be desired.

Consider the current criteria for rating research outputs, designed by someone with a true flair for ambiguity:
Rating Definition
4* Quality that is world-leading in terms of originality, significance and rigour
3* Quality that is internationally excellent in terms of originality, significance and rigour but which falls short of the highest standards of excellence
2* Quality that is recognised internationally in terms of originality, significance and rigour
1* Quality that is recognised nationally in terms of originality, significance and rigour

Since only 4* and 3* outputs will feature in the funding formula, a great deal hinges on whether research is deemed “world-leading”, “internationally excellent” or “internationally recognised”. This is hardly transparent or objective. That’s one reason why many institutions want to translate these star ratings into journal impact factors. But substituting a discredited objective criterion for a subjective criterion is not a solution.

The use of bibliometrics was considered but rejected in the past. My suggestion is that we should reconsider this idea, but in a new version. A few months ago, I blogged about how university rankings in the previous assessment exercise (RAE) related to grant income and citation rates for outputs. Instead of looking at citations for individual researchers, I used Web of Science to compute an H-index for the period 2000-2007 for each department, by using the ‘address’ field to search. As noted in my original post, I did this fairly hastily and the method can get problematic in cases where a Unit of Assessment does not correspond neatly to a single department. The H-index reflected all research outputs of everyone at that address – regardless of whether they were still at the institution or entered for the RAE. Despite these limitations, the resulting H-index predicted the RAE results remarkably well, as seen in the scatterplot below, which shows H-index in relation to the funding level following from RAE. This is computed by number of full-time staff equivalents multiplied by the formula:
    0.1 x 2* + 0.3 x 3* + 0.7 x 4*
(N.B. I ignored subject weighting, so units are arbitrary).
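Both calculations are simple to reproduce. The sketch below shows a departmental H-index computed from a list of citation counts, alongside the funding score from the formula above; the citation counts and star-profile figures are hypothetical, purely for illustration.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h

def rae_score(fte, pct2, pct3, pct4):
    # 0.1 x 2* + 0.3 x 3* + 0.7 x 4*, scaled by FTE (subject weighting ignored)
    return fte * (0.1 * pct2 + 0.3 * pct3 + 0.7 * pct4) / 100.0

# Hypothetical department: citation counts for its 2000-2007 outputs
dept_citations = [120, 80, 45, 33, 20, 18, 9, 4, 2, 0]
print("H-index:", h_index(dept_citations))
print("funding score:", rae_score(40, 20, 50, 30))
```

The point of the comparison is that the H-index needs nothing beyond a Web of Science address search, whereas the funding score depends on panel judgements of every output.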

Psychology (Unit of Assessment 44), RAE2008 outcome by H-index
Yes, you might say, but the prediction is less successful at the top end of the scale, and this could mean that the RAE panels incorporated factors that aren’t readily measured by such a crude score as H-index. Possibly true, but how do we know those factors are fair and objective? In this dataset, one variable that accounted for additional variance in outcome, over and above departmental H-index, was whether the department had a representative on the psychology panel: if they did, then the trend was for the department to have a higher ranking than that predicted from the H-index. With panel membership included in the regression, the correlation (r) increased significantly from .84 to .86, t = 2.82, p = .006. It makes sense that if you are a member of a panel, you will be much more clued up than other people about how the whole process works, and you can use this information to ensure your department’s submission is strategically optimal. I should stress that this was a small effect, and I did not see it in a handful of other disciplines that I looked at, so it could be a fluke. Nevertheless, with the best intentions in the world, the current system can’t ever defend completely against such biases.

So overall, my conclusion is that we might be better off using a bibliometric measure such as a departmental H-index to rank departments. It is crude and imperfect, and I suspect it would not work for all disciplines – especially those in the humanities. It relies solely on citations, and it's debatable whether that is desirable. But for sciences, it seems to be pretty much measuring whatever the RAE was measuring, and it would seem to be the lesser of various possible evils, with a number of advantages compared to the current system. It is transparent and objective, it would not require departments to decide who they do and don’t enter for the assessment, and most importantly, it wins hands down on cost-effectiveness. Had we used this method instead of the RAE, a small team of analysts armed with Web of Science could have derived the necessary data in a couple of weeks, giving outcomes virtually identical to those of the RAE. The money saved both by HEFCE and individual universities could be ploughed back into research. Of course, people will attempt to manipulate whatever criterion is adopted, but this one might be less easily gamed than some others, especially if self-citations from the same institution are excluded.

It will be interesting to see how well this method predicts RAE outcomes in other subjects, and whether it can also predict results from the REF2014, where the newly-introduced “impact statement” is intended to incorporate a new dimension into assessment.

Saturday, 19 January 2013

Journal Impact Factors and REF 2014

In 2014, British institutions of Higher Education are to be evaluated in the Research Excellence Framework (REF), an important exercise on which their future funding depends. Academics are currently undergoing scrutiny by their institutions to determine whether their research outputs are good enough to be entered in the REF. Outputs are to be assessed in terms of "originality, significance and rigour, with reference to international research quality standards."
Here's what the REF2014 guidelines say about journal impact factors:

"No sub-panel will make any use of journal impact factors, rankings, lists or the perceived standing of publishers in assessing the quality of research outputs."

Here are a few sources that explain why it is a bad idea to use impact factors to evaluate individual research outputs:
Stephen Curry's blog
David Colquhoun letter to Nature
Manuscript by Brembs & Munafo on "Unintended consequences of journal rank"
Editage tutorial

Here is some evidence that the REF2014 statement on impact factors is being widely ignored:

Jenny Rohn Guardian blogpost

And here's a letter I wrote yesterday to the representatives of RCUK who act as observers on REF panels about this. I'll let you know if I get a reply.

18th January 2013

To: Ms Anne-Marie Coriat: Medical Research Council   
Dr Alf Game: Biotechnology and Biological Sciences Research Council   
Dr Alison Wall: Engineering and Physical Sciences Research Council   
Ms Michelle Wickendon: Natural Environment Research Council   
Ms Victoria Wright: Science and Technology Facilities Council   
Dr Fiona Armstrong: The Economic and Social Research Council    
Mr Gary Grubb: Arts and Humanities Research Council    


Dear REF2014 Observers,

I am contacting you because a growing number of academics are expressing concerns that, contrary to what is stated in the REF guidelines, journal impact factors are being used by some Universities to rate research outputs. Jennifer Rohn raised this issue here in a piece on the Guardian website last November:
http://www.guardian.co.uk/science/occams-corner/2012/nov/30/1


I have not been able to find any official route whereby such concerns can be raised, and I have evidence that some of those involved in the REF, including senior university figures and REF panel members, regard it as inevitable and appropriate that journal impact factors will be factored into ratings - albeit as just one factor among others. Many, perhaps most, of the academics involved in panels and REF preparations grew up in a climate where publication in a high impact journal was regarded as the acme of achievement. Insofar as there are problems with the use of impact factors, they seem to think the only difficulty is the lack of comparability across sub-disciplines, which can be adjusted for. Indeed, I have been told that it is naïve to imagine that this statement should be taken literally: "No sub-panel will make any use of journal impact factors, rankings, lists or the perceived standing of publishers in assessing the quality of research outputs."


Institutions seem to vary in how strictly they are interpreting this statement and this could lead to serious problems further down the line. An institution that played by the rules and submitted papers based only on perceived scientific quality might challenge the REF outcome if they found the panel had been basing ratings on journal impact factor. The evidence for such behaviour could be reconstructed from an analysis of outputs submitted for the REF.


I think it is vital that RCUK responds to the concerns raised by Dr Rohn to clarify the position on journal impact factors and explain the reasoning behind the guidelines on this. Although the statement seems unambiguous, there is a widespread view that the intention is only to avoid slavish use of impact factors as a sole criterion, not to ban their use altogether. If that is the case, then this needs to be made explicit. If not, then it would be helpful to have some mechanism whereby academics could report institutions that flout this rule.

Yours sincerely

(Professor) Dorothy Bishop


Reference
Colquhoun, D. (2003). Challenging the tyranny of impact factors Nature, 423 (6939), 479-479 DOI: 10.1038/423479a

P.S. 21/1/13
This post has provoked some excellent debate in the Comments, and also on Twitter. I have collated the tweets on Storify here, and the Comments are below. They confirm that there are very divergent views out there about whether REF panels are likely to, or should, use journal impact factor in any shape or form. They also indicate that this issue is engendering high levels of anxiety in many sections of academia.

P.P.S. 30/1/13
REPLY FROM HEFCE
I now have a response from Graeme Rosenberg, REF Manager at HEFCE, who kindly agreed that I could post relevant content from his email here. This briefly explains why impact factors are disallowed for REF panels, but notes that institutions are free to flout this rule in their submissions, at their own risk. The text follows:

I think your letter raises two sets of issues, which I will respond to in turn. 

The REF panel criteria state clearly that panels will not use journal impact factors in the assessment. These criteria were developed by the panels themselves and we have no reason to doubt they will be applied correctly. The four main panels will oversee the work of the sub-panels throughout the assessment process, and it is part of the main panels' remit to ensure that all sub-panels apply the published criteria. If there happen to be some individual panel members at this stage who are unsure about the potential use of impact factors in the panels' assessments, the issue will be clarified by the panel chairs when the assessment starts. The published criteria are very clear and do not leave any room for ambiguity on this point.

The question of institutions using journal impact factors in preparing their submissions is a separate issue. We have stated clearly what the panels will and will not be using to inform their judgements. But institutions are autonomous and ultimately it is their decision as to what forms of evidence they use to inform their selection decisions. If they choose to use journal impact factors as part of the evidence, then the evidence for their decisions will differ from that used by panels. This would no doubt increase the risk to the institution of reaching different conclusions to the REF panels. Institutions would also do well to consider why the REF panels will not use journal impact factors - at the level of individual outputs they are a poor proxy for quality. Nevertheless, it remains the institution's choice.