Most academics loathe metrics. I’ve seldom attracted so much criticism as for my suggestion that a citation-based metric might be used to allocate funding to university departments. This suggestion was recycled this week in the Times Higher Education, after a group of researchers published predictions of REF2014 results based on departmental H-indices for four subjects.
Twitter was appalled. Philip Moriarty, in a much-retweeted plea, said: “Ugh. *Please* stop giving credence to simplistic metrics like the h-index. V. damaging”. David Colquhoun, with whom I agree on many things, responded like an exorcist confronted with the spawn of the devil, arguing that any use of metrics would just encourage universities to pressurise staff to increase their H-indices.
Now, as I’ve explained before, I don’t particularly like metrics. In fact, my latest proposal is to drop both REF and metrics and simply award funding on the basis of the number of research-active people in a department. But I’ve become intrigued by the loathing of metrics that is revealed whenever a metrics-based system is suggested, particularly since some of the arguments put forward do seem rather illogical.
Odd idea #1 is that doing a study relating metrics to funding outcomes is ‘giving credence’ to metrics. It’s not. What would give credence would be if the prediction of REF outcomes from H-index turned out to be very good. We already know that whereas it seems to give reasonable predictions for sciences, it’s much less accurate for humanities. It will be interesting to see how things turn out for the REF, but it’s an empirical question.
Odd idea #2 is that use of metrics will lead to gaming. Of course it will! Gaming will be a problem for any method of allocating money. The answer to gaming, though, is to be aware of how this might be achieved and to block obvious strategies, not to dismiss any system that could potentially be gamed. I suspect the H-index is less easy to game than many other metrics – though I’m aware of one remarkable case where a journal editor has garnered an impressive H-index from papers published in his own journals, with numerous citations to his own work. In general, though, those of us without editorial control are more likely to get a high H-index from publishing smaller amounts of high-quality science than from churning out pot-boilers.
Odd idea #3 is the assumption that the REF’s system of peer review is preferable to a metric. At the HEFCE metrics meeting I attended last month, almost everyone was in favour of complex, qualitative methods of assessing research. David Colquhoun argued passionately that to evaluate research you need to read the publications. To disagree with that would be like slamming motherhood and apple pie. But, as Derek Sayer has pointed out, it is inevitable that the ‘peer review’ component of the REF will be flawed, given that panel members are required to evaluate several hundred submissions in a matter of weeks. The workload is immense and cannot involve the careful consideration of the content of books or journal articles, many of which will be outside the reader’s area of expertise.
My argument is a pragmatic one: we are currently engaged in a complex evaluation exercise that is enormously expensive in time and money, that has distorted incentives in academia, and that cannot be regarded as a ‘gold standard’. So, as an empirical scientist, my view is that we should be looking hard at other options, to see whether we might be able to achieve similar results in a more cost-effective way.
Different methods can be compared in terms of the final result, and also in terms of unintended consequences. For instance, in its current manifestation, the REF encourages universities to take on research staff shortly before the deadline – as satirised by Laurie Taylor (see Appointments section of this article). In contrast, if departments were rewarded for a high H-index, there would be no incentive for such behaviour. Also, staff members who were not principal investigators but who made valuable contributions to research would be appreciated, rather than threatened with redundancy. Use of an H-index would also avoid the invidious process of selecting staff for inclusion in the REF.
I suspect, anyhow, we will find predictions from the H-index are less good for REF than for RAE. One difficulty for Mryglod et al is that it is not clear whether the Units of Assessment they base their predictions on will correspond to those used in REF. Furthermore, in REF, a substantial proportion of the overall score comes from impact, evaluated on the basis of case studies. To quote from the REF2014 website: “Case studies may include any social, economic or cultural impact or benefit beyond academia that has taken place during the assessment period, and was underpinned by excellent research produced by the submitting institution within a given timeframe.” My impression is that impact was included precisely to capture an aspect of academic quality that was orthogonal to traditional citation-based metrics, and so this should weaken any correlation of outcomes with H-index.
Be this as it may, I’m intrigued by people’s reactions to the H-index suggestion, and wondering whether this relates to the subject one works in. For those in arts and humanities, it is particularly self-evident that we cannot capture all the nuances of departmental quality from an H-index – and indeed, it is already clear that correlations between H-index and RAE outcomes are relatively low in these disciplines. These academics work in fields where complex, qualitative analysis is essential. Interestingly, RAE outcomes in arts and humanities (as with other subjects) are pretty well predicted by departmental size, and it could be argued that this would be the most effective way of allocating funds.
Those who work in the hard sciences, on the other hand, take precision of measurement very seriously. Physicists, chemists and biologists are often working with phenomena that can be measured precisely and unambiguously. Their dislike for an H-index might, therefore, stem from awareness of its inherent flaws: it varies with subject area and can be influenced by odd things, such as high citations arising from notoriety.
Psychologists, though, sit between these extremes. The phenomena we work with are complex. Many of us strive to treat them quantitatively, but we are used to dealing with measurements that are imperfect but ‘good enough’. To take an example from my own research: years ago, I wanted to measure the severity of children’s language problems, and I was using an elicitation task, where the child was shown pictures and asked to say what was happening. The test had a straightforward scoring system that gave indices of the maturity of the content and grammar of the responses. Various people, however, criticised this as too simple. I should take a spontaneous language sample, I was told, and do a full grammatical analysis. So, being young and impressionable, I did. I ended up spending hours transcribing tape-recordings from largely silent children, and hours more mapping their utterances onto a complex grammatical chart. The outcome: I got virtually the same result from the two processes – one of which took ten minutes and the other two days.
Psychologists evaluate their measures in terms of how reliable (repeatable) they are and how validly they do what they are supposed to do. My approach to the REF is the same as my approach to the rest of my work: try to work with measures that are detailed and complex enough to be valid for their intended purpose, but no more so. To work out whether a measure fits that bill, we need to do empirical studies comparing different approaches – not just rely on our gut reaction.
Friday, 28 November 2014
Friday, 24 October 2014
Blaming universities for our nation's woes
[Cartoon: ©CartoonStock.com]
Below is the text of a comment piece in the Times Higher Education by Jamie Martin, advisor to Michael Gove, on Higher Education in the UK, entitled “Must Do Better”, interleaved with my thoughts on his arguments.
> In an increasingly testing global race, Britain’s competitive advantage must be built on education.

What is this ‘increasingly testing global race’? Why should education be seen as part of an international competition rather than a benefit to all humankind?

> Times Higher Education’s World University Rankings show that we have three of the world’s top 10 universities to augment our fast-improving schools. Sustaining a competitive edge, however, requires constant improvement and innovation. We must ask hard questions about our universities’ failures on academic rigour and widening participation, and recognise the need for reform.

Well, this seems a rather confused message. On the one hand, we are doing very well, but on the other hand we urgently need to reform.

> Too many higher education courses are of poor quality. When in government, as special adviser to Michael Gove, I was shown an analysis indicating that around half of student loans will never be repaid. Paul Kirby, former head of the Number 10 Policy Unit, has argued that universities and government are engaging in sub-prime lending, encouraging students to borrow about £40,000 for a degree that will not return that investment. We lend money to all degree students on equal terms, but employers don’t perceive all university courses as equal. Taxpayers, the majority of whom have not been to university, pick up the tab when this cruel lie is exposed.

So let’s get this right. The government introduced a massive hike in tuition fees (£1,000 per annum in 1998, £3,000 p.a. in 2004, £9,000 p.a. in 2010). The idea was that people would pay for these with loans, which they would pay off when they were earning above a threshold. It didn’t work, because many people didn’t get high-paying jobs, and it is now estimated that 45% of loans won’t be repaid. Whose fault is this? The universities! You might think the inability of people to pay back loans is a consequence of a lack of jobs due to recession, but no, the students would all be employable if only they had been taught different things!
> With the number of firsts doubling in a decade, we need an honest debate about grade inflation and the culture of low lecture attendance and light workloads it supports. Even after the introduction of tuition fees, the Higher Education Policy Institute found that contact time averaged 14 hours a week and degrees that were “more like a part-time than a full-time job”. Unsurprisingly, many courses have tiny or even negative earnings premiums and around half of recent graduates are in non-graduate jobs five years after leaving.

An honest debate would be good. One that took into account the conclusions of this report by ONS, which states: “Since the 2008/09 recession, unemployment rates have risen for all groups but the sharpest rise was experienced by non-graduates aged 21 to 30.” This report does indeed note the 47% of recent graduates in non-graduate jobs, but points out two factors that could contribute to the trend: the increased number of graduates and decreased demand for graduate skills. There is no evidence that employers prefer non-graduates to graduates for skilled jobs: rather, there is a mismatch between the number of graduates and the number of skilled jobs.

> This is partly because the system lacks diversity. Too many providers are weak imitations of the ancient universities. We have nothing to rival the brilliant polytechnics I saw in Finland, while the development of massive online open courses has been limited. The exciting New College of the Humanities, a private institution with world-class faculty, is not eligible for student loans. More universities should focus on a distinctive offer, such as cheaper shorter degrees or high-quality vocational courses.

What an intriguing wish-list: Finnish polytechnics, MOOCs, and the New College of the Humanities, which charges an eye-watering £17,640 for full-time undergraduates in 2014-15. The latter might be seen as ‘exciting’ if you are interested in the privatisation of the higher education sector, but for those of us interested in educating the UK population, it seems more of an irrelevance – likely to become a finishing school for the children of oligarchs, rather than a serious contender for educating our populace.
> If the failures on quality frustrate the mind, those on widening participation perturb the heart. Each year, the c.75,000 families on benefits send fewer students to Oxbridge than the c.100 families whose children attend Westminster School. Alan Milburn’s Social Mobility and Child Poverty Commission found that the most selective universities have actually become more socially exclusive over the past decade.
>
> Flawed admissions processes reinforce this inequality. Evidence from the US shows that standardised test scores (the SAT), which are a strong predictor of university grades, have a relatively low correlation with socio-economic status. The high intelligence that makes you a great university student is not the sole preserve of the social elite. The AS modules favoured by university admissions officers have diluted A-level standards and are a poorer indicator of innate ability than standardised tests. Universities still prioritise performance in personal statements, Ucas forms and interviews, which correlate with helicopter parents, not with high IQ.
>
> Criticise their record on widening access, and universities will blame the failures of the school system. Well, who walked on by while it was failing? Who failed to speak out enough about the grade inflation that especially hurt poorer pupils with no access to teachers who went beyond weakened exams? Until Mark Smith, vice-chancellor of Lancaster University, stepped forward, Gove’s decision to give universities control of A-level standards met with a muted response.

Ah, this is interesting. After a fulmination against social inequality in university admissions (well, at last a point I can agree on), Jamie Martin notes that there is an argument that blames this on failures in the school system. After all, if “the high intelligence that makes you a great university student is not the sole preserve of the social elite”, why aren’t intelligent children from working-class backgrounds coming out of school with good A-levels? Why are parents abandoning the state school system? Martin seems to accept this argument is valid, but then goes on to argue that lower-SES students don’t get into university because everyone has good A-levels (grade inflation) – and that’s all the fault of universities for not ‘speaking out’. Is he really saying that if we had more discriminating A-levels, then lower-SES pupils would outperform private school pupils?
> The first step in a prioritisation of education is to move universities into an enlarged Department for Education after the general election. The Secretary of State should immediately commission a genuinely independent review to determine which degrees are a sound investment or of strategic importance. Only these would be eligible for three-year student loans. Some shorter loans might encourage more efficient courses. Those who will brand this “philistinism” could not be more wrong: it is the traditional academic subjects that are valued by employers (philosophy at the University of Oxford is a better investment than many business courses). I am not arguing for fewer people to go to university. We need more students from poorer backgrounds taking the best degrees.

So, more reorganisation. And somehow, reducing the number of courses for which you can get a student loan is going to increase the number of students from poorer backgrounds who go to university. Just how this magic is to be achieved remains unstated.

> Government should publish easy-to-use data showing Treasury forecasts on courses’ expected loan repayments, as well as quality factors such as dropout rates and contact time. It should be made much easier to start a new university or to remodel existing ones.

So here we come to the real agenda. Privatisation of higher education.

> Politicians and the Privy Council should lose all control of higher education. Student choice should be the main determinant of which courses and institutions thrive.

Erm, but two paragraphs back we were told that student loans would only be available for those courses which were ‘a sound investment or of strategic importance’.
> Universities should adopt standardised entrance tests. And just as private schools must demonstrate that they are worthy of their charitable status, universities whose students receive loans should have to show what action they are taking to improve state schools. The new King’s College London Maths School, and programmes such as the Access Project charity, are models to follow.

So it’s now the responsibility of universities, rather than the DfE, to improve state schools?

> The past decade has seen a renaissance in the state school system, because when tough questions were asked and political control reduced, brilliant teachers and heads stepped forward. It is now the turn of universities to make Britain the world’s leading education nation.

If there really has been a renaissance, the social gradient should fix itself, because parents will abandon expensive private education, and children will leave state schools with a raft of good qualifications, regardless of social background. If only….

With his ‘must do better’ arguments, Martin adopts a well-known strategy for those who wish to privatise public services: first starve them of funds, then heap on criticism to portray the sector as failing, so that it appears the only solution is a takeover by the free market. The NHS has been the focus of such a campaign, and it seems that now the attention is shifting to higher education. But here Martin has a bit of a problem. As indicated in his second sentence, we are actually doing surprisingly well, with our publicly funded universities competing favourably with the wealthy private universities in the USA.
Sunday, 12 October 2014
Some thoughts on use of metrics in university research assessment
The UK’s Research Excellence Framework (REF) is like a walrus: it is huge, cumbersome and has a very long gestation period. Most universities started preparing in earnest for the REF early in 2011, with submissions being made late in 2013. Results will be announced in late December, just in time to cheer up our seasonal festivities.
Reference

Mryglod, O., Kenna, R., Holovatch, Y., & Berche, B. (2013). Comparison of a citation-based indicator and peer review for absolute and specific measures of research-group excellence. Scientometrics, 97(3), 767-777. doi:10.1007/s11192-013-1058-9
Like many others, I have moaned about the costs of the REF: not just in money, but also the time spent by university staff, who could be more cheerfully and productively engaged in academic activities. The walrus needs feeding copious amounts of data: research outputs must be carefully selected and then graded in terms of research quality. Over the summer, those dedicated souls who sit on REF panels were required to read and evaluate several hundred papers. Come December, the walrus digestive system will have condensed the concerted ponderings of some of the best academic minds in the UK into a handful of rankings.
But is there a viable alternative? Last week I attended a fascinating workshop on the use of metrics in research. I had earlier submitted comments to an independent review of the role of metrics in research assessment from the Higher Education Funding Council for England (HEFCE), arguing that we need to consider cost-effectiveness when developing assessment methods. The current systems of evaluation have grown ever more complex and expensive, without anyone considering whether the associated improvements justified the increasing costs. My view is that an evaluation system need not be perfect – it just needs to be ‘good enough’ to provide a basis for disbursement of funds that can be seen to be both transparent and fair, and which does not lend itself readily to gaming.
Is there an alternative?
When I started preparing my presentation, I had intended to talk just about the use of measures of citations to rank departments, using analysis done for an earlier blogpost, as well as results from this paper by Mryglod et al. Both sources indicated that, at least in sciences, the ultimate quality-related research (QR) funding allocation for a department was highly correlated with a department-based measure of citations. So I planned to make the case that if we used a citation-based metric (which can be computed by a single person in a few hours) we could achieve much the same result as the full REF process for evaluating outputs, which takes many months and involves hundreds of people.
However, in pondering the data, I then realised that there was an even better predictor of QR funding per department: simply the number of staff entered into the REF process.
Before presenting the analysis, I need to backtrack to explain the measures I am using, as this can get quite confusing. HEFCE deserves an accolade for its website, where all the relevant data can be found. My analyses were based on the 2008 Research Assessment Exercise (RAE). In what follows I used a file called QR funding and research volume broken down by institution and subject, which is downloadable here. This contains details of funding for each institution and subject for 2009-2010. I am sure the calculations I present here have been done much better by others, and I hope they will not be shy to inform me if there are mistakes in my working.
The variables of interest are:
- The percentages of research falling in each star band in the RAE. From these, one can compute an average quality rating: multiply the percentage of 4* research by 7, 3* by 3, and 2* by 1, sum these, and divide the total by 100. Note that this figure is independent of department size and can be treated as an estimate of the average quality of a researcher in that department and subject.
- The number of full-time equivalent research-active staff entered for the RAE. This is labelled as the ‘model volume number’, but I will call it Nstaff. (In fact, the numbers given in the 2009-2010 spreadsheet are slightly different from those used in the computation, for reasons I am not clear about, but I have used the correct numbers, i.e. those in HEFCE tables from RAE2008).
- The departmental quality rating: this is average quality rating x Nstaff. (Labelled as “model quality-weighted volume” in the file). This is summed across all departments in a discipline to give a total subject quality rating (labelled as “total quality-weighted volume for whole unit of assessment”).
- The overall funds available for the subject are listed as “Model total QR quanta for whole unit of assessment (£)”. I have not been able to establish how this number is derived, but I assume it has to do with the size and cost of the subject, and the amount of funding available from government.
- QR (quality-related) funding is then derived by dividing the departmental quality rating by the total subject quality rating and multiplying by overall funds. This gives the sum of QR money allocated by HEFCE to that department for that year, which in 2009 ranged from just over £2K (Coventry University, Psychology) to over £12 million (UCL, Hospital-based clinical subjects). The total QR allocation in 2009-2010 for all disciplines was just over £1 billion.
- The departmental H-index is taken from my previous blogpost. It is derived by doing a Web of Knowledge search for articles from the departmental address, and then computing the H-index in the usual way. Note that this does not involve identifying individual scientists.
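The funding arithmetic in the bullet points above can be sketched in a few lines of Python. This is an illustrative reconstruction, not HEFCE's code: the department names, staff numbers, star-band percentages and overall funds are all invented.

```python
# Illustrative sketch of the QR allocation arithmetic described above.
# All figures are invented; this is not HEFCE's actual implementation.

def average_quality(pct_4star, pct_3star, pct_2star):
    """Average quality rating: weight 4* by 7, 3* by 3, 2* by 1, then /100."""
    return (7 * pct_4star + 3 * pct_3star + 1 * pct_2star) / 100

def h_index(citation_counts):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citation_counts, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

# Hypothetical departments: Nstaff, %4*, %3*, %2*
departments = {"Dept A": (40, 25, 45, 20), "Dept B": (15, 10, 35, 40)}
overall_funds = 1_000_000  # model total QR quanta for the unit of assessment, £

# Departmental quality rating ("quality-weighted volume") = average quality x Nstaff
quality_volume = {d: n * average_quality(p4, p3, p2)
                  for d, (n, p4, p3, p2) in departments.items()}
total_volume = sum(quality_volume.values())  # total for the unit of assessment

# QR funding: department's share of quality-weighted volume x overall funds
qr = {d: overall_funds * v / total_volume for d, v in quality_volume.items()}
```

Because `h_index` takes a bare list of citation counts, it applies just as well to an address-based departmental search as to an individual's publication list, which is why no individual scientists need be identified.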
Size-related funding
To check this out, I computed an alternative metric, size-related funding, which multiplies the overall funds by the proportion of Nstaff in the department relative to total staff in that subject across all departments. So if across all departments in the subject there are 100 staff, a department with 10 staff would get .1 of the overall funds for the subject.
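A minimal sketch of that pro-rata calculation, with staff numbers invented for illustration:

```python
# Size-related funding: a department's share of the subject's funds is
# simply its share of research-active staff. Figures are hypothetical.
overall_funds = 1_000_000  # £ available for the subject
nstaff = {"Dept A": 40, "Dept B": 35, "Dept C": 25}
total_staff = sum(nstaff.values())  # 100 staff across the subject

size_related = {d: overall_funds * n / total_staff for d, n in nstaff.items()}
# Dept A, with 40 of the 100 staff, receives 0.4 of the overall funds
```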
Table 1 shows the correlation between Nstaff and QR funding (r QR/Nstaff), and how much a department would typically gain or lose if size-related funding were adopted, expressing the absolute difference as a percentage of QR funding (± % diff).
Table 1: Mean number of staff and QR funding by subject, with correlation between QR and N staff, and mean difference between QR funding and size-related funding
Correlations between Nstaff and QR funding are very high – above .9. Nevertheless, as is evident in Table 1, if we substituted size-related funding for QR funding, the amounts gained or lost by individual departments could be substantial. In some subjects, though, mainly in the Humanities, where overall QR allocations are in any case quite modest, the difference between size-related and QR funding is not large in absolute terms. In such cases, it might be rational to allocate funds solely by Nstaff and ignore quality ratings. The advantage would be an enormous saving in time – one could bypass the RAE or REF entirely. This might be a reasonable option if a department’s expenditure on the RAE/REF exceeds any potential gain from inclusion of quality ratings.

Is the departmental H-index useful?

If we assume that the goal is to have a system that approximates the outcomes of the RAE (and I’ll come back to that later), then for most subjects you need something more than Nstaff. The issue then is whether an easily computed department-based metric, such as the H-index or total citations, could add further predictive power. I looked at the figures for two subjects where I had computed the departmental H-index: Psychology and Physics. As it happens, Physics is an extreme case: the correlation between Nstaff and QR funding was .994. Adding an H-index does not improve prediction because there is virtually no variance left to explain. As can be seen from Table 1, Physics is a case where use of size-related funding might be justified, given that the difference between size-related and QR funding averages out at only 8%. For Psychology, adding the H-index to the regression explains a small but significant 6.2% of additional variance, with the correlation increasing to .95. But how much difference would it make in practice if we were to use these readily available measures to award funding instead of the RAE formula?
The answer is more than you might think, because the range in award size is so large that even a small departure from perfect prediction can translate into a lot of money. Table 2 shows the different levels of funding that departments would accrue depending on how the funding formula is computed. The full table is too large and complex to show here, so I’ll just show every 8th institution. As well as comparing the alternative size-related and H-index-based (QRH) metrics with the RAE funding formula (QR0137), I have looked at how things change if the funding formula is tweaked: either to give more linear weighting to the different star categories (QR1234), or to give more extreme reward for the highest 4* category (QR0039) – something which is rumoured to be a preferred method for REF2014. In addition, I have devised a metric that has some parallels with the RAE metric, based on the residual of the H-index after removing the effect of departmental size. This could be used as an index of quality that is independent of size; it correlates at r = .87 with the RAE average quality rating. To get an alternative QR estimate, it was substituted for the average quality rating in the funding formula to give the Size.Hres measure.

Table 2: Funding results in £K from different metrics for seven Psychology departments representing different levels of QR funding
To avoid invidious comparisons, I have not labelled the departments, though anyone who is curious about their identity could discover them quite readily. The two columns that use the H-index tend to give similar results, and are closer to a QR funding formula that treats the four star ratings as equal points on a scale (QR1234). It is also apparent that a move to QR0039 (where most reward is given for 4* research and none for 1* or 2*) would increase the share of funds going to those institutions that are already doing well, and decrease it for those that already have poorer income under the current system. One can also see that some of the universities at the lower end of the table – all of them post-1992 universities – seem disadvantaged by the RAE metric, in that the funding they received seems low relative to both their size and their H-index.

The quest for a fair solution

So what is a fair solution? Here, of course, lies the problem. There is no gold standard. There has been a lot of discussion about whether we should use metrics, but much less discussion of what we are hoping to achieve with a funding allocation. How about the idea that we could allocate funds simply on the basis of the number of research-active staff? In a straw poll I’ve taken, two concerns are paramount. First, there is a widely held view that we should give maximum rewards to those with the highest quality research, because this will help them maintain their high standing, and incentivise others to do well. This is coupled with a view that we should not be rewarding those who don’t perform. But how extreme do we want this concentration of funding to be? I’ve expressed concerns before that too much concentration in a few elite institutions is not good for UK academia, and that we should be thinking about helping middle-ranking institutions become elite, rather than focusing all our attention on those who have already achieved that status.
The calculations from the RAE in Table 2 show how tweaking the funding formula to give higher weighting to 4* research would take money from the poorer institutions and give it to the richer ones: it would be good to see some discussion of the rationale for this approach.

The second source of worry is the potential for gaming. What is to stop a department from entering all their staff, or boosting numbers by taking on extra staff? The first point could be dealt with by having objective criteria for inclusion, such as some minimal number of first- or last-authored publications in the reporting period. The second strategy would be a risky one, since the institution would have to provide salaries and facilities for the additional staff, and this would only be cost-effective if the QR allocation covered it. Of course, a really cynical gaming strategy would be to hire people briefly for the REF and then fire them once it is over. However, if funding were simply a function of the number of research-active staff, it would be easy to do an assessment annually, to deter such short-term strategies.

How about the departmental H-index? I have shown that it is not only a fairly good predictor of RAE QR funding outcomes on its own, incorporating as it does aspects of both departmental size and research quality, but that it also correlates with the RAE measure of quality once the effect of departmental size is adjusted for. This is all the more impressive when one notes that the departmental H-index is based on any articles listed as coming from the departmental address, whereas the quality rating is based just on those articles submitted to the RAE. There are well-rehearsed objections to the use of citation metrics such as the H-index: first, any citation-based measure is useless for very recent articles. Second, citations vary from discipline to discipline, and in my own subject, Psychology, between sub-disciplines.
Furthermore, the H-index can be gamed to some extent by self-citation or scientific cliques, and one way of boosting it is to insist on having your name on any publication you are remotely connected with - though the latter strategy is more likely to work for the H-index of the individual than for that of the department. It is easy to find anecdotal instances of poor articles that are highly cited and good articles that are neglected. Nevertheless, it may be a ‘good enough’ measure when used in aggregate: not to judge individuals, but to gauge the scientific influence of work coming from a given department over a period of a few years.

The quest for a perfect measure of quality

I doubt that either of these ‘quick and dirty’ indices will be adopted for future funding allocations, because it’s clear that most academics hate the idea of anything so simple. One message frequently voiced at the Sussex meeting was that quality is far too complex to be reduced to a single number. While I agree with that sentiment, I am concerned that in our attempts to get a perfect assessment method, we are developing systems that are ever more complex and time-consuming. The initial rationale for the RAE was that we needed a fair and transparent means of allocating funding after the 1992 shake-up of the system created many new universities. Over the years there has been mission creep, and the purpose of the RAE has been taken over by the idea that we can and should measure quality, feeding an obsession with league tables and competition. My quest for something simpler is not because I think quality is simple, but because I think we should use the REF just as a means to allocate funds. If that is our goal, we should not reject simple metrics just because we find them simplistic: we should base our decisions on evidence and go for whatever achieves an acceptable outcome at reasonable cost.
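Part of the appeal of the H-index for ‘quick and dirty’ allocation is that, whatever its flaws, the computation itself is trivial. A minimal sketch (the function is my own, and assumes citation counts have already been retrieved for every paper carrying the departmental address):

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    h = 0
    # Rank papers from most to least cited; h is the last rank at which
    # the paper's citation count still matches or exceeds its rank.
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Five papers cited 10, 8, 5, 4 and 3 times give h = 4: four papers
# have at least 4 citations each, but there are not five with at least 5.
```

The expensive part is not the arithmetic but assembling reliable citation counts for the right set of papers - which is exactly where the gaming described above comes in.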
If a citation-based metric can do that job, then we should consider using it unless we can demonstrate that something else works better. I'd be very grateful for comments and corrections.
Mryglod, O., Kenna, R., Holovatch, Y., & Berche, B. (2013). Comparison of a citation-based indicator and peer review for absolute and specific measures of research-group excellence. Scientometrics, 97(3), 767-777. DOI: 10.1007/s11192-013-1058-9