Sunday, 12 October 2014

Some thoughts on use of metrics in university research assessment

The UK’s Research Excellence Framework (REF) is like a walrus: it is huge, cumbersome and has a very long gestation period. Most universities started preparing in earnest for the REF early in 2011, with submissions being made late in 2013. Results will be announced in late December, just in time to cheer up our seasonal festivities.
 
Like many others, I have moaned about the costs of the REF: not just in money, but also the time spent by university staff, who could be more cheerfully and productively engaged in academic activities. The walrus needs feeding copious amounts of data: research outputs must be carefully selected and then graded in terms of research quality. Over the summer, those dedicated souls who sit on REF panels were required to read and evaluate several hundred papers. Come December, the walrus's digestive system will have condensed the concerted ponderings of some of the best academic minds in the UK into a handful of rankings.

But is there a viable alternative? Last week I attended a fascinating workshop on the use of metrics in research. I had earlier submitted comments to an independent review of the role of metrics in research assessment from the Higher Education Funding Council for England (HEFCE), arguing that we need to consider cost-effectiveness when developing assessment methods. The current systems of evaluation have grown ever more complex and expensive, without anyone considering whether the associated improvements justified the increasing costs. My view is that an evaluation system need not be perfect – it just needs to be ‘good enough’ to provide a basis for disbursement of funds that can be seen to be both transparent and fair, and which does not lend itself readily to gaming.

Is there an alternative?
When I started preparing my presentation, I had intended to talk just about the use of measures of citations to rank departments, using analysis done for an earlier blogpost, as well as results from this paper by Mryglod et al. Both sources indicated that, at least in sciences, the ultimate quality-related research (QR) funding allocation for a department was highly correlated with a department-based measure of citations. So I planned to make the case that if we used a citation-based metric (which can be computed by a single person in a few hours) we could achieve much the same result as the full REF process for evaluating outputs, which takes many months and involves hundreds of people.
However, in pondering the data, I then realised that there was an even better predictor of QR funding per department: simply the number of staff entered into the REF process.

Before presenting the analysis, I need to backtrack just to explain the measures I am using, as this can get quite confusing. HEFCE deserves an accolade for its website, where all the relevant data can be found. My analyses were based on the 2008 Research Assessment Exercise (RAE). In what follows I used a file called QR funding and research volume broken down by institution and subject, which is downloadable here. This contains details of funding for each institution and subject for 2009-2010. I am sure the calculations I present here have been done much better by others, and I hope they will not be shy to inform me if there are mistakes in my working.

The variables of interest are:
  • The percentages of research falling in each star band in the RAE. From this, one can compute an average quality rating, by multiplying the percentage of 4* research by 7, the percentage of 3* by 3, and the percentage of 2* by 1, summing these, and dividing the total by 100. Note that this figure is independent of department size and can be treated as an estimate of the average quality of a researcher in that department and subject.
  • The number of full-time equivalent research-active staff entered for the RAE. This is labelled as the ‘model volume number’, but I will call it Nstaff. (In fact, the numbers given in the 2009-2010 spreadsheet are slightly different from those used in the computation, for reasons I am not clear about, but I have used the correct numbers, i.e. those in HEFCE tables from RAE2008).
  • The departmental quality rating: this is average quality rating x Nstaff. (Labelled as “model quality-weighted volume” in the file). This is summed across all departments in a discipline to give a total subject quality rating (labelled as “total quality-weighted volume for whole unit of assessment”).
  • The overall funds available for the subject are listed as “Model total QR quanta for whole unit of assessment (£)”. I have not been able to establish how this number is derived, but I assume it has to do with the size and cost of the subject, and the amount of funding available from government.
  • QR (quality-related) funding is then derived by dividing the departmental quality rating by the total subject quality rating and multiplying by overall funds. This gives the sum of QR money allocated by HEFCE to that department for that year, which in 2009 ranged from just over £2K (Coventry University, Psychology) to over £12 million (UCL, Hospital-based clinical subjects). The total QR allocation in 2009-2010 for all disciplines was just over £1 billion.
  • The departmental H-index is taken from my previous blogpost. It is derived by doing a Web of Knowledge search for articles from the departmental address, and then computing the H-index in the usual way. Note that this does not involve identifying individual scientists.
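Pulling the pieces above together, the allocation procedure amounts to a short calculation. The sketch below is purely illustrative: the department names, star-band percentages and staff numbers are invented, and the real figures come from the HEFCE/RAE2008 tables.

```python
# Minimal sketch of the QR allocation described above, using the
# 2009-10 weightings (4* x 7, 3* x 3, 2* x 1; 1* and unclassified x 0).
# All numbers below are invented for illustration.

def average_quality(pct_4star, pct_3star, pct_2star):
    """Average quality rating per researcher from star-band percentages."""
    return (7 * pct_4star + 3 * pct_3star + 1 * pct_2star) / 100

def qr_allocation(departments, subject_funds):
    """Share the subject's total QR funds in proportion to each
    department's quality-weighted volume (average quality x Nstaff)."""
    volume = {name: average_quality(*stars) * nstaff
              for name, (stars, nstaff) in departments.items()}
    total = sum(volume.values())  # total quality-weighted volume for the subject
    return {name: subject_funds * v / total for name, v in volume.items()}

# Hypothetical departments: ((pct 4*, pct 3*, pct 2*), Nstaff)
departments = {
    "Dept A": ((25, 40, 25), 30),
    "Dept B": ((10, 30, 40), 20),
}
print(qr_allocation(departments, subject_funds=1_000_000))
```

Note how Nstaff enters the formula as a direct multiplier, which is why departmental size dominates the outcome.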
Readers who are still with me may have noticed that we'd expect QR funding for a subject to be correlated with Nstaff, because Nstaff features in the formula for computing QR funding. And this makes sense, because departments with more research staff require greater levels of funding. A key question is just how much difference it makes to the QR allocation if one includes the quality ratings from the RAE in the formula.

Size-related funding
To check this out, I computed an alternative metric, size-related funding, which multiplies the overall funds by the proportion of Nstaff in the department relative to total staff in that subject across all departments. So if across all departments in the subject there are 100 staff, a department with 10 staff would get .1 of the overall funds for the subject.

Table 1 shows: the correlation between Nstaff and QR funding (r QR/Nstaff) and how much a department would typically gain or lose if size-related funding were adopted, expressing the absolute difference as a percentage of QR funding (± % diff).
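As a concrete (and entirely invented) illustration of the size-related alternative and of the ±% diff column in Table 1:

```python
# Size-related funding: share the subject's funds purely by staff numbers,
# then express how far each department's award would move away from its
# actual QR allocation. All figures here are invented for illustration.

def size_related(nstaff_by_dept, subject_funds):
    """Allocate funds in proportion to each department's share of staff."""
    total_staff = sum(nstaff_by_dept.values())
    return {d: subject_funds * n / total_staff
            for d, n in nstaff_by_dept.items()}

def pct_diff(qr_funding, size_funding):
    """Absolute gain/loss under size-related funding, as % of QR funding."""
    return {d: 100 * abs(size_funding[d] - qr_funding[d]) / qr_funding[d]
            for d in qr_funding}

nstaff = {"Dept A": 30, "Dept B": 20, "Dept C": 50}
qr = {"Dept A": 400_000, "Dept B": 150_000, "Dept C": 450_000}
size = size_related(nstaff, subject_funds=1_000_000)
print(pct_diff(qr, size))
```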

Table 1: Mean number of staff and QR funding by subject, with correlation between QR and N staff, and mean difference between QR funding and size-related funding





Subject    Mean Nstaff    Mean QR £K    r QR/Nstaff    ±% diff
Cardiovascular Medicine 26.3 794 0.906 23
Cancer Studies 38.1 1,330 0.939 13
Infection and Immunology 43.7 1,506 0.971 22
Other Hospital Based Clinical Subjects 58.2 1,945 0.986 23
Other Laboratory Based Clinical Subjects 21.8 685 0.952 41
Epidemiology and Public Health 26.6 949 0.986 25
Health Services Research 21.9 659 0.900 24
Primary Care & Community Based Clinical  10.4 370 0.790 29
Psychiatry, Neuroscience & Clinical Psychology 46.7 1,402 0.987 15
Dentistry 31.1 1,146 0.977 13
Nursing and Midwifery 18.0 487 0.930 32
Allied Health Professions and Studies 20.4 424 0.884 36
Pharmacy 27.5 899 0.936 24
Biological Sciences 45.1 1,649 0.978 19
Pre-clinical and Human Biological Sciences 49.4 1,944 0.887 18
Agriculture, Veterinary and Food Science 33.2 999 0.976 21
Earth Systems and Environmental Sciences 28.6 1,128 0.971 14
Chemistry 37.9 1,461 0.969 18
Physics 44.0 1,596 0.994 8
Pure Mathematics 18.4 489 0.957 24
Applied Mathematics 20.0 614 0.988 19
Statistics and Operational Research 12.6 406 0.953 19
Computer Science and Informatics 22.9 769 0.954 26
Electrical and Electronic Engineering 23.8 892 0.982 17
General Engineering; Mineral/Mining Engineering 28.9 1,073 0.958 30
Chemical Engineering 26.6 1,162 0.968 15
Civil Engineering 23.2 1,005 0.960 19
Mech., Aeronautical, Manufacturing Engineering 35.7 1,370 0.987 14
Metallurgy and Materials 21.1 807 0.948 24
Architecture and the Built Environment 18.7 436 0.961 23
Town and Country Planning 15.1 306 0.911 27
Geography and Environmental Studies 22.8 505 0.969 21
Archaeology 20.7 518 0.990 12
Economics and Econometrics 25.7 581 0.968 20
Accounting and Finance 11.7 156 0.982 19
Business and Management Studies 38.7 630 0.964 27
Library and Information Management 16.3 244 0.935 26
Law 26.6 426 0.960 30
Politics and International Studies 22.4 333 0.955 31
Social Work and Social Policy & Administration 19.1 324 0.944 26
Sociology 24.1 404 0.933 24
Anthropology 18.6 363 0.946 12
Development Studies 21.7 368 0.936 25
Psychology 21.1 424 0.919 35
Education 21.0 346 0.983 34
Sports-Related Studies 13.5 231 0.952 37
American Studies and Anglophone Area Studies 10.9 191 0.988 11
Middle Eastern and African Studies 17.7 393 0.978 17
Asian Studies 15.9 258 0.938 26
European Studies 20.1 253 0.787 30
Russian, Slavonic and East European Languages 8.7 138 0.973 22
French 12.6 195 0.979 16
German, Dutch and Scandinavian Languages 8.4 129 0.966 17
Italian 6.3 111 0.865 20
Iberian and Latin American Languages 9.1 156 0.937 17
Celtic Studies 0.0 328
English Language and Literature 20.9 374 0.982 26
Linguistics 11.7 168 0.956 18
Classics, Ancient History, Byzantine and Modern Greek Studies 19.4 364 0.992 22
Philosophy 14.4 258 0.987 23
Theology, Divinity and Religious Studies 11.4 174 0.958 32
History 20.8 366 0.988 21
Art and Design 22.7 419 0.955 37
History of Art, Architecture and Design 10.7 213 0.960 18
Drama, Dance and Performing Arts 9.8 221 0.864 36
Communication, Cultural and Media Studies 11.9 195 0.860 29
Music 10.6 259 0.863 33

Correlations between Nstaff and QR funding are very high – above .9. Nevertheless, as is evident in Table 1, if we substituted size-related funding for QR funding, the amounts gained or lost by individual departments would be substantial. In some subjects, though, mainly in the Humanities, where overall QR allocations are anyhow quite modest, the difference between size-related and QR funding is not large in absolute terms. In such cases, it might be rational to allocate funds solely by Nstaff and ignore quality ratings. The advantage would be an enormous saving in time – one could bypass the RAE or REF entirely. This might be a reasonable option if the amount of expenditure on the RAE/REF by the department exceeds any potential gain from inclusion of quality ratings.

Is the departmental H-index useful?
If we assume that the goal is to have a system that approximates the outcomes of the RAE (and I’ll come back to that later) then for most subjects you need something more than Nstaff. The issue then is whether an easily computed department-based metric such as the H-index or total citations could add further predictive power. I looked at the figures for two subjects where I had computed the departmental H-index: Psychology and Physics.  As it happens, Physics is an extreme case: the correlation between Nstaff and QR funding was .994. Adding an H-index does not improve prediction because there is virtually no variance left to explain. As can be seen from Table 1, Physics is a case where use of size-related funding might be justified, given that the difference between size-related and QR funding averages out at only 8%.

For Psychology, adding the H-index to the regression explains a small but significant 6.2% of additional variance, with the correlation increasing to .95.

But how much difference would it make in practice if we were to use these readily available measures to award funding instead of the RAE formula? The answer is more than you might think, and this is because the range in award size is so very large that even a small departure from perfect prediction can translate into a lot of money.

Table 2 shows the different levels of funding that departments would accrue depending on how the funding formula is computed. The full table is too large and complex to show here, so I'll just show every 8th institution. As well as comparing alternative size-related and H-index-based (QRH) metrics with the RAE funding formula (QR0137), I have looked at how things change if the funding formula is tweaked: either to give more linear weighting to the different star categories (QR1234), or to give more extreme reward for the highest 4* category (QR0039) – something which is rumoured to be a preferred method for REF2014. In addition, I have devised a metric that has some parallels with the RAE metric, based on the residual of the H-index after removing the effect of departmental size. This could be used as an index of quality that is independent of size; it correlates at r = .87 with the RAE average quality rating. To get an alternative QR estimate, it was substituted for the average quality rating in the funding formula to give the Size.Hres measure.
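The Size.Hres idea can be sketched as follows. The data are invented, and since the exact rescaling of the residual before it is substituted into the funding formula is not spelled out here, the shift applied below is an illustrative assumption rather than the method actually used.

```python
# Sketch of a Size.Hres-style metric: regress departmental H-index on
# Nstaff, keep the residual as a size-independent quality score, then
# plug that score back into the funding split in place of the RAE
# average quality rating. Data are invented for illustration; how the
# residual is shifted to be positive is an assumed detail.

def linreg(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return slope, my - slope * mx

def size_hres(nstaff, h_index, subject_funds):
    slope, intercept = linreg(nstaff, h_index)
    # Residual: how much better/worse each H-index is than expected
    # for a department of that size.
    resid = [h - (slope * n + intercept) for n, h in zip(nstaff, h_index)]
    # Shift residuals to be positive before weighting by Nstaff,
    # mirroring "quality rating x volume" in the funding formula
    # (an illustrative choice, not taken from the original analysis).
    shift = min(resid)
    quality = [r - shift + 1 for r in resid]
    volume = [q * n for q, n in zip(quality, nstaff)]
    total = sum(volume)
    return [subject_funds * v / total for v in volume]

nstaff = [10, 20, 30, 40]
h_index = [12, 18, 30, 33]
print(size_hres(nstaff, h_index, subject_funds=1_000_000))
```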

Table 2: Funding results in £K from different metrics for seven Psychology departments representing different levels of QR funding


institution QR0137 Size-related QR1234 QR0039 QRH Size.Hres
A 1891 1138 1424 2247 1416 1470
B 812 585 683 899 698 655
C 655 702 688 620 578 576
D 405 363 401 400 499 422
E 191 323 276 121 279 304
F 78 192 140 44 299 218
G 26 161 81 13 60 142

To avoid invidious comparisons, I have not labelled the departments, though anyone who is curious about their identity could discover them quite readily. The two columns that use the H-index tend to give similar results, and are closer to a QR funding formula that treats the four star ratings as equal points on a scale (QR1234). It is also apparent that a move to QR0039 (where most reward is given for 4* research and none for 1* or 2*) would increase the share of funds to those institutions that are already doing well, and decrease it for those who already have poorer income under the current system. One can also see that some of the universities at the lower end of the table – all of them post-1992 universities – seem disadvantaged by the RAE metric, in that the funding they received seems low relative to both their size and the H-index.

The quest for a fair solution
So what is a fair solution? Here, of course, lies the problem. There is no gold standard. There has been a lot of discussion about whether we should use metrics, but much less discussion of what we are hoping to achieve with a funding allocation.

How about the idea that we could allocate funds simply on the basis of the number of research-active staff? In a straw poll I’ve taken, two concerns are paramount.

First, there is a widely held view that we should give maximum rewards to those with the highest quality research, because this will help them maintain their high standing, and incentivise others to do well. This is coupled with a view that we should not be rewarding those who don’t perform. But how extreme do we want this concentration of funding to be? I’ve expressed concerns before that too much concentration in a few elite institutions is not good for UK academia, and that we should be thinking about helping middle-ranking institutions become elite, rather than focusing all our attention on those who have already achieved that status. The calculations from RAE in Table 2 show how a tweaking of the funding formula to give higher weighting to 4* research will take money from the poorer institutions and give it to the richer ones: it would be good to see some discussion of the rationale for this approach.

The second source of worry is the potential for gaming. What is to stop a department from entering all their staff, or boosting numbers by taking on extra staff? The first point could be dealt with by having objective criteria for inclusion, such as some minimal number of first- or last-authored publications in the reporting period.  The second strategy would be a risky one, since the institution would have to provide salaries and facilities for the additional staff, and this would only be cost-effective if the QR allocation would cover it. Of course, a really cynical gaming strategy would be to hire people briefly for the REF and then fire them once it is over. However, if funding were simply a function of number of research-active staff, it would be easy to do an assessment annually, to deter such short-term strategies.

How about the departmental H-index? I have shown that not only is it a fairly good predictor of RAE QR funding outcomes on its own, incorporating as it does aspects of both departmental size and research quality, but it also correlates with the RAE measure of quality once the effect of departmental size is adjusted for. This is all the more impressive when one notes that the departmental H-index is based on any articles listed as coming from the departmental address, whereas the quality rating is based just on those articles submitted to the RAE.

There are well-rehearsed objections to the use of citation metrics such as the H-index: first, any citation-based measure is useless for very recent articles. Second, citations vary from discipline to discipline, and in my own subject, Psychology, between sub-disciplines. Furthermore, the H-index can be gamed to some extent by self-citation, or scientific cliques, and one way of boosting it is to insist on having your name on any publication you are remotely connected with - though the latter strategy is more likely to work for the H-index of the individual than for the H-index of the department. It is easy to find anecdotal instances of poor articles that are highly cited and good articles that are neglected. Nevertheless, it may be a ‘good enough’ measure when used in aggregate: not to judge individuals but to gauge the scientific influence of work coming from a given department over a period of a few years.

The quest for a perfect measure of quality
I doubt that either of these ‘quick and dirty’ indices will be adopted for future funding allocations, because it’s clear that most academics hate the idea of anything so simple. One message frequently voiced at the Sussex meeting was that quality is far too complex to be reduced to a single number.  While I agree with that sentiment, I am concerned that in our attempts to get a perfect assessment method, we are developing systems that are ever more complex and time-consuming. The initial rationale for the RAE was that we needed a fair and transparent means of allocating funding after the 1992 shake-up of the system created many new universities. Over the years, there has been mission creep, and the purpose of the RAE has been taken over by the idea that we can and should measure quality, feeding an obsession with league tables and competition. My quest for something simpler is not because I think quality is simple, but rather because I think we should use the REF just as a means to allocate funds. If that is our goal, we should not reject simple metrics just because we find them oversimplistic: we should base our decisions on evidence and go for whatever achieves an acceptable outcome at reasonable cost. If a citation-based metric can do that job, then we should consider using it unless we can demonstrate that something else works better.

I'd be very grateful for comments and corrections.

Reference  
Mryglod, O., Kenna, R., Holovatch, Y., & Berche, B. (2013). Comparison of a citation-based indicator and peer review for absolute and specific measures of research-group excellence. Scientometrics, 97(3), 767-777. doi:10.1007/s11192-013-1058-9

Friday, 26 September 2014

Why most scientists don't take Susan Greenfield seriously


[Cartoon: ©CartoonStock.com]


Three years ago I wrote an open letter to Susan Greenfield, asking her to please stop claiming there is a link between autism and use of digital media. It’s never pleasant criticizing a colleague, and since my earlier blogpost I’ve held back from further comment, hoping that she might refrain from making claims about autism, and/or that interest in her views would just die down. But now she's back, reiterating the claims in a new book and TV interview, and I can remain silent no longer.

Greenfield featured last week as the subject of a BBC interview in the series Hard Talk. The interviewer, Stephen Sackur, asked her specifically if she really believed her claims that exposure to modern digital media – the internet, video games, social media – was damaging to children’s development. Greenfield stressed that she did: although she herself had not done direct research on the internet/brain impact link, there was ample research to persuade her it was real. Specifically, she stated: “.. in terms of the evidence, anyone is welcome to look at my website, and it’s been up there for the last year. There’s 500 peer-reviewed papers in support of the possible problematic effects.”

A fact-check on the “500 peer-reviewed papers”

So I took a look. The list can be downloaded from here: it’s not exactly a systematic review. I counted 395 distinct items, but only a small proportion are peer-reviewed papers that find evidence of adverse effects from digital technology. There are articles from the Daily Mail and reports by pressure groups. There are some weird things that seem to have found their way onto the list by accident, such as a report on the global tobacco epidemic, and another from Department of Work and Pensions on differences in life expectancy for 20-, 50- and 80-year-olds. I must confess I did not read these cover to cover, but a link with 'mind change' was hard to see. Of the 234 peer-reviewed papers, some are reports on internet trends that contain nothing about adverse consequences, some are straightforward studies of neuroplasticity that don’t feature the internet, and others are of uncertain relevance. Overall, there were 168 papers that were concerned with effects of digital technology on behaviour and 15 concerned with effects on the brain. Furthermore, a wide range of topics was included: internet addiction, Facebook and social relations, violent games and aggression, reading on screens vs books, cyberbullying, ‘brain training’ and benefits for visuospatial skills, effects of multitasking on attention. I could only skim titles and a few abstracts, but I did not come away feeling there was overwhelming evidence of adverse consequences of these new technologies. Rather, papers covered a mix of risks and benefits with varying quality of evidence. There is, for instance, a massive literature on Facebook influences on self-esteem and social networks, but much of it talks of benefits. The better studies also noted the difficulties of inferring causation from correlational data: for instance, it’s possible that an addictive attitude to a computer game is as much a consequence as a cause of problems with everyday life.

Greenfield’s specific contribution to this topic is to link it up with what we know about neuroplasticity, and she has speculated that attentional mechanisms may be disrupted by effects that games have on neurotransmitter levels, that empathy and social relationships can be damaged when computers/games take us away from interacting with people, and that too much focus on a two-dimensional screen may affect perceptual and cognitive development in children. This is all potentially important and a worthy topic for research, but is it reasonable, as she has done, to liken the threat to that posed by climate change? As Stephen Sackur pointed out, the evidence from neuroplasticity would indicate that if the brain changes in response to its environment, then we should be able to reverse an effect by a change in environment. I cannot resist also pointing out that if it is detrimental to perform socially-isolated activities with a two-dimensional surface rather than interacting with real people in a 3D world, then we should be discouraging children from reading books.

Digital media use as a risk factor for autism

My main concern is the topic that motivated me to write to Greenfield in the first place: autism. The arguments I put forward in 2011 still stand: it is simply irresponsible to indulge in scaremongering on the basis of scanty evidence, particularly when the case lacks logical consistency.

In the Hard Talk interview*, Greenfield attempted to clarify her position: “You have to be careful, because what I say is autistic spectrum disorder. That’s not the same as autism.” Yet this is no clarification at all, given that the latest edition of the diagnostic manual, DSM-5, states: “Individuals with a well-established DSM-IV diagnosis of autistic disorder, Asperger’s disorder, or pervasive developmental disorder not otherwise specified should be given the diagnosis of autism spectrum disorder (ASD).” Greenfield has had a few years to check her facts, yet seems to be under the impression that ASD is some kind of mild impairment like social gaucheness, quite distinct from a clinically significant condition.

In an interview in the Observer (see here**), Greenfield was challenged by the interviewer, Andrew Anthony, who mentioned my earlier plea to her to stop talking about autism. She replied to say that she was not alone in making the link and that there were published papers making the same case. She recommended that if I wanted to dissent, I should “slug it out” with the authors of those papers. That’s an invitation too good to resist, so I searched the list from her website to find any that mentioned autism. There were four (see reference list below):

We need not linger on the Hertz-Picciotto & Delwiche paper, because it focuses on changes in rates of autism diagnosis and does not mention internet use or screen time. The rise is a topic of considerable interest about which a great deal has been written, and numerous hypotheses have been put forward to explain it. Computer use is not generally seen as a plausible hypothesis because symptoms of ASD are typically evident by 2 years of age, long before children are introduced to computers. (Use of tablets with very young children is increasing, but would not have been a factor for the time period studied, 1990-2006).

The Finkenauer et al paper is a study of internet use, and compulsive internet use, among married couples, who were assessed using self-report questionnaires. Frequency of internet use was not related to autistic traits, but compulsive internet use was. The authors did not conclude that internet use causes autistic traits – that would be a bit weird in a sample of adults who grew up before the internet was widespread. Instead, they note that if you have autistic traits, there is an increased likelihood that internet use could become problematic. The paper is cautious in its conclusions and does not support Greenfield’s thesis that the internet is a risk factor for autism. On the contrary, it emphasises the possibility that people who develop an addictive relationship with the internet may differ from others in pre-existing personality traits.

So on to Waldman et al, who consider whether television causes autism. Yes, that’s right, this is not about internet use. It’s about the humble TV. Next thing to note is this is an unpublished report, and not a peer-reviewed paper. So I checked out the authors to see if they had published anything on this, and found an earlier paper with the intriguing title: “Autism Prevalence and Precipitation Rates in California, Oregon, and Washington Counties”. Precipitation? Like, rainfall? Yup! The authors did a regression analysis and concluded that there was a statistically significant association between the amount of rainfall in a specific county, and the frequency of autism diagnoses. They then went on to consider why this might be, and came up with an ingenious explanation: when it is wet, children can’t play outside. So they watch TV. And develop autism.

In the unpublished report, the theme is developed further, by linking rate of precipitation to household subscription to cable TV. The conclusion:

“Our precipitation tests indicate that just under forty percent of autism diagnoses in the three states studied is the result of television watching due to precipitation, while our cable tests indicate that approximately seventeen percent of the growth in autism in California and Pennsylvania during the 1970s and 1980s is due to the growth of cable television.”

One can only breathe a sigh of relief that no peer-reviewed journal appears to have been willing to publish this study.

But wait, there is one more study in the list provided by Greenfield. Will this be the clincher? It's by Maxson McDowell, a Jungian therapist who uses case descriptions to formulate a hypothesis that relates autism to “failure to acquire, or retain, the image of the mother’s eyes”. I was initially puzzled at the inclusion of this paper, because the published version blames non-maternal childcare rather than computers, but there is an updated version online which does make a kind of link – though again not with the internet: “The image-of-the-eyes hypotheses suggest that this increase [in autism diagnoses] may be due to the increased use, in early infancy, of non-maternal childcare including television and video.” So, no data, just anecdote and speculation designed to make working mothers feel it’s their fault that their child has autism.

Greenfield's research track record

Stephen Sackur asked Greenfield why, if she thought this topic so important, she hadn’t done research on this topic herself. She replied that as a neuroscientist, she couldn't do everything, that research costs money, and that if someone would like to give her some money, she could do such research.

But someone did give her some money. According to this website, in 2005 she received an award of $2 million from the Templeton Foundation to form the Oxford Centre for Science of the Mind which is “dedicated to cutting-edge interdisciplinary work drawing on pharmacology, human anatomy, physiology, neuroscience, theology and philosophy”. A description of the research that would be done by the centre can be found here. Most scientists will have experienced failure to achieve all of the goals that they state in their grant proposals – there are numerous factors outside one's control that can mess up the best-laid plans. Nevertheless, the mismatch between what is promised on the website and evidence of achievement through publications is striking, and perhaps explains why further funding has apparently not been forthcoming.

One of the more surprising comments by Greenfield was when Sackur mentioned criticism of her claims by Ben Goldacre. “He’s not a scientist,” she retorted, “he’s a journalist”. Twitter went into a state of confusion, wondering whether this was a deliberate insult or pure ignorance. Goldacre himself tweeted: “My publication rate is not stellar, as a part time early career researcher transferring across from clinical medicine, but I think even my peer reviewed publication rate is better than Professor Greenfield's over the past year.”

This is an interesting point. The media repeatedly describe Greenfield as a “leading neuroscientist”, yet this is not how she is currently perceived among her peer group. In science, you establish your reputation by publishing in the peer-reviewed literature. A Web of Science search for the period 2010-2014 found thirteen papers in peer-reviewed journals authored or co-authored by Greenfield, ten of which reported new empirical data. This is not negligible, but for a five-year period, it is not stellar - and represents a substantial fall-off from her earlier productivity.

But quality is more important than quantity, and maybe, you think, her work is influential in the field. To check that out, I did a Web of Science search for papers published from a UK address between 2005 and 2014 with the topic specified as (Alzheimer* OR Parkinson’s OR neurodegener*) AND brain. (The * is a wildcard, so this will capture all words starting this way). I used a 10-year period because citations (a rough measure of how influential the work is) take time to accrue. This yielded over 3,000 articles, which I rank ordered by the number of citations. The first paper authored by Greenfield was 956th in this list: “Non-hydrolytic functions of acetylcholinesterase - The significance of C-terminal peptides”, with 21 citations.

Her reputation appears to be founded on two things: her earlier work in basic neuroscience in the 1980s and 1990s, which was well-cited, and her high profile as a public figure. Sadly, she now seems to be totally disconnected from mainstream science.

If Greenfield seriously believes what she is saying, and internet use by children is causing major developmental difficulties, then this is a big deal. So why doesn’t she spend some time at IMFAR, the biggest international conference on autism (and autism spectrum disorder) that there is? She could try presenting her ideas and see what feedback she gets. Better still, she could listen to other talks, get updated on current research in this area, and talk with people with autism/ASD and their families.

*For a transcript of the HARDtalk interview see here
**Thanks to Alan Rew for providing the link to this article

Update: 2nd June 2015: A shortened version of this blogpost is now posted on the Winnower.



Sunday, 14 September 2014

International reading comparisons: is England really doing so poorly?

I was surprised to see a piece in the Guardian stating that "England is one of the most unequal countries for children's reading levels, second in the EU only to Romania". This claim was made in an article about a new campaign, Read On, Get On, that was launched this week.

The campaign sounds great. A consortium of organizations and individuals has got together to address the problem of poor reading: the tail in the distribution of reading ability that stubbornly persists, despite efforts to reduce it. Poor readers are particularly likely to come from deprived backgrounds, and their disadvantage will be perpetuated, as they are at high risk of leaving school with few qualifications and dismal employment prospects. I was pleased to see that the campaign has recognized weak language skills in young children as an important predictor of later reading difficulties. The research evidence has been there for years (Kamhi & Catts, 2011), but it has taken ages to percolate into practice, and few teachers have any training in language development.

But! You knew there was a 'but' coming. It concerns the way the campaign has used evidence. They've mostly based what they say on the massive Progress in International Reading Literacy Study (PIRLS), and my impression is that they have exaggerated the negative in order to create a sense of urgency.

I took a look at the Read On Get On report. The language is emotive and all about blame: "The UK has a sorry history of educational inequality. For many children, this country provides enormous and rich opportunities. At the top end of our education system we rival the best in the world. But it has long been recognised that we let down too many children who are allowed to fall behind. Many of them are condemned to restricted horizons and limited opportunities." I was particularly interested in the international comparisons, with claims such as "The UK is one of the most unfair countries in the developed world."

So how were such conclusions reached? Read On, Get On commissioned the National Foundation for Educational Research (NFER) to compare levels of reading attainment in the UK with that of other developed countries, with a focus on children approaching the last year of primary schooling.

Given the negative tone of "letting down children", it was interesting to read that "In terms of its overall average performance, NFER’s research found England to be one of the best performing countries." I put that in bold because, somehow, it didn't make it into the Guardian, so it is easy to miss. It is in any case dismissed by the NFER report in a sentence: "As a wealthy country with a good education system, that is to be expected."

The evidence of the parlous state of UK education came from consideration of the range of scores from best (95th percentile) to worst (5th percentile) for children in England. Now this is where I think it gets a bit dishonest. Suppose there were a massive improvement in scores for a subset of children, such that the mean and highest scores went up, but with the lowest scoring still doing poorly; presumably, the shrill voices would get even shriller, because the range would extend even further. This seems a tad unfair: yes, it makes sense to stress that the average attainment doesn't capture important things, and that a high average is not a cause for congratulation if it is associated with a long straggly tail of poor achievers. But if we want to focus on poor achievers, let's look at the proportion of children scoring at a low level, and not at some notional 'gap' between best and worst, which is then translated into 'years' to make it sound even more dramatic.
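To make the arithmetic of that point concrete, here is a minimal sketch in Python with invented scores (not real PIRLS data), showing that the 95th-to-5th percentile 'gap' can widen even when the weakest readers' scores do not change at all:

```python
import numpy as np

# Invented illustrative scores - not real PIRLS data. In the "after"
# cohort only the stronger readers improve; the weakest stay put.
before = np.array([400, 450, 500, 550, 600])
after = np.array([400, 450, 520, 600, 680])

def gap(scores):
    """Range from the 5th to the 95th percentile."""
    return np.percentile(scores, 95) - np.percentile(scores, 5)

print(gap(before))  # 180.0
print(gap(after))   # 254.0 - the gap has grown, yet no child reads worse
```

The 5th percentile is identical in both cohorts, so judging a country on the gap alone would make an unambiguous improvement look like a worsening of 'inequality'.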

The question is how England compares with other countries if we look just at the absolute level of the low score corresponding to the 5th percentile. Answer: not brilliant – 16th out of the 24 countries featured in the subset considered by the NFER survey. But, rather surprisingly, we find that the NFER survey excluded New Zealand and Australia, both of which did worse than England.

So do we notice anything about that? Well, in all three countries, children are learning English, a language widely recognized as creating difficulty for young readers because of the lack of consistent mapping between letters (orthography) and sounds (phonology). In fact, when looking for sources for this blogpost, I happened upon a report from an earlier tranche of PIRLS data, which examined this very topic, by assigning an 'orthographic complexity' score to different languages. The authors found a correlation of .6 between the range of scores (5th to 95th percentile again, this time for 2003 data) and a measure of complexity of the orthography. I applied their orthography rating scale to the 2011 PIRLS data and found that, once again the range of reading scores was significantly related to orthography (r = .72), with the highest ranges for those countries where English was spoken – see Figure below. (NB it would be very interesting to extend this to include additional countries: I was limited to the languages with an orthographic rating from the earlier report).
PIRLS 2011 data: range of reading attainment vs. orthographic complexity
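For readers who want to try this kind of check themselves, the calculation is straightforward. Here is a minimal sketch in Python; the orthographic ratings and score ranges below are invented placeholders, not the actual values from the PIRLS 2011 tables or the earlier report:

```python
import numpy as np

# Hypothetical data: each country's orthographic complexity rating
# (higher = deeper orthography) and its 95th-5th percentile reading
# score range. These numbers are made up purely for illustration.
complexity = np.array([1, 2, 3, 4, 5])      # e.g. shallow ... deep (English-like)
score_range = np.array([210, 230, 260, 255, 290])

# Pearson correlation between orthographic complexity and score range
r = np.corrcoef(complexity, score_range)[0, 1]
print(f"r = {r:.2f}")
```

With the real ratings and the published PIRLS percentile scores substituted in, this one-liner correlation is all that is needed to reproduce the figure above.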
International comparisons have their uses, and in this case they seem to suggest that a complex orthography widens the gap between the best and worst readers. However, they still need to be treated with caution. I haven't had time to delve into PIRLS in any detail, but just looking at how samples of children were selected, it is clear that criteria varied. In particular, there were differences from country to country in terms of whether they excluded children who were non-native speakers of the test language, and whether they included those with special educational needs. Romania, which had the most extreme range of scores between best and worst, excluded nobody. Finland, which tends to do well in these surveys, excluded "students with dyslexia or other severe linguistic disorders, intellectually disabled students, functionally disabled students, and students with limited proficiency in the assessment language." England excluded "students with significant special educational needs". Needless to say, all of these criteria are open to interpretation.

I'm not saying that the tail of the distribution is unimportant. Yes, of course, we need to do our best to ensure that all children are competent readers, as we know that poor literacy is a major handicap to a person's prospects for employment, education and prosperity. But let's stop beating ourselves over the head about this. Research indicates that the reasons for children's literacy problems are complex and will be influenced by the writing system they have to learn (Ziegler & Goswami, 2005) and constitutional factors (Asbury & Plomin, 2013), as well as by the home and school environment: we still have only a poor grasp of how these different factors interact. Until we gain a better understanding, we should of course put in our best efforts to help those children who are struggling. The enthusiasm and good intentions of those behind Read On, Get On are to be welcomed, but their spin on the PIRLS data is unhelpful in implying that only social factors are important.

References
Asbury, K., & Plomin, R. (2013). G is for genes: The impact of genetics on education and achievement. Chichester: Wiley Blackwell.

Kamhi, A. G., & Catts, H. W. (2011). Language and reading disabilities (3rd ed.). Allyn & Bacon.

Ziegler, J. C., & Goswami, U. (2005). Reading acquisition, developmental dyslexia, and skilled reading across languages: A psycholinguistic grain size theory. Psychological Bulletin, 131(1), 3-29. PMID: 15631549