Monday, 4 May 2015

Great Expectations: Our early assessments of schoolchildren are misleading and damaging



The Early Years Foundation Stage Profile was developed by the government's Standards and Testing Agency "to support practitioners in making accurate judgements about each child's attainment". More specifically:
The EYFS Profile summarises and describes children’s attainment at the end of the EYFS. It is based on ongoing observation and assessment in the three prime and four specific areas of learning, and the three characteristics of effective learning,
• Prime areas: communication and language; physical development; personal, social and emotional development
• Specific areas: literacy; mathematics; understanding the world; expressive arts and design
• Characteristics of effective learning: playing and exploring; active learning; creating and thinking critically
for each ELG, practitioners must judge whether a child is meeting the level of development expected at the end of the Reception year (expected), exceeding this level (exceeding), or not yet reaching this level (emerging).
The manual gives concrete examples of the kinds of behaviour that meet the expected level for a given Early Learning Goal. For instance:
Understanding: Children follow instructions involving several ideas or actions. They answer ‘how’ and ‘why’ questions about their experiences and in response to stories or events.
Speaking: Children express themselves effectively, showing awareness of listeners’ needs. They use past, present and future forms accurately when talking about events that have happened or are to happen in the future. They develop their own narratives and explanations by connecting ideas or events.
Strikingly absent from these descriptions is any allowance for the child's age. The assessment is specified to take place when children are aged from 4 yrs 10 months to 5 yrs 9 months.
Children's language skills (and indeed other skills) develop rapidly in the preschool and early school years. I first became aware of this many years ago when I was developing a children's comprehension assessment (TROG). The goal was to establish the typical range of performance at different ages and subsequently use TROG to identify cases of poor comprehension in clinical settings. The assessment involved showing children sets of four pictures and asking them to point to the one that matched a spoken phrase or sentence. I knew very little about developmental psychology at the time, so I just decided to try the materials with children of different ages to see how they reacted. It soon became apparent that there were substantial age-related changes, and I realised that I would need to use four age-bands for 4-year-olds and two age-bands for 5-year-olds. Some illustrative data are shown in Figure 1.


Figure 1: Percentage of children getting 4/4 items correct on blocks testing specific constructions.
From the original Test for Reception of Grammar (1983).

Findings like this are not specific to this test. I've developed several language assessments over the years and I've used those developed by others: they all show rapid change from 4 to 6 years.
Concerned by this, I wrote for information to the government's Children and Early Years Data Unit, who referred me to this report.  This gives percentages of children reaching a Good Level of Development, defined as achieving "at least the expected level in the early learning goals in the prime areas of learning (personal, social and emotional development; physical development; and communication and language) and in the specific areas of mathematics and literacy." A Good Level of Development was obtained by 69% of autumn-born children, 59% of spring-born children and 47% of summer-born children, confirming that the standards used to evaluate children are sensitive to age.
This is seriously problematic for at least two reasons. First, it means we are using flawed assessments that will over-identify problems in younger children. It is already established that in the USA attentional deficits are over-diagnosed in summer-born children (Elder, 2010) – a problem that has long-term consequences when children are subsequently prescribed medication for what may actually be normal behaviour in an immature child. Making children feel that they are falling short of an expected standard before they are 5 years old cannot be good for their development. In this regard it is noteworthy that there is evidence that being summer-born continues to be associated with educational disadvantage in English children through the later school years (Crawford et al, 2013).
A second problem is that use of inappropriate criteria for 'expected' levels of development will give a false impression of the numbers of children with developmental difficulties. Consider this article describing an 'early learning crisis' with '20 percent of children unable to communicate properly at age 5'. I have a particular interest in children who have language difficulties, but nobody is helped by over-identifying problems in children who are just the youngest in their class. I've seen enough 4- and 5-year-olds to know that the 'early learning goals' for understanding and speaking are not realistic 'expectations' for 4-year-olds and for those who have only just turned 5 years. Indeed, the fact that one third of the oldest children are not regarded as having a good level of development suggests to me that the expectations are inappropriately high even for the oldest 5-year-olds.
My colleague Courtenay Norbury, Professor in the Psychology Dept at Royal Holloway, will shortly be publishing data from a large survey of language development in reception class children in Surrey*. She tells me that month of birth is once again emerging as an important factor.
I'm not someone who is opposed to assessment in principle, but if you are going to do it, it's important to do it in an informed manner. Surely it is time for the policy-makers in this area to recognise that their current practices of early assessment are misleading, and have the potential to cause damage when children are evaluated against standards that are overly stringent and do not take age into account.


*Update 5th June 2015: This is now published as an open access 'early view' paper in Journal of Child Psychology and Psychiatry: http://onlinelibrary.wiley.com/doi/10.1111/jcpp.12431/abstract
 

Monday, 20 April 2015

How long does a scientific paper need to be?






There was an interesting exchange last week on PubMedCommons between Maurice Smith, senior author of a paper on motor learning, and Bjorn Brembs, a neurobiologist at the University of Regensburg. The main thrust of Brembs' critique was that the paper, which was presented as surprising, novel and original, failed adequately to cite the prior literature. I was impressed that Smith engaged seriously with the criticism, writing a reasoned defence of the choice of material in the literature review, and noting that claims of over-hyped statements were based on selective citation.  What really caught my attention was the following statement in his rebuttal: "We can reassure the reader that it was very painful to cut down the discussion, introduction, and citations to conform to Nature Neuroscience’s strict and rather arbitrary limits. We would personally be in favor of expanding these limits, or doing away with them entirely, but this is not our choice to make."
As it happens, this comment really struck home with me, as I had been internally grumbling about this very issue after a weekend of serious reading of background papers for a grant proposal I am preparing. I repeatedly found evidence that length limits were having a detrimental effect on scientific reporting. I think there are three problems here.
1. The first is exemplified by the debate around the motor learning paper. I don't know this area well enough to evaluate whether omissions in the literature review were serious, but I am all too familiar with papers in my own area where a brief introduction skates over the surface of past work. One feels that length limits play a big part in this, but there is also another dimension: to some editors and reviewers, a paper that starts documenting how the research builds on prior work is at risk of being seen as merely 'incremental', rather than 'groundbreaking'. I was once explicitly told by an editor that too high a proportion of my references were more than five years old. This obsession with novelty is in danger of encouraging scientists to devalue serious scholarship as they zoom off to find the latest hot topic.
2. In many journals, key details of methods are relegated to a supplement, or worse still, omitted altogether. I know that many people rejoiced when the Journal of Neuroscience declared it would no longer publish supplementary material: I thought it was a terrible decision. In most of the papers I read, the methodological detail is key to evaluating the science, and if we only get the cover story of the research, we can be seriously misled. Yes, it can be tedious to wade through supplementary material, but if it is not available, how do we know the work is sound?
3. The final issue concerns readability. One justification for strict length limits is that it is supposed to benefit readers if the authors write succinctly, without rambling on for pages and pages. And we know that the longer the paper, the fewer people will even begin to read it, let alone get to the end. So, in principle, length limits should help. But in practice they often achieve the opposite effect, especially if we have papers reporting several experiments and using complex methods. For instance, I recently read a paper that reported, all within the space of a single Results section about 2000 words long, (a) a genetic association analysis; (b) replications of the association analysis on five independent samples; (c) a study of methylation patterns; (d) a gene expression study in mice; and (e) a gene expression study in human brains. The authors had done their best to squeeze in all essential detail, though some was relegated to supplemental material, but the net result was that I came away feeling as if I had been hit around the head by a baseball bat. My sense was that the appropriate format for reporting such a study would have been a monograph, where each component of the study could be given a chapter, but of course, that would not have the kudos of a publication in a high impact journal, and arguably fewer people would read it.
Now that journals are becoming online-only, a major reason for imposing length limits – cost of physical production and distribution of a paper journal – is far less relevant. Yes, we should encourage authors to be succinct, but not so succinct that scientific communication is compromised.