Professor John Hawkins, Director of the Research Centre for English and Applied Linguistics (RCEAL), University of Cambridge, talks about RCEAL’s involvement in the English Profile project and their research outcomes to date.


Why did RCEAL decide to become involved with English Profile?

The Centre was originally set up with funding from Cambridge Assessment, based on shared interests in the issues raised by testing English around the world. When I joined in 2004, the focus of RCEAL was very much on applied linguistics, and I was keen to encourage greater collaboration with other parts of the University including Cambridge Assessment. Mike Milanovic, Nick Saville and I got together to explore some possibilities and in the course of our discussions Mike and Nick raised the basic idea behind this project. English Profile has now emerged and it provides the glue that binds Cambridge ESOL and RCEAL together in a common project that was mapped out in the first year of my being here.

What makes English Profile particularly exciting from my point of view is that it is a remarkable interdisciplinary and collaborative project. I’ve been working on collaborative projects for thirty years, combining linguistics with disciplines like psychology, computer science and medicine, but English Profile is unique in the breadth of disciplines it brings together. It is successfully drawing on both practical work on testing and teaching and theoretical work on language acquisition and the Cambridge Learner Corpus – one of very few electronic corpora that cover learner English from users with such a wide range of backgrounds and at all these different levels.


What is RCEAL’s contribution to English Profile?

RCEAL’s involvement gives substance to the theoretical dimension, especially in language acquisition theory and in the computational analysis of learner English at different stages.

In practical terms, the big challenge – and the big contribution we can make – is to apply up-to-date ways of tagging and parsing the contents of the corpus, which has previously been searchable only at word-level. Errors were also identified and classified in the original corpus, using a quite sophisticated coding system. Other corpora of learner English do exist, but none is on the scale of the Cambridge Learner Corpus, and as far as I know, none has been tagged and parsed in the way we have done.

Our first step was to team up with Professor Ted Briscoe and his colleagues in the University’s Computer Laboratory to add a computational linguistics component, bringing in Paula Buttery and Anna Korhonen, who were originally PhD students of Ted’s, but who have now joined the staff of RCEAL. Ted has been developing what I and others regard as the world’s most sophisticated automatic parser – called RASP – and this allows us to analyse the data in the corpus in ways that have never been done before. For the first time we can compare learners’ English with that of native speakers, and investigate the ways in which learners’ language becomes more like that of native speakers as they progress through the levels, and the ways in which it remains distinctive.

Most of the existing methods of analysing a corpus are designed to describe the language of native speakers, so all kinds of issues arise when you apply them to non-native speakers, and we have had to produce some entirely new ways of ‘training’ the parser. It took us many months to overcome these technical issues and it’s only now that we’re able to apply the technology to some of the more interesting research questions.

We are also looking at other issues, using the expertise of the RCEAL team. It is transparently obvious that as learners progress, they’re learning more English, and getting better at using the language. What we’re trying to do is to describe this development in terms of ‘criterial features’ – words, meanings, morphemes, grammatical structures and even the way conversation works. You have thousands of properties to learn when you master a second language, and what characterises learners’ progress is that they make fewer errors and acquire more of these properties as their mastery improves. Building up an empirical picture of the order in which learners acquire these criterial features, and how the features interact with each other, is an incredibly exciting project. The theories of second language acquisition that have been developed hitherto do not enable us to predict these features in advance, since the theorists haven’t had access to data like this, so we’re trying to determine what these patterns are and how they’re affected by the different linguistic backgrounds of the learners.

For example, I am very interested in questions of language processing, and in grammar and typology, and English Profile enables me to test a number of my own predictions. At the same time, our specialists in second language acquisition, Henriëtte Hendriks and Teresa Parodi, are looking at questions relating to how learners’ first languages affect the order of acquisition and the formation of an interlanguage. Again, the Cambridge Learner Corpus gives us an unprecedented opportunity to compare the interlanguages of learners with different first languages and at different levels.


What do you see as the benefits of English Profile?

Clearly, in any interdisciplinary collaboration, you get together with other researchers and practitioners in order to create something that is bigger than the sum of the parts. In essence, the team in RCEAL is providing theory and empirical findings to support the practitioners who can make use of what we find. We can’t have the practical benefits unless we get the science right – i.e. we need accurate statements about what learners know at each level, what the criterial features are, and what the correlations are between them at each stage.

The benefits on the theoretical side are that we are going to be making contributions to various disciplines in the language sciences. People I talk to in second language acquisition are very excited about the fact that we’re looking at many properties together in a way that hasn’t been explored before and seeing how they cluster and the order in which they’re acquired.

I mentioned that second language acquisition theory does not enable us to predict how the criterial features interact at the different levels. If I look at an individual construction like, say, relative clauses, I can predict that simpler clause types will be acquired first, or that the use of definite and indefinite articles will be acquired in a particular order. At the moment linguists generally look at individual features like this, one at a time, in a quite interesting way. But English Profile allows us to look at how the features ‘cluster’ at each stage – which ones are learnt at each level. With this database, we can find patterns, formulate and test hypotheses and contribute to an area of linguistics for which we haven’t had adequate theory or adequate data hitherto.

Another benefit for our understanding of language acquisition is that the Cambridge Learner Corpus holds information on the candidates’ first languages, so that we can do detailed empirical work measuring the impact of first languages on the developmental stages. I expect that we will find some general properties of learners’ English at each learning stage, but I expect to see many more that depend on the first language. This is a really big issue in second language acquisition research at the moment. It’s very different from first language acquisition. The thing that is particularly interesting is that you are learning the second language with the first language in the background, basing it on what you already know, so there are transfer effects, not just in pronunciation and word order, but also in subtle stylistic issues at the higher levels.

There are very big theoretical issues here involving developmental stages and the effects of first languages, and we’re ideally placed both to contribute to them and to benefit from them in this project.

We’re also opening up whole new research avenues through the tools that are being developed within this project. Just applying the automatic parser to learners’ language is a great step forward. Øistein Andersen in the Computer Laboratory is building on this by developing a program that can recognise and classify learner errors automatically with a very high level of success.

In terms of practical benefits, we want to make our theoretical findings as useful as possible, of course. One real benefit will be a richer understanding of the six levels of the Common European Framework of Reference. The ground-breaking work of John Trim in understanding the differences remains extremely important because it describes the kinds of grammatical and lexical knowledge that people have at several crucial levels. This is central to the work of Cambridge ESOL, for example. Accurate assessment means being able to assign learners to these levels on a reliable basis. Working with the corpus allows us to enhance this approach empirically, going beyond John Trim’s work, using a larger database and using new technologies to give us a level of quantifiable accuracy which can complement the judgement of expert examiners.

I can also see some ground-breaking benefits for language teaching and for the development of coursebooks. If we have a clearer idea of exactly what learners know at each level, it will be much easier to structure textbooks in a more systematic way and to support teachers better. Detailed empirical study of the problems of particular learners with particular first languages will allow coursebooks and other teaching materials to be targeted to learners from different linguistic backgrounds (e.g. China and Latin America) and to help address transfer effects in a way that has never been possible before.


How would you describe the progress of the work you have done on English Profile so far, and what are the next stages?

Our report (pdf) gives a summary of what we’ve learnt so far, and of the projects we’re currently working on. Some of our initial hypotheses have turned out to be right, while others need refining.

There are clearly going to be a number of research reports that will appear as a result of the English Profile work, as well as academic research papers. The project itself is an expanding one – we’re going to be involving more specialists around the world – and we need a lot more data in order to test some of our claims. So, we need to develop new corpora to take this project forward.

It takes a while to get an ambitious project like this up and running and to collect and analyse the data we need – we’re not rushing into this, we want to get it right, and we want to make sure that what we say is reliable. We very much appreciate feedback and discussion from other researchers and practitioners in the contributing fields.

Cambridge University Press