Researchers involved in the English Profile Programme are developing an innovative and unique methodology for describing the English language using corpus research techniques. Previous language profiles have been produced by language specialists, largely drawing on their insight as expert users and teachers of the language. English Profile's methodology, however, is empirical: it is based on data provided by real learners of English, and so provides concrete evidence of what learners throughout the world can do at each level of the CEF. It is also non-‘linguacentric’, i.e. not solely concerned with the English spoken by native speakers in the UK, since the corpus data has been provided by learners of English all over the world.

English Profile's Reference Level Descriptions (RLDs) will serve as a framework to classify, systematise and compare learner production of the English language. Because the RLDs are based on empirical research, they can be referenced with far greater certainty than any earlier description.


The Cambridge English Profile Corpus (CEPC)

The Cambridge English Profile Corpus (CEPC) is a corpus of learner English produced by students all over the world.  It is being built by Cambridge University Press and Cambridge English Language Assessment, in collaboration with a network of participating educational establishments across the world, including schools, universities, research centres, government bodies (such as ministries of education) and individual education professionals.

English Profile aims to collect 10 million words of data, covering both spoken (20%) and written (80%) language. Both General English and English for Specific Purposes are included. The corpus covers levels A1 to C2, and attempts to maintain a balance across a number of variables, including:

  • educational contexts (e.g. primary or secondary, monolingual or bilingual)
  • linguistic function (informative, suasive, attitudinal, socialising and structuring discourse; these categories are taken from the T-series and can also be found in Appendix B of Anthony Green's book Language Functions Revisited)
  • type of interaction, for spoken data only (e.g. casual conversation, formal presentation, oral exam, classroom discourse, role play)
  • first language of learners
  • age range of learners, and other demographic information
  • CEF level
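
As a rough illustration of the spoken/written split described above, the target word counts can be worked out as follows. This is only a sketch: the function and variable names are hypothetical and are not part of English Profile; only the 10 million word target and the 20%/80% shares come from the text.

```python
# Hypothetical sketch: names are illustrative, not part of English Profile.
TARGET_WORDS = 10_000_000                    # overall target stated above
MODE_SHARES = {"spoken": 20, "written": 80}  # percentage shares stated above


def mode_targets(total_words, shares_percent):
    """Return the target word count for each mode from percentage shares."""
    if sum(shares_percent.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    # Integer arithmetic keeps the word counts exact.
    return {mode: total_words * pct // 100 for mode, pct in shares_percent.items()}


targets = mode_targets(TARGET_WORDS, MODE_SHARES)
for mode, words in targets.items():
    print(f"{mode}: {words:,} words")
# spoken: 2,000,000 words
# written: 8,000,000 words
```

On these figures, the corpus would aim for roughly 2 million words of spoken data and 8 million words of written data.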


The CEPC and the Cambridge Learner Corpus

The CEPC is intended to complement the existing Cambridge Learner Corpus (CLC), which was also developed as a collaborative project between Cambridge University Press and Cambridge English Language Assessment. The CLC forms part of the Cambridge English Corpus (formerly known as the Cambridge International Corpus) and is currently being used by English Profile partners and other approved researchers involved in English Profile-related research. It contains over 50 million words of English from learners all over the world, a large proportion of which has been coded for learner error. This is a fantastic resource: it is already the largest learner corpus in the world, and it is expanding all the time.

The CLC is composed entirely of exam scripts and related question papers; the CEPC will cover a wider range of learner output, including essays, coursework, and spoken data, collected in real or virtual classrooms or completed as homework. Researchers using the CEPC will therefore be able to track learners’ acquisition of a language feature across CEF levels, revealing their capabilities beyond the specifics of what they have been taught and tested for in any given year. Like the CLC, the CEPC will be aligned to the CEF levels, allowing the study of the acquisition of English, and the development of teaching and assessment material, across proficiency levels.

The CEPC will include responses to tasks designed by English Profile researchers (at Cambridge and Bedfordshire Universities, and elsewhere) for specific research purposes, to elicit features of learner production which are rarely captured in corpora. It will be freely available online to all those who have contributed data to it, as well as to approved English Profile researchers. Part of the CEPC data will also be made publicly available.

Cambridge University Press