Wheels of Words

Time to make a contribution to the Wheel of Writing, too! Vivienne has asked me to talk about corpora: a tool for writers, language learners and certainly anyone who wants to explore language and its many facets.

The word corpus comes from the Latin word for 'body'. Quite simply, a corpus is a collection of written or spoken texts, and it shows how language is actually used. There are many different kinds of corpora, and therefore many different ways of using them. Some contain texts of a particular type, such as academic articles or newspaper editorials, so you can investigate that particular kind of language. Others are called learner corpora because they contain texts produced by language learners. Among other things, learner corpora can be used to find out how the language of learners differs from that of native speakers and which problems learners encounter when learning a language. This information can then be used, for instance, to improve textbooks.
 
Why should we use corpora? For non-native speakers, they are a useful way to see how native speakers use language across a wide variety of texts. For instance, corpora are very helpful for exploring collocations. Collocations are common word combinations, words that regularly occur together, such as vitally important or painfully clear. There are different kinds of combinations: adjective + noun (regular exercise), verb + adverb (whisper softly), verb + noun (make progress), etc. Native speakers use the right collocations intuitively, but for language learners collocations can be difficult to learn because there are no real “collocation rules”. Yet the right collocations make your language sound much more natural, and corpora are one way of finding out about them.
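To give a rough idea of how collocations can be spotted in a corpus, here is a minimal Python sketch that counts which pairs of neighbouring words occur most often in a short text. The sample text is invented for illustration, and this is only a toy: real corpus tools work with far larger collections and use statistical association measures rather than raw pair counts.

```python
import re
from collections import Counter

# A tiny, made-up sample text standing in for a real corpus.
text = """
Regular exercise is vitally important. She made progress with regular exercise.
It was painfully clear that they made progress. Regular exercise helps.
"""

def bigrams(text):
    """Split the text into lowercase words and return all adjacent word pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:]))

counts = Counter(bigrams(text))

# The most frequent pairs are candidate collocations.
for (w1, w2), n in counts.most_common(2):
    print(w1, w2, n)
# prints "regular exercise 3", then "made progress 2"
```

Note that this naive approach also pairs words across sentence boundaries (for example "important she"), which is one reason real tools are more careful about how they split texts.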
 
I want to introduce two online portals that I have found very useful: the Corpus of Contemporary American English (COCA), http://corpus.byu.edu/coca/, and the COCA-based website http://www.wordandphrase.info/.

Freely available to everyone, the Corpus of Contemporary American English contains more than 425 million words of text, including transcripts of conversations, novels, magazines, newspapers, academic articles and much more.
 
At the most basic level, you can simply search for specific words or phrases and view a list of all matching strings, or a chart showing how frequent the word is in five areas (spoken, fiction, popular magazines, newspapers and academic journals); you can also search for collocations. As this corpus is fairly complex, and possibly confusing for anyone who has never used a corpus before, I recommend starting with the wordandphrase.info website, and I'll briefly explain how to use it.
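Behind a list of search hits is a simple idea: find every occurrence of the search word and show it with a few words of context on either side, a display usually called a keyword-in-context (KWIC) concordance. As a rough illustration, and not a description of how COCA itself works, here is a minimal Python sketch. The function name `kwic`, its `width` parameter and the sample sentences are made up for this example.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines: up to `width` words on each side of every hit."""
    words = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # Right-align the left context so the keyword lines up in a column.
            lines.append(f"{left:>30} [{word}] {right}")
    return lines

# Invented sample sentences, used only for illustration.
sample = ("Students acquire knowledge through practice. "
          "Prior knowledge of the topic helps, and shared knowledge grows.")

for line in kwic(sample, "knowledge"):
    print(line)
```

Because it simply splits the text into words, this sketch lets the context run across sentence boundaries; real corpus tools handle word forms, punctuation and sentence limits much more carefully.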
 
As an example, I typed the word knowledge into the search box and pressed search. The results page shows, for instance, a definition of knowledge ('the psychological result of perception and learning and reasoning') and, below it, words that collocate with knowledge (e.g. to acquire knowledge, to gain knowledge ...). You can also find out how often knowledge is used in each area (spoken, fiction, magazine, newspaper, academic): out of a total of 54,438 hits, knowledge occurs 35,257 times in academic texts, but only 3,607 times in spoken language. There is also a section listing synonyms of knowledge and, probably the most important part of the page, a chart of sentences containing the word. This chart is useful because you can see at a glance which words regularly surround knowledge.
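To put these figures into proportion, here is a small Python calculation that uses only the counts quoted above. It compares raw hit counts, which is only fair if the sections are of similar size; as I understand it, COCA was designed with its five sections of roughly equal size, but for unevenly sized collections you would compare frequencies per million words instead.

```python
# Hit counts for "knowledge" as quoted in the post (wordandphrase.info).
total_hits = 54438
by_section = {"academic": 35257, "spoken": 3607}

def share(count, total):
    """Percentage of all hits that fall into one section."""
    return 100 * count / total

for section, count in by_section.items():
    print(f"{section}: {share(count, total_hits):.1f}%")
# prints "academic: 64.8%", then "spoken: 6.6%"

ratio = by_section["academic"] / by_section["spoken"]
print(f"academic vs spoken: {ratio:.1f} times")
# prints "academic vs spoken: 9.8 times"
```

In other words, almost two thirds of all hits come from academic texts, and knowledge appears nearly ten times as often there as in spoken language.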
 
Of course, this was just a very brief introduction; there is much more to discover! If you need help, both websites offer guided tours that explain their most important aspects and features.
 
For those who wonder what writers can do with corpora, I think Vivienne will have the answer in one of her upcoming posts. In the meantime, good luck with your corpus investigations!
Judith


By the way, as I have been talking a lot about language learning, in case you haven't come across it already: a great website for learners is the BBC Learning English portal, where you can learn and practise English:

References:
Davies, Mark. (2008-). The Corpus of Contemporary American English: 425 million words, 1990-present. Available online at http://corpus.byu.edu/coca/.

Hunston, Susan. (2002). Corpora in Applied Linguistics. Cambridge: Cambridge University Press, pp. 3-23.
