Time to make a contribution to the Wheel of Writing, too! Vivienne has asked me to talk about corpora: a tool for writers, language learners and anyone who wants to explore language and its many facets.
The word corpus derives from the Latin word for 'body'. Quite simply, a corpus is a collection of texts of written or spoken language. Corpora give information about how language works. There are many different kinds of corpora and therefore many different ways of using them. Some contain texts of a particular type, such as academic articles or newspaper editorials, so that one can investigate that particular type of language. Others are called learner corpora because they contain texts produced by language learners. Among other things, they can be used to find out how the language of learners differs from the language of native speakers and what problems learners encounter when learning a language. This information can then be used, for instance, to improve textbooks.
Why should we use corpora? For non-native speakers, they are a way to see how native speakers use language in a wide variety of texts. For instance, corpora are very useful for exploring collocation. Collocations are common word combinations, that is, words that regularly occur together, such as vitally important or painfully clear. There are different kinds of combinations: adjective + noun (regular exercise), verb + adverb (whisper softly), verb + noun (make progress), etc. Native speakers use the correct collocations intuitively, but for language learners collocations can be difficult to learn because “collocation rules” don't really exist. Still, collocations make language sound more natural, and corpora are one way of finding out about them.
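For readers who like to tinker, the idea of "words that regularly occur together" can be sketched in a few lines of Python. This is only a rough illustration: the sample sentences are made up, the function name adjacent_pairs is my own, and simply counting neighbouring words is far cruder than what a real corpus tool does.

```python
from collections import Counter
import re

def adjacent_pairs(text):
    """Count how often each pair of neighbouring words occurs in text."""
    # Lowercase the text and keep only runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    # Pair every word with the word that follows it and count the pairs.
    return Counter(zip(words, words[1:]))

# Made-up sample sentences, for illustration only (not taken from a real corpus).
sample = (
    "It is vitally important to make progress. "
    "Regular exercise is vitally important. "
    "We make progress when we take regular exercise."
)

pairs = adjacent_pairs(sample)

# Print the most frequent neighbouring pairs with their counts.
for (first, second), count in pairs.most_common(5):
    print(first, second, count)
```

In a text this small the repeated pairs stand out immediately. With millions of words you would also want to take into account how common each word is on its own, which is the kind of work a corpus website does for you.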
I want to introduce two online portals which I have found very useful: the Corpus of Contemporary American English (http://corpus.byu.edu/coca/) and the COCA-based website http://www.wordandphrase.info/
Freely available to everyone, the Corpus of Contemporary American English contains more than 425 million words of text, drawn from transcripts of conversations, novels, magazines, newspapers, academic articles and many more sources.
At the most basic level, you can just search for specific words or phrases and look at a list of all matching strings, or at a chart that shows the frequency of the word in five areas (spoken, fiction, popular magazines, newspapers, and academic journals). You can also search for collocations. As this corpus is pretty complex, and possibly confusing for those who have never used a corpus before, I recommend starting with the wordandphrase.info website, and I'll briefly explain how to use it.
As an example, I typed the word knowledge into the box and pressed search. What you'll then see is, for instance, a definition of the word knowledge ('the psychological result of perception and learning and reasoning') and, below it, words that collocate with knowledge (e.g. to acquire knowledge, to gain knowledge ...). You can also find out how often the word knowledge is used in particular areas (spoken, fiction, magazine, newspaper, academic). Out of a total of 54438 hits, the word knowledge is used 35257 times in academic contexts, and only 3607 times in spoken language. There is also an area where you'll find synonyms of the word knowledge and, probably the most important thing on the page, a chart with sentences containing the word knowledge. This chart is useful because you can see at a glance the words that regularly surround the word knowledge.
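To put those numbers in perspective, and to show what such a chart of sentences boils down to, here is a small Python sketch. It is my own illustration, not how the website works internally: the function names share and concordance and the two sample sentences are invented, and only the three hit counts come from the search described above.

```python
def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)

# Hit counts for "knowledge" as reported above.
total_hits = 54438
academic_share = share(35257, total_hits)  # percentage of hits in academic texts
spoken_share = share(3607, total_hits)     # percentage of hits in spoken language

def concordance(text, keyword, width=3):
    """List each occurrence of keyword with up to `width` words on either side."""
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        # Ignore surrounding punctuation and case when matching the keyword.
        if word.strip(".,;:!?").lower() == keyword:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}")
    return lines

# Made-up sample text, for illustration only.
sample = (
    "Students acquire knowledge through practice. "
    "Teachers share their knowledge with students."
)

print(academic_share, spoken_share)
for line in concordance(sample, "knowledge", width=2):
    print(line)
```

Roughly two thirds of the hits come from academic texts and well under a tenth from spoken language, which matches the impression that knowledge is a fairly formal word. Lining up each hit with its neighbours, as the second function does, is what makes the surrounding words easy to spot.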
Of course, this was just a very brief introduction, and there is much more to discover! If you need help, both websites provide guided tours that explain the most important aspects and features.
For those who wonder what writers can do with corpora, I think that Vivienne will have the answer for you in one of her next posts. In the meantime, good luck with your corpus investigations.
Judith
By the
way, as I have been talking a lot about language learning, in case
you haven't come across it already: a great website for learners is
the BBC Learning English portal, where you can learn and practise
English:
References:
Davies, Mark. (2008-) The Corpus of Contemporary American English: 425 million words, 1990-present. Available online at http://corpus.byu.edu/coca/.
Hunston, Susan. (2002) Corpora in Applied Linguistics. Cambridge University Press, pp. 3-23.