Introduction
In this session we will examine the roles which corpora may play in the study of language. The importance of corpora to language study is aligned to the importance of empirical data. Empirical data enable the linguist to make objective statements, rather than those which are subjective, or based upon the individual's own internalised cognitive perception of language. Empirical data also allow us to study language varieties such as dialects or earlier periods in a language, for which a rationalist approach is not possible.
It is important to note that although many linguists may use the term "corpus" to refer to any collection of texts, when it is used here it refers to a body of text which is carefully sampled to be maximally representative of the language or language variety. Corpus linguistics, proper, should be seen as a subset of the activity within an empirical approach to linguistics. Although corpus linguistics entails an empirical approach, empirical linguistics does not always entail the use of a corpus.
In the following pages we'll consider the roles which corpus use may play in a number of different fields of study related to language. We will focus on the conceptual issues of why corpus data are important to these areas, and how they can contribute to the advancement of knowledge in each, providing real examples of corpus use. In view of the huge amount of corpus-based linguistic research, the examples are necessarily selective - you can consult further reading for additional examples.
Corpora in Speech Research
A spoken corpus is important because of the following useful features:
- It provides a broad sample of speech, extending over a wide selection of variables such as:
  - speaker gender
  - speaker age
  - speaker class
  - genre (e.g. newsreading, poetry, legal proceedings etc.)
  This allows generalisations to be made about spoken language, as the corpus is as wide and as representative as possible. It also allows variations within a given spoken language to be studied.
- It provides a sample of naturalistic speech rather than speech elicited under artificial conditions. The findings from the corpus are therefore more likely to reflect language as it is spoken in "real life", since the data are less likely to be subject to production monitoring by the speaker (such as trying to suppress a regional accent).
- Because the (transcribed) corpus has usually been enhanced with prosodic and other annotations, it is easier to carry out large-scale quantitative analyses than with fresh raw data. Where more than one type of annotation has been used, it is possible to study the interrelationships between, say, phonetic annotations and syntactic structure.
Prosodic annotation of spoken corpora
Because much phonetic corpus annotation has been at the level of prosody, this has been the focus of most of the phonetic and phonological research in spoken corpora. This work can be divided roughly into three types:
- How do prosodic elements of speech relate to other linguistic levels?
- How does what is actually perceived and transcribed relate to the actual acoustic reality of speech?
- How does the typology of the text relate to the prosodic patterns in the corpus?
Read more about prosodic annotation in spoken corpora in Corpus Linguistics, Chapter 4, pages 89-90.
Corpora in Lexical Studies
Empirical data were used in lexicography long before the discipline of corpus linguistics was invented. Samuel Johnson, for example, illustrated his dictionary with examples from literature, and in the 19th century the Oxford English Dictionary used citation slips to study and illustrate word usage. Corpora, however, have changed the way in which linguists can look at language.
A linguist who has access to a corpus, or other (non-representative) collection of machine-readable text, can call up all the examples of a word or phrase from many millions of words of text in a few seconds. Dictionaries can be produced and revised much more quickly than before, thus providing up-to-date information about language. Also, definitions can be more complete and precise since a larger number of natural examples are examined.
Examples extracted from corpora can be easily organised into more meaningful groups for analysis. For example, sorting the right-hand context of the word alphabetically makes it possible to see all instances of a particular collocate together. Furthermore, because corpus data contain a rich amount of textual information (regional variety, author, date, genre, part-of-speech tags etc.), it is easier to tie down usages of particular words or phrases as being typical of particular regional varieties, genres and so on.
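The sorting step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the behaviour of any particular concordance program: the function names `kwic` and `sort_by_right_context`, the context width and the sample text are all invented for the example.

```python
import re

def kwic(text, keyword, width=30):
    """Return (left, keyword, right) context triples for every hit of keyword."""
    hits = []
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits

def sort_by_right_context(hits):
    """Sort hits alphabetically on the text to the right of the keyword."""
    return sorted(hits, key=lambda h: h[2].strip().lower())

# Invented sample text, for illustration only.
sample = (
    "She made a strong case. He drank strong tea. "
    "They gave strong support. We like strong tea."
)

lines = sort_by_right_context(kwic(sample, "strong", width=12))
for left, kw, right in lines:
    print(f"{left:>12} [{kw}] {right}")
```

Once the lines are sorted on the right-hand context, the two instances of "strong tea" appear next to each other, which is what makes recurrent collocates easy to spot by eye.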
The open-ended (constantly growing) monitor corpus has its greatest role in dictionary building, as it enables lexicographers to keep on top of new words entering the language, existing words changing their meanings, or shifts in the balance of their use according to genre etc. However, finite corpora also have an important role in lexical studies, in the area of quantification. It is possible to produce reliable frequency counts rapidly and to subdivide these across various dimensions according to the varieties of language in which a word is used.
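As a rough sketch of what subdividing a frequency count might look like, the following Python fragment counts word frequencies overall and per genre. The mini-corpus, the genre labels and the function name `frequency_by_genre` are invented for illustration; a real corpus would supply the genre information in its header or file structure.

```python
from collections import Counter, defaultdict

# Invented mini-corpus: (genre, text) pairs, for illustration only.
texts = [
    ("press", "the minister said the budget was fair"),
    ("fiction", "she said nothing and the door closed"),
    ("press", "the budget debate continues"),
]

def frequency_by_genre(samples):
    """Count word frequencies overall and separately for each genre."""
    overall = Counter()
    by_genre = defaultdict(Counter)
    for genre, text in samples:
        tokens = text.lower().split()
        overall.update(tokens)
        by_genre[genre].update(tokens)
    return overall, by_genre

overall, by_genre = frequency_by_genre(texts)
print(overall["the"], by_genre["press"]["the"], by_genre["fiction"]["the"])
```

The same pattern extends to any other dimension recorded in the corpus (region, date, speaker group), simply by keying the counts on that attribute instead of genre.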
Finally, the ability to call up word combinations rather than individual words, and the existence of mutual information tools which establish relationships between co-occurring words (see Session 3), mean that we can treat phrases and collocations more systematically than was previously possible. A phraseological unit may constitute a piece of technical terminology or an idiom, and collocations are important clues to specific word senses.
Read about corpus-based work on morphology in Corpus Linguistics, Chapter 4, page 92.
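To give a feel for how a mutual information tool might score co-occurring words, here is a minimal sketch using one common formulation, pointwise mutual information: the base-2 logarithm of P(x, y) / (P(x) P(y)), estimated from adjacent word pairs. Whether the tools discussed in Session 3 use exactly this formulation is an assumption, and the function name `pmi_scores` and the sample sentence are invented.

```python
import math
from collections import Counter

def pmi_scores(tokens):
    """Pointwise mutual information for each adjacent word pair in tokens."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_bigrams = max(n - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bigrams      # estimated P(x, y)
        p_w1 = unigrams[w1] / n         # estimated P(x)
        p_w2 = unigrams[w2] / n         # estimated P(y)
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores

# Invented sample, far too small for real use.
tokens = "strong tea and strong coffee and weak tea".split()
scores = pmi_scores(tokens)
for pair, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 2))
```

A high score means the two words occur together more often than their individual frequencies would predict. On very small samples such as this one the scores are unreliable, since rare words are rewarded heavily; this is one reason why large corpora matter for collocation work.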
Corpora and Grammar
Grammatical (or syntactic) studies have, along with lexical studies, been the most frequent types of research which have used corpora. Corpora make a useful tool for syntactic research because of:
- The potential for the representative quantification of a whole language variety.
- Their role as empirical data for the testing of hypotheses derived from grammatical theory.
Many smaller-scale studies of grammar using corpora have included quantitative data analysis (for example, Schmied's 1993 study of relative clauses). There is now a greater interest in the more systematic study of grammatical frequency - for example, Oostdijk and de Haan (1994a) are aiming to analyse the frequency of the various English clause types.
Since the 1950s the division in linguistics between rationalist, theory-based approaches and empiricist, descriptive ones (see Session One) has often meant that the two have been viewed as separate and in competition with each other. However, there is a group of researchers who have used corpora to test essentially rationalist grammatical theory, rather than using them for pure description or the inductive generation of theory.
At Nijmegen University, for instance, primarily rationalist formal grammars are tested on real-life language found in computer corpora (Aarts 1991). The formal grammar is first devised by reference to introspective techniques and to existing accounts of the grammar of the language. The grammar is then loaded into a computer parser and is run over a corpus to test how far it accounts for the data in the corpus. The grammar is then modified to take account of those analyses which it missed or got wrong.
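The test-and-revise cycle described above can be illustrated with a deliberately tiny sketch. It is not the Nijmegen formalism or parser: the "grammar" here is just a regular expression over part-of-speech tags, and the tagset, the tagged sentences and the function name `coverage` are all invented for the example.

```python
import re

# Hypothetical toy "grammar": a sentence is a noun phrase, a verb and an
# optional object noun phrase, written as a pattern over part-of-speech tags.
GRAMMAR_V1 = re.compile(r"^(DET )?(ADJ )*NOUN VERB( (DET )?(ADJ )*NOUN)?$")

# Invented tagged sentences, for illustration only.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barked", "VERB")],
    [("dogs", "NOUN"), ("chase", "VERB"), ("small", "ADJ"), ("cats", "NOUN")],
    [("the", "DET"), ("dog", "NOUN"), ("barked", "VERB"), ("loudly", "ADV")],
]

def coverage(grammar, tagged_sentences):
    """Return the share of sentences the grammar accepts, plus the failures."""
    failures = []
    for sentence in tagged_sentences:
        tags = " ".join(tag for _, tag in sentence)
        if not grammar.match(tags):
            failures.append(sentence)
    covered = len(tagged_sentences) - len(failures)
    return covered / len(tagged_sentences), failures

rate_v1, missed = coverage(GRAMMAR_V1, corpus)

# Revise the grammar in the light of the failures (here: allow a final adverb).
GRAMMAR_V2 = re.compile(r"^(DET )?(ADJ )*NOUN VERB( (DET )?(ADJ )*NOUN)?( ADV)?$")
rate_v2, _ = coverage(GRAMMAR_V2, corpus)

print(f"v1 covers {rate_v1:.0%}, v2 covers {rate_v2:.0%}")
```

The point of the sketch is the loop rather than the grammar itself: the corpus exposes constructions the introspectively devised grammar did not anticipate, and each revision is then checked against the same data.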
Corpora and Semantics
The main contribution that corpus linguistics has made to semantics is by helping to establish an approach to semantics which is objective, and takes account of indeterminacy and gradience. Mindt (1991) demonstrates how a corpus can be used in order to provide objective criteria for assigning meanings to linguistic terms. Mindt points out that frequently in semantics, meanings of terms are described by reference to the linguist's own intuitions - the rationalist approach that we mentioned in the section on Corpora and Grammar. Mindt argues that semantic distinctions are associated in texts with characteristic observable contexts - syntactic, morphological and prosodic - and by considering the environments of the linguistic entities an empirical objective indicator for a particular semantic distinction can be arrived at.
Another role of corpora in semantics has been in establishing more firmly the notions of fuzzy categories and gradience. In theoretical linguistics, categories are usually seen as being hard and fast - either an item belongs to a category or it does not. However, psychological work on categorisation suggests that cognitive categories are not usually "hard and fast" but instead have fuzzy boundaries, so it is not so much a question of whether an item belongs to one category or the other, but how often it falls into one category as opposed to the other one. In looking empirically at natural language in corpora it is clear that this "fuzzy" model accounts better for the data: clear-cut boundaries do not exist; instead there are gradients of membership which are connected with frequency of inclusion.
For examples of the above read Corpus Linguistics, Chapter 4, pages 96-97.
Corpora in Pragmatics and Discourse Analysis
The amount of corpus-based research in pragmatics and discourse analysis has been relatively small up to now. This is partly because these fields rely on context (Myers 1991) and the small samples of texts used in corpora tend to mean that they are somewhat removed from their social and textual contexts. Sometimes relevant social information (gender, class, region) is encoded within the corpus but it is still not always possible to infer context from corpus texts.
Much of the work that has been carried out in this area has used the London-Lund corpus which was until recently the only truly conversational corpus. The main contribution of such research has been to the understanding of how conversation works, with respect to lexical items and phrases which have conversational functions. Stenström (1984) correlated discourse items such as "well", "sort of" and "you know" with pauses in speech and showed that such correlations related to whether or not the speaker expects a response from the addressee. Another study by Stenström (1987) examined "carry-on signals" such as "right", "right-o" and "all right". These signals were classified according to the typology of their various functions e.g.:
- "right" was used in all functions, but especially as a response, to evaluate a previous response or terminate an exchange.
- "all right" was used to mark a boundary between two stages in discourse.
- "that's right" was used as an emphasiser.
- "it's alright" and "that's alright" were responses to apologies.
The availability of new conversational corpora, such as the spoken part of the BNC (British National Corpus), should provide a greater incentive both to extend and to replicate such studies, since the amount of conversational data available and the social/geographical range of people recorded will both have increased. At present, issues in pragmatics have been poorly served by quantitative corpus-based analyses. Hopefully this is one area which will be exploited by linguists in the near future.
Corpora and Stylistics
Stylistics researchers are usually interested in individual texts or authors rather than the more general varieties of a language and tend not to be large-scale users of corpora. Nevertheless, some stylisticians are interested in investigating broader issues such as genre, and others have found corpora to be important sources of data in their research.
In order to define an author's particular style, we must, in part, examine the degree to which the author leans towards different ways of putting things (technical vs non-technical vocabulary, long sentences vs short sentences and so on). This task requires comparisons to be made not only internally within the author's own work, but also with other authors or the norms of the language or variety as a whole. As Leech and Short (1981) point out, stylistics often demands the use of quantification to back up judgements which may appear subjective rather than objective. This is where corpora can play a useful role.
Another type of stylistic variation is the more general variation between genres and channels - for example, one of the most common uses of corpora has been in looking at the differences between spoken and written language. Altenberg (1984) examined the differences in the ordering of cause-result constructions while Tottie (1991) looked at the differences in negation strategies. Other work has looked at variations between genres, using subsamples of corpora as a database. For example, Wilson (1992) used sections from the LOB and Kolhapur corpora, the Augustan Prose Sample and a sample of modern English conversation to examine the usage of "since" and found that causal "since" had evolved from being the main causal connective in late seventeenth century writing to being characteristic of formal learned writing in the twentieth century.
Read about stylistic work carried out by Biber and Wikberg in Corpus Linguistics, Chapter 4, pages 102-103.
Corpora in the Teaching of Languages and Linguistics
Resources and practices in the teaching of languages and linguistics tend to reflect the division between the empirical and rationalist approaches. Many textbooks contain only invented examples and their descriptions are based upon intuition or second-hand accounts. Other books, however, are explicitly empirical and use examples and descriptions from corpora or other sources of real-life language data.
Corpus examples are important in language learning as they expose students to the kinds of sentences that they will encounter when using the language in real-life situations. Students who are taught with traditional syntax textbooks which contain sentences such as "Steve puts his money in the bank" are often unable to analyse more complex sentences such as "The government has welcomed a report by an Australian royal commission on the effects of Britain's atomic bomb testing programme in the Australian desert in the fifties and early sixties" (from the Spoken English Corpus).
Apart from being a source of empirical teaching data, corpora can be used to look critically at existing language teaching materials. Kennedy (1987a, 1987b) has looked at ways of expressing quantification and frequency in ESL (English as a second language) textbooks. Holmes (1988) has examined ways of expressing doubt and certainty in ESL textbooks, while Mindt (1992) has looked at future time expressions in German textbooks of English. These studies have similar methodologies - they analyse the relevant constructions or vocabularies, both in the sample textbooks and in standard English corpora, and then compare their findings between the two sets. Most studies found that there were considerable differences between what textbooks are teaching and how native speakers actually use language as evidenced in the corpora. Some textbooks gloss over important aspects of usage, or foreground less frequent stylistic choices at the expense of more common ones. The general conclusion from these studies is that non-empirically based teaching materials can be misleading and that corpus studies should be used to inform the production of material so that the more common choices of usage are given more attention than those which are less common.
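The shared methodology of these studies (count the relevant expressions in the textbook sample and in a reference corpus, then compare) can be sketched as follows. Everything specific here is an assumption made for illustration: the two text samples are invented, the expressions "will" and "going to" are simply plausible future time markers, and the function names `count_expression` and `rate_per_thousand` are hypothetical. None of it reproduces the data or procedure of the studies cited above.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def count_expression(tokens, expression):
    """Count occurrences of a one-word or multi-word expression in tokens."""
    target = expression.split()
    k = len(target)
    return sum(tokens[i:i + k] == target for i in range(len(tokens) - k + 1))

def rate_per_thousand(text, expressions):
    """Frequency of each expression per 1,000 tokens of text."""
    tokens = tokenize(text)
    return {e: 1000 * count_expression(tokens, e) / len(tokens) for e in expressions}

# Invented samples, for illustration only.
textbook = "I will go. She will come. We will see."
reference = "I'm going to leave. He will call. They are going to wait."

expressions = ["will", "going to"]
textbook_rates = rate_per_thousand(textbook, expressions)
reference_rates = rate_per_thousand(reference, expressions)

for e in expressions:
    print(f"{e!r}: textbook {textbook_rates[e]:.1f}, reference {reference_rates[e]:.1f}")
```

Normalising to a rate per 1,000 tokens matters because the textbook sample and the reference corpus will almost never be the same size, so raw counts cannot be compared directly.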
Read about language teaching for "special purposes" in Corpus Linguistics, Chapter 4, pages 104-105.
Corpora have also been used in the teaching of linguistics. Kirk (1994) requires his students to base their projects on corpus data which they must analyse in the light of a model such as Brown and Levinson's politeness theory or Grice's co-operative principle. In taking this approach, Kirk is using corpora not only as a way of teaching students about variation in English but also to introduce them to the main features of a corpus-based approach to linguistic analysis.
A further application of corpora in this field is their role in computer-assisted language learning. Recent work at Lancaster University has looked at the role of corpus-based computer software for teaching undergraduates the rudiments of grammatical analysis (McEnery and Wilson 1993). This software - Cytor - reads in an annotated corpus (either part-of-speech tagged or parsed) one sentence at a time, hides the annotation and asks the student to annotate the sentence him- or herself. Students can call up help in the form of the list of tag mnemonics, a frequency lexicon or concordances of examples. McEnery, Baker and Wilson (1995) carried out an experiment over the course of a term to determine how effective Cytor was at teaching part-of-speech analysis by comparing two groups of students - one taught with Cytor, and another taught via traditional lecturer-based methods. In general the computer-taught students performed better than the human-taught students throughout the term.
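The basic idea of hiding an annotation and checking a student's attempt against it can be sketched in a few lines. This is not Cytor's implementation, which is not described here: the "word_TAG" input format, the simplified tagset and the function names `hide_annotation` and `score` are all assumptions made for the example.

```python
def hide_annotation(tagged_sentence):
    """Split 'word_TAG' tokens into parallel lists of words and corpus tags."""
    words, gold = [], []
    for token in tagged_sentence.split():
        word, _, tag = token.rpartition("_")
        words.append(word)
        gold.append(tag)
    return words, gold

def score(student_tags, gold_tags):
    """Proportion of student tags that agree with the corpus annotation."""
    if len(student_tags) != len(gold_tags):
        raise ValueError("one tag is needed per word")
    correct = sum(s == g for s, g in zip(student_tags, gold_tags))
    return correct / len(gold_tags)

# Hypothetical tagged sentence using an invented, simplified tagset.
sentence = "Steve_NOUN puts_VERB his_PRON money_NOUN in_PREP the_DET bank_NOUN"
words, gold = hide_annotation(sentence)

# A student's attempt, with one disagreement (his: DET rather than PRON).
student = ["NOUN", "VERB", "DET", "NOUN", "PREP", "DET", "NOUN"]
result = score(student, gold)
print(" ".join(words))
print(f"agreement with corpus annotation: {result:.0%}")
```

A teaching tool built on this idea would also need the help facilities mentioned above (tag mnemonics, a frequency lexicon, concordances), which are left out of the sketch.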
Corpora and Historical Linguistics
Historical linguistics can be seen as a species of corpus linguistics, since the texts of a historical period or a "dead" language form a closed corpus of data which can only be extended by the (re-)discovery of previously unknown manuscripts or books. In some cases it is possible to use (almost) all of the closed corpus of a language for research - something which can be done for ancient Greek, for example, using the Thesaurus Linguae Graecae corpus, which contains most of extant ancient Greek literature. However, in practice historical linguistics has not tended to follow a strict corpus linguistic paradigm, instead taking a selective approach to empirical data, looking for evidence of particular phenomena and making rough estimates of frequency. No real attempts were made to produce samples that were representative.
In recent years, however, some historical linguists have changed their approach, resulting in an upsurge in strictly corpus-based historical linguistics and the building of corpora for this purpose. The most widely known English historical corpus is the Helsinki corpus.
The Helsinki corpus contains approximately 1.6 million words of English dating from the earliest Old English Period (before AD 850) to the end of the Early Modern English period (1710). It is divided into three main periods - Old English, Middle English and Early Modern English - and each period is subdivided into a number of 100-year subperiods (or 70-year subperiods in some cases). The Helsinki corpus is representative in that it covers a range of genres, regional varieties and sociolinguistic variables such as gender, age, education and social class. The Helsinki team have also produced "satellite" corpora of early Scots and early American English.
Other examples of English historical corpora in development are the Zürich Corpus of English Newspapers (ZEN), the Lampeter Corpus of Early Modern English Tracts (a sample of English pamphlets from between 1640 and 1740) and the ARCHER corpus (a corpus of British and American English from 1650-1990).
The work which is carried out on historical corpora is qualitatively similar to that which is carried out on modern language corpora, although it is also possible to carry out work on the evolution of language through time. For example, Peitsara (1993) used four subperiods from the Helsinki corpus and calculated the frequencies of different prepositions introducing agent phrases. Throughout the period she found that the most common prepositions of this type were "of" and "by", which were of almost equal frequency at the beginning of the period, but by the fifteenth century "by" was three times more common than "of", and by 1640 "by" was eight times as common.
Studies like this have particular importance in the context of Halliday's (1991) conception of language evolution as a motivated change in the probabilities of the grammar. However, it is important to be aware of the limitations of corpus linguistics, as Rissanen (1989) pointed out. Rissanen identifies three main problems associated with using historical corpora:
- The "philologist's dilemma" - the danger that the use of a corpus and a computer may supplant the in-depth knowledge of language history which is to be gained from the study of original texts in their context.
- The "God's truth fallacy" - the danger that a corpus may be used to provide representative conclusions about the entire language period, without understanding its limitations in the terms of which genres it does and does not cover.
- The "mystery of vanishing reliability" - the more variables which are used in sampling and coding the corpus (periods, genres, age, gender etc) the harder it is to represent each one fully and achieve statistical reliability. The most effective way of solving this problem is to build larger corpora of course.
Rissanen's reservations are valid and important, but they should not diminish the value of corpus-based linguistics. Rather, they should serve as warnings of possible pitfalls which scholars need to take on board, since with appropriate care they are surmountable.
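The arithmetic behind the "mystery of vanishing reliability" is easy to make concrete. In the sketch below the total of 1.6 million words is the Helsinki corpus figure quoted earlier, but the numbers of categories for each variable are invented purely for illustration and do not describe the actual design of any corpus.

```python
from math import prod

def words_per_cell(total_words, categories):
    """Number of sampling cells and the average words available per cell."""
    cells = prod(categories.values())
    return cells, total_words / cells

# Invented category counts, for illustration only.
categories = {"subperiods": 10, "genres": 10, "gender": 2, "age bands": 3}

cells, average = words_per_cell(1_600_000, categories)
print(f"{cells} cells, about {average:.0f} words per cell on average")
```

Because the number of cells is the product of the category counts, each added variable multiplies the number of combinations that need to be represented. Even a corpus of over a million words is quickly spread thin, which is why larger corpora are the most effective remedy.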
Corpora in Dialectology and Variation Studies
In this section we are concerned with geographical variation - corpora have long been recognised as a valuable source of comparison between language varieties as well as for the description of those varieties themselves. Certain corpora have tried to follow as far as possible the same sampling procedures as other corpora in order to maximise the degree of comparability. For example, the LOB corpus contains roughly the same genres and sample sizes as the Brown corpus and is sampled from the same year (i.e. 1961). The Kolhapur Indian corpus is also broadly parallel to Brown and LOB, although the sampling year is 1978.
One of the earliest pieces of work using the LOB and Brown corpora in tandem was the production of a word frequency comparison of American and British written English. These corpora have also been used as the basis for studies of more complex aspects of language such as the use of the subjunctive (Johansson and Norheim 1988).
One role for corpora in national variation studies has been as a testbed for two theories of language variation: Quirk et al's (1985) "common core" hypothesis, and Braj Kachru's conception of national varieties as forming many unique "Englishes" which differ in important ways from one another. Most work on lexis and grammar comparing the Kolhapur Indian corpus with Brown and LOB has supported the common core hypothesis (Leitner 1991). However, there is still scope for the extension of such work.
Few examples of dialect corpora exist at present - two of which are the Helsinki corpus of English dialects and Kirk's Northern Ireland Transcribed Corpus of Speech (NITCS). Both corpora consist of conversations with a fieldworker - in Kirk's corpus from Northern Ireland, and in the Helsinki corpus from several English regions. Dialectology is an empirical field of linguistics although it has tended to concentrate on experiments and less controlled sampling, rather than use corpora. Such elicitation experiments tend to focus on vocabulary and pronunciation, neglecting other aspects of linguistics such as syntax. Dialect corpora allow these other aspects to be studied, and because the corpora are sampled so as to be representative, quantitative as well as qualitative conclusions can be drawn about the target population as a whole.
Read about comparisons using dialect data in Corpus Linguistics, Chapter 4, page 110.
Corpora and Psycholinguistics
Although psycholinguistics is inherently a laboratory subject, measuring mental processes such as the length of time it takes to position a syntactic boundary in reading or how eye movements change, corpora can still have a part to play in this field. One important use is as a source of data from which materials for laboratory experiments can be developed. Schreuder and Kerkman (1987) point out that frequency is an important consideration in a number of cognitive processes, including word recognition. The psycholinguist should not go blindly into experiments in areas such as this with only a vague notion of frequency to guide the selection and analysis of materials. Sampled corpora can provide psycholinguists with more concrete and reliable information about frequency, including the frequencies of different senses and parts of speech of ambiguous words (if the corpora are annotated).
A more direct example of the role of corpora in psycholinguistics can be seen from Garnham et al's (1981) study which used the London-Lund corpus to examine the occurrence of speech errors in natural conversational English. Before the study was carried out nobody knew how frequent speech errors were in everyday language, because such an analysis required adequate amounts of natural conversation, while previous work on speech errors had been based on the gradual ad hoc accumulation of data from many different sources. However, the spoken corpus was able to provide exactly the kind of data that was required. Garnham's study was able to classify and count the frequencies of different error types and hence provide some estimate of the general frequency of these in relation to speakers' overall output.
A third role for corpora lies in the analysis of language pathologies, where an accurate picture of abnormal data must be constructed before it is possible to hypothesise and test what may be wrong with the human language processing system. Although little work has been done with sampled corpora to date, it is important to stress their potential for these analyses. Studies of the language of linguistically impaired people, and of the language of children who are developing their (normal) linguistic skills, lack the quantified representative descriptions which are available. In the last decade, however, there has been a move towards the empirical analysis of machine-readable data in these areas. For example, the Polytechnic of Wales (POW) corpus is a corpus of children's language; a corpus of impaired and normal language development has been collected at Reading University, while the CHILDES database contains a large amount of impaired and normal child language in several languages.
Corpora and Cultural Studies
It is only recently that the role of a corpus in telling us about culture has really begun to be explored. After the completion of the LOB corpus of British English, one of the earliest pieces of work to be carried out was a comparison of its vocabulary with the vocabulary of the American Brown corpus (Hofland and Johansson 1982). This revealed interesting differences which went beyond the purely linguistic ones such as spelling (colour/color) or morphology (got/gotten).
Leech and Fallon (1992) used the results of these earlier studies, along with KWIC concordances of the two corpora, to check up on the senses in which words were being used. They then grouped the differences which were statistically significant into fifteen broad categories. The frequencies of concepts in these categories revealed differences between the two countries which were primarily cultural, not linguistic. For example, travel words were more frequent in American English than British English, perhaps suggestive of the larger size of the United States. Words in the domains of crime and the military were also more common in the American data, as was "violent crime" in the crime category, perhaps suggestive of the American "gun culture". In general, the findings seemed to suggest a picture of American culture at the time of the two corpora (1961) that was more macho and dynamic than British culture. Although this work is in its infancy and requires methodological refinement, it seems to be an interesting and promising area of study, which could also integrate more closely work in language learning with that in national cultural studies.
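Deciding whether a frequency difference between two corpora is statistically significant is commonly done with a chi-square test on a two-by-two table. The sketch below shows that calculation; it is offered as one familiar option and is not a statement of the exact procedure used in the studies above. The word counts are invented, the corpus sizes of one million words are round figures used as an assumption, and the function name `chi_square_2x2` is hypothetical.

```python
def chi_square_2x2(freq_a, size_a, freq_b, size_b):
    """Chi-square statistic for one word's frequency in two corpora."""
    observed = [
        [freq_a, size_a - freq_a],   # corpus A: the word, all other words
        [freq_b, size_b - freq_b],   # corpus B: the word, all other words
    ]
    total = size_a + size_b
    row_totals = [size_a, size_b]
    col_totals = [freq_a + freq_b, total - freq_a - freq_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# Critical value of chi-square at the 5% level with 1 degree of freedom.
CRITICAL_5_PERCENT = 3.841

# Invented counts: a word occurring 150 times in one corpus and 100 in another.
chi2 = chi_square_2x2(150, 1_000_000, 100, 1_000_000)
significant = chi2 > CRITICAL_5_PERCENT
print(f"chi-square = {chi2:.2f}, significant at 5%: {significant}")
```

A significant result only says that the difference is unlikely to be due to chance in the sampling; interpreting it as a cultural difference still requires checking the concordance lines, as described above.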
Corpora and Social Psychology
Although linguists are the main users of corpora, they are not the sole users. Researchers in other fields which make use of language data have also recently taken an interest in the exploitation of corpus data - perhaps the most important of these have been social psychologists.
Social psychologists require access to naturalistic data which cannot be reproduced in laboratory conditions (unlike many other psychology-related fields), while at the same time they are under pressure to quantify and test their theories rather than rely on qualitative data. This places them in a curious position.
One area of research in social psychology is that of how and why people attempt to explain things. Explanations (or attributions) are important to the psychologist because they reveal the ways in which people regard their environment. To obtain data for studying explanations researchers have relied on naturally occurring texts such as newspapers, diaries, company reports etc. However, these are written texts, and most everyday human interaction takes place through the medium of speech. To solve this problem Antaki and Naji (1987) used the London-Lund corpus (of spoken language) as a source of data for explanations in everyday conversation. They took 200,000 words of conversation and retrieved all instances of the commonest causal conjunction "because" (and its variant "cos"). An analysis of a pilot sample derived a classification scheme for the data, which was then used to classify all the explanations according to what was being explained, for example "actions of speaker or speaker's group", "general states of affairs" and so on. A frequency analysis of the explanation types in the corpus showed that explanations of general states of affairs were the most common type of explanation (33.8%) followed by actions of speaker and speaker's group (28.8%) and actions of others (17.7%). This refuted previous theories that the prototypical type of explanation is the explanation of a person's single action. Work such as Antaki and Naji's shows clearly the potential of corpora to test and modify theory in subjects which require naturalistic quantifiable language data, and one may expect other social psychologists to make use of corpora in the future.
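The two mechanical steps of such a study, retrieving every utterance containing "because" or "cos" and then tabulating the analyst's classifications as percentages, can be sketched as below. The utterances and category labels are invented and the function names `find_explanations` and `percentages` are hypothetical; the classification itself is a matter of human judgement and is not automated here.

```python
import re
from collections import Counter

# Matches "because" or "cos" as whole words, in any capitalisation.
CAUSAL = re.compile(r"\b(because|cos)\b", flags=re.IGNORECASE)

def find_explanations(utterances):
    """Keep only the utterances that contain a causal conjunction."""
    return [u for u in utterances if CAUSAL.search(u)]

def percentages(labels):
    """Share of each category label, as a percentage to one decimal place."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

# Invented utterances, for illustration only.
utterances = [
    "I left early because I was tired",
    "it floods here cos the drains are old",
    "well I don't know",
    "she phoned because the train was late",
]
hits = find_explanations(utterances)

# Labels would be assigned by a human analyst; these are invented.
labels = ["actions of speaker", "general states of affairs", "actions of others"]
shares = percentages(labels)
print(len(hits), shares)
```

The word-boundary markers in the pattern keep "cos" from matching inside longer words such as "cost", which matters when retrieval is run over a large amount of transcribed speech.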
Conclusion
In this session we have seen how a number of areas of language study have benefited from exploiting corpus data. To summarise, the main advantages of corpora are:
- Sampling and quantification. Because a corpus is sampled to maximally represent the population, any findings taken from the corpus can be generalised to the larger population. Hence quantification in corpus linguistics is more meaningful than other forms of linguistic quantification because it can tell us about a variety of language, not just that which is being analysed.
- Ease of access. As all of the data collection has been dealt with by someone else, the researcher does not have to go through the issues of sampling, collection and encoding. The majority of corpora are readily available, either free or at low cost. Once the corpora have been obtained, it is usually easy to access the data within them, e.g. by using a concordance program.
- Enriched data. Many corpora have already been enriched with additional linguistic information such as part-of-speech annotation, parsing and prosodic transcription. Hence data retrieval from annotated corpora can be easier and more specific than with unannotated data.
- Naturalistic data. Corpus data is not always completely unmonitored in the sense that the people producing the spoken or written texts are unaware until after the fact that they are being asked to participate in the building of a corpus. But for the most part, the data are largely naturalistic, unmonitored and the product of real social contexts. Thus the corpus provides one of the most reliable sources of naturally occurring data that can be examined.
References
Aarts, J. (1991) "Intuition-based and observation-based grammars" in Aijmer and Altenberg 1991, pp 44-62.
Aarts, J. and Meijs, W. (eds) (1986) Corpus Linguistics II, Amsterdam: Rodopi.
Aijmer, K. and Altenberg, B. (eds) (1991) English Corpus Linguistics: Studies in Honour of Jan Svartvik, London: Longman.
Altenberg, B. (1984) "Causal linking in spoken and written English", Studia Linguistica 38: 20-69.
Antaki, C. and Naji, S. (1987) "Events explained in conversational "because" statements", British Journal of Social Psychology 26: 119-126.
Atkins, B. T. S. and Levin, B. (1995) "Building on a corpus: a linguistic and lexicographical look at some near-synonyms", International Journal of Lexicography 8(2): 85-114.
Garnham, A., Shillcock, R., Brown, G., Mill, A. and Cutler, A. (1981) "Slips of the tongue in the London-Lund corpus of spontaneous conversation", Linguistics 19: 805-17.
Halliday, M. (1991) "Corpus studies and probabilistic grammar", in Aijmer and Altenberg 1991, pp 30-43.
Hofland, K. and Johansson, S. (1982) Word Frequencies in British and American English, Bergen: Norwegian Computing Centre for the Humanities.
Holmes, J. (1988) "Doubt and certainty in ESL textbooks", Applied Linguistics 9: 21-44.
Holmes, J. (1994) "Inferring language change from computer corpora: some methodological problems", ICAME Journal 18: 27-40.
Johansson, S. and Norheim, E. (1988) "The subjunctive in British and American English", ICAME Journal 12: 27-36.
Johansson, S. and Stenström, A-B. (eds) (1991) English Computer Corpora: Selected Papers and Research Guide, Berlin: Mouton de Gruyter.
Kennedy, G. (1987) "Expressing temporal frequency in academic English", TESOL Quarterly 21: 69-86.
Kennedy, G. (1987) "Quantification and the use of English: a case study of one aspect of the learner's task", Applied Linguistics 8: 264-86.
Kirk, J. (1994) "Teaching and language corpora: the Queen's approach", in Wilson and McEnery 1994, pp 29-51.
Kjellmer, G. (1986) ""The lesser man": observations on the role of women in modern English writings", in Aarts and Meijs 1986, pp 163-76.
Kytö, M., Rissanen, M. and Wright, S. (eds) (1994) Corpora across the Centuries, Amsterdam: Rodopi.
Leech, G. and Fallon, R. (1992) "Computer corpora - what do they tell us about culture?", ICAME Journal 16: 29-50.
Leech, G. and Short, M. (1981) Style in Fiction, London: Longman.
Leitner, G. (1991) "The Kolhapur corpus of Indian English: intravarietal description and/or intervarietal comparison", in Johansson and Stenström 1991, pp 215-32.
McEnery, A. and Wilson, A. (1993) "The role of corpora in computer-assisted language learning", Computer Assisted Language Learning 6(3): 233-48.
McEnery, A., Baker, P. and Wilson, A. (1995) "A statistical analysis of corpus based computer vs traditional human teaching methods of part of speech analysis", Computer Assisted Language Learning 8(2/3): 259-74.
Meijs, W. (ed) (1987) Corpus Linguistics and Beyond, Amsterdam: Rodopi.
Mindt, D. (1991) "Syntactic evidence for semantic distinctions in English", in Aijmer and Altenberg 1991, pp 182-96.
Mindt, D. (1992) Zeitbezug im Englischen: eine didaktische Grammatik des englischen Futurs, Tübingen: Gunter Narr.
Myers, G. (1991) "Pragmatics and corpora", talk given at Corpus Linguistics Research Group, Lancaster University.
O'Connor, J. and Arnold, G. (1961) Intonation of Colloquial English, London: Longman.
Oostdijk, N. and de Haan, P. (1994a) "Clause patterns in modern British English: a corpus-based (quantitative) study", ICAME Journal 18: 41-79.
Oostdijk, N. and de Haan, P. (eds) (1994b) Corpus Based Research into Language, Amsterdam: Rodopi.
Peitsara, K. (1993) "On the development of the by-agent in English", in Rissanen, Kytö and Palander-Collin 1993 pp 217-33.
Quirk, R., Greenbaum, S., Leech, G. and Svartvik, J. (1985) A Comprehensive Grammar of the English Language, London: Longman.
Rissanen, M. (1989) "Three problems connected with the use of diachronic corpora", ICAME Journal 13: 16-19.
Rissanen, M., Kytö, M. and Palander-Collin, M. (eds) (1993) Early English in the Computer Age, Berlin: Mouton de Gruyter.
Schreuder, R. and Kerkman, H. (1987) "On the use of a lexical database in psycholinguistic research", in Meijs 1987, pp 295-302.
Stenström, A-B. (1984) "Discourse items and pauses", Paper presented at Fifth ICAME Conference, Windermere. Abstract in ICAME News 9 (1985): 11.
Stenström, A-B. (1987) "Carry-on signals in English Conversation", in Meijs 1987, pp 87-119.
Tottie, G. (1991) Negation in English Speech and Writing: A Study in Variation, San Diego: Academic Press.
Wilson, A. (1992) The Usage of Since: A Quantitative Comparison of Augustan, Modern British and Modern Indian English, Lancaster Papers in Linguistics 80.
Wilson, A. and McEnery, A. (eds) (1994) Corpora in Language Education and Research: A Selection of Papers from Talc94, Unit for Computer Research on the English Language Technical Papers 4 (special issue), Lancaster University.
http://www.lancs.ac.uk/fss/courses/ling/corpus/Corpus4/4FRA1.HTM