Corpus-assisted discourse studies

Corpus-assisted discourse studies, or CADS, is related historically and methodologically to the discipline of corpus linguistics. The principal endeavor of corpus-assisted discourse studies is the investigation and comparison of features of particular discourse types, integrating into the analysis the techniques and tools developed within corpus linguistics. These include the compilation of specialised corpora and analyses of word and word-cluster frequency lists, comparative keyword lists and, above all, concordances.
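As an illustration of the last of these tools, a concordance lists every occurrence of a search word (the node) together with a few words of co-text on either side. The following is a minimal sketch in Python, not the output of any particular concordancing package: the function names `tokenize` and `concordance`, the simple tokenisation rule and the sample sentences are all assumptions made for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, node, width=4):
    """Return one (left co-text, node, right co-text) triple per hit."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# Invented sample text, used only to show the shape of the output.
sample = (
    "The minister defended the freedom of the press. "
    "Critics said that freedom without responsibility is dangerous."
)

kwic_lines = concordance(tokenize(sample), "freedom", width=3)
for left, node, right in kwic_lines:
    # Right-align the left co-text so the node words line up in a column.
    print(f"{left:>30}  {node}  {right}")
```

Aligning the node word in a central column is what lets the analyst scan many lines at once for recurring patterns to its left and right.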

Aims

The aim of the CADS approach is the uncovering, in the discourse type under study, of non-obvious meaning, that is, meaning which might not be readily available to naked-eye perusal. Much of what carries meaning in texts is not open to direct observation: “you cannot understand the world just by looking at it” (Stubbs [after Gellner 1959] 1996: 92). We use language “semi-automatically”, in the sense that speakers and writers make semi-conscious choices within the various complex overlapping systems of which language is composed, including those of transitivity, modality (Halliday 1994), lexical sets (e.g. freedom, liberty, deliverance), modification, and so on. Authors themselves are, famously, generally unaware of all the meanings their texts convey. By combining the quantitative approach, that is, statistical overviews of large amounts of the discourse in question (more precisely, large numbers of tokens of the discourse type under study contained in a corpus), with the more qualitative approach typical of discourse analysis, that is, the close, detailed examination of particular stretches of discourse, it may be possible to better understand the processes at play in the discourse type and to gain access to non-obvious meanings.

History

CADS arises from pioneering work conducted in Europe, in particular by Hardt-Mautner (1995) and Stubbs (1996, 2001). A considerable body of research has also been conducted in Italy either by individual researchers or under the aegis of combined inter-university projects such as Newspool (Partington et al. 2004) and CorDis (Morley and Bayley eds, forthcoming). It has concentrated on political and media language, mainly because a nucleus of linguists in Italian universities work in Political Science faculties and are increasingly interested in the use of corpus techniques to conduct a particular type of sociopolitical discourse analysis, including the unearthing of noteworthy ideological metaphors and motifs in the language of political figures and institutions.

Comparison with standard corpus linguistics

Traditional corpus linguistics has, quite naturally, tended to privilege the quantitative approach. In the drive to produce more authentic dictionaries and grammars of a language, it has been characterised by the compilation of some very large corpora of heterogeneric discourse types in the desire to obtain an overview of the greatest quantity and variety of discourse types possible, in other words, of the chimerical but useful fiction called the “general language” (“general English”, “general Italian”, and so on). This has led to the construction of immensely valuable research tools such as the Bank of English and the British National Corpus.

Corpus linguistics proper has also frequently been characterised by the treatment of the corpus as a “black box”, that is, the analyst is not encouraged to familiarise him/herself with particular texts within the corpus in case the special features these texts may possess should distort his or her conceptions of the corpus as a whole. There is a certain argument which runs that, if we are to construct from scratch a fresh descriptive model of the language which is as closely based on the observation of authentic discourse in action as possible, we need, grammatically speaking, a mental tabula rasa to free ourselves of the baleful prejudice exerted by traditional models and allow the data to speak entirely for itself.

The aim of CADS on the other hand is radically different. Here the aim of the exercise is to acquaint oneself as much as possible with the discourse type(s) in hand. Unusually for corpus linguistics, CADS researchers typically engage with their corpus in a variety of ways. As well as via wordlists and concordancing, intuitions for further research can also arise from reading or watching or listening to parts of the data-set, a process which can help provide a feel for how things are done linguistically in the discourse-type being studied.

CADS is also typically characterised by the compilation of ad hoc specialised corpora, since very frequently there exists no previously available collection of the discourse type in question. Just as typically, other corpora of various descriptions are utilized in the course of a study for purposes of comparison. These may include pre-existing corpora or may themselves need to be compiled by the researcher. In some sense, all work with corpora, just as all work with discourse, is properly comparative. Even when a single corpus is employed, it is used to test the data it contains against another body of data. This may consist of the researcher’s intuitions, or the data found in reference works such as dictionaries and grammars, or it may be statements made by previous authors in the field. Corpus-assisted studies of discourse types are, of course, by definition comparative: it is only possible to both uncover and evaluate the particular features of a discourse type by comparing it with others.

Occasionally it is possible to compare the behaviour of the linguistic items under study in a single discourse type (or monogeneric) corpus with their behaviour in one of the large heterogeneric corpora which are commercially available, such as the British National Corpus or the Bank of English mentioned earlier. On other occasions, however, it becomes appropriate to adopt more complex procedures and to edit, tailor or compile a corpus for special purposes.

A basic, standard methodology in CADS may resemble the following:

Step 1: Decide upon the research question;

Step 2: Choose, compile or edit an appropriate corpus;

Step 3: Choose, compile or edit an appropriate reference corpus / corpora;

Step 4: Make frequency lists and run a keywords comparison of the corpora;

Step 5: Determine the existence of sets of key items;

Step 6: Concordance interesting key items (with differing quantities of co-text);

Step 7: (Possibly) refine the research question and return to Step 2.

This basic procedure can of course vary according to individual research circumstances and requirements.
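Steps 4 and 5 can be illustrated with a short Python sketch. It builds frequency lists for a study corpus and a reference corpus and ranks the words that are relatively more frequent in the study corpus. The keyness statistic used here, log-likelihood (G2), is one commonly used choice and is an assumption of this example, since the procedure above does not prescribe a statistic. The function names and the two tiny "corpora" are likewise invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, n1, n2):
    """G2 for a word seen a times in n1 tokens and b times in n2 tokens."""
    expected1 = n1 * (a + b) / (n1 + n2)
    expected2 = n2 * (a + b) / (n1 + n2)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected1)
    if b > 0:
        g2 += b * math.log(b / expected2)
    return 2 * g2


def keywords(study_tokens, reference_tokens, top=5):
    """Rank words that are relatively more frequent in the study corpus."""
    study = Counter(study_tokens)          # frequency list, study corpus
    reference = Counter(reference_tokens)  # frequency list, reference corpus
    n1, n2 = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = reference[word]  # a Counter returns 0 for unseen words
        if a / n1 > b / n2:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, n1, n2)))
    # Highest keyness first; ties are broken alphabetically.
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:top]


# Invented toy corpora; real studies use far larger collections.
study_text = "the war on terror the war in iraq the threat of war"
reference_text = (
    "the weather in spain the price of bread "
    "the cost of living in the city"
)

key_items = keywords(tokenize(study_text), tokenize(reference_text))
for word, score in key_items:
    print(f"{word:<10} {score:.2f}")
```

In this toy example "the" is frequent in both texts and so does not surface as key, whereas "war" does. The resulting list is only a starting point: the key items would then be grouped into sets (Step 5) and examined in their co-text through concordancing (Step 6).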

Research questions and techniques of analysis

CADS research projects often focus on research questions of the following types:

Given that P is a discourse participant (or possibly an institution) and G is a goal, often a political goal:

(i) How does P achieve G with language?

(ii) What does this tell us about P?

(iii) Comparative studies: how do P1 and P2 differ in their use of language? Does this tell us anything about their different principles and objectives?

A second general type of CADS research question, which might be asked of interactive discourse data, is the following.

Given that P(x) is a particular participant or set of participants, DT is the discourse type, and R is an observed relationship between or among participants:

How do {P(a), P(b)…P(n)} achieve / maintain R in DT [using language]?

Another common type of research question open to investigation using CADS techniques is the following:

Given that A is an author, Ph(x) is a phenomenon or practice or behaviour, and DT(x) is a particular discourse type:

A has said Ph(x) is the case in DT(a).

Is Ph(x) the case in DT(b)?

This is a classic “hypothesis-testing” research question: we test the hypothesis that whatever practice has been observed by a previous author in some discourse type will be observable in another. It is a process we might call para-replication, that is, the replication of an experiment with either a fresh set of texts of the same discourse type or of a related discourse type, “in order to see whether [findings] were an artefact of one single data set” (Stubbs 2001: 124).

A final example of a research question which may usefully be investigated using CADS techniques is of the following sort.

Given that P(x) is a participant or category thereof, and LF(x) is a particular language feature:

Do {P(a)} and {P(b)} use LF(x) in the same way?

Such research aims to ascertain whether different participants use a particular linguistic feature in the same or different ways. The research may proceed to attempt to explain why this is the case.

Some research to date

Influential CADS case studies include the following:

How ideas about groups of people and race are constructed and disseminated through repeated language use (Krishnamurthy 1996).

A study of German loan words in English and their connection to cultural stereotyping (Stubbs 1998).

Analyses of the language of euro-sceptic debate in the UK (Teubert 2000).

The typical language strategies, metaphors and motifs used by journalists and spokespersons in US press conferences, and how these reflect their respective world-views (Partington 2003, 2007).

How prediction is effected in economic texts, that is, how economic forecasts are presented and hedged (Walsh 2004).

How government witnesses in the Hutton Inquiry constructed their professional identity (Duguid 2007, 2008).

The CorDis project. How the conflict in Iraq was discussed and reported in the Senate and Parliament, in US press briefings and the Hutton Inquiry, in US/UK newspapers and TV news (Marchi and Taylor forthcoming; Morley and Bayley forthcoming; Haarman and Lombardo forthcoming).

The Intune project 2004-9 (media linguists workgroup). An EU-funded venture to investigate how the press in France, Italy, Poland and the UK represent issues relating to European citizenship and identity (http://www.Intune.it).

Related projects

Corpus-assisted critical discourse analysis

A team of linguists based at Lancaster University, UK (http://ucrel.lancs.ac.uk/projects/rasim/) is researching the possibility of combining the techniques typically used in corpus research and those used in critical discourse analysis (CDA). CDA generally adopts a leftist political stance, focussing on the ways that social and political domination is reproduced by text and talk (Baker et al. 2008).

Modern diachronic corpus-assisted discourse studies (MD-CADS)

Modern diachronic corpus-assisted discourse studies contrasts the language contained in comparable corpora from different but recent points in time in order to track not only changes in modern language usage but also social, cultural and political changes over modern times, as reflected (and shared among people) in language. The SiBol (Siena-Bologna Universities) project analyses the differences between two corpora of UK quality newspaper texts, the first dating from 1993 (c. 100 million words), the second from 2005 (140 million words).
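Because the two corpora differ in size, raw counts from them cannot be compared directly and are usually normalised first, for example to occurrences per million words. The short Python sketch below shows the arithmetic. The corpus sizes are the approximate figures given above, while the raw counts and the helper name `per_million` are hypothetical and do not come from the SiBol project.

```python
def per_million(raw_count, corpus_size):
    """Convert a raw count into occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size


# Approximate corpus sizes as stated in the text above.
SIZE_1993 = 100_000_000
SIZE_2005 = 140_000_000

# Hypothetical raw counts for some word, invented for illustration.
raw_1993 = 5_000
raw_2005 = 7_000

rate_1993 = per_million(raw_1993, SIZE_1993)
rate_2005 = per_million(raw_2005, SIZE_2005)

print(f"1993: {rate_1993:.1f} per million words")
print(f"2005: {rate_2005:.1f} per million words")
```

With these invented figures the raw count rises from 5,000 to 7,000, yet both corpora yield 50 occurrences per million words, so the apparent increase reflects only the larger size of the 2005 corpus.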

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.