Discourse Analysis
- aramhayr
- Jul 20
Updated: Aug 15
Overview
AI (Chrome)
Discourse analysis (DA) is a qualitative research method used across various disciplines to study how language is used to construct meaning, social interactions, and social realities. It examines spoken and written texts, as well as other forms of communication, to understand how they function within specific contexts and shape our understanding of the world.
Human
Text linguistics is an application of DA. The objects of DA - monolog, dialog, conversation, writing, communicative event - are variously defined in terms of coherent sequences of units of speech (Hayrapetyan, 2025). "Contrary to much of traditional linguistics, discourse analysts not only study language use 'beyond the sentence boundary' but also prefer to analyze 'naturally occurring' language use, not invented examples" (see Discourse Analysis for more info).
Discourse Analysis and Annotation
Theoretical models of discourse emerged in the 1980s (e.g., Mann & Thompson 1986; Grosz & Sidner 1986; Hobbs 1985) to explicitly represent what makes discourse coherent. Coherence is the defining property that distinguishes discourse from a mere sequence of sentences. It is a cognitive construct, guiding how language users interpret and mentally organize discourse.
The key formal tool for representing coherence is the Discourse Relation (DR), also known as a Coherence Relation. DRs express the informational surplus (AH: investigate the theoretical feasibility and practical benefits of calculating that surplus, i.e., relative or differential information) between discourse segments, that is, meaning that cannot be derived from the segments in isolation. These conceptual links reflect the cognitive representation and inferential operations we perform while interpreting language (Knott & Sanders 1998). As such, DRs are widely considered the building blocks of discourse structure (Evers-Vermeul & Sanders 2015; AH: compare with Hayrapetyan (2025), the Parts and Units of Speech and Language section). DRs can be implicit, as in (1a) and (2a), or explicit, as in (1b) and (2b), where they are marked by discourse connectives or other lexical/grammatical devices:
1a. Max opened the door. The room was pitch dark. (Background relation)
1b. When Max opened the door, the room was pitch dark.
2a. Max switched off the light. The room was pitch dark. (Result relation)
2b. Max switched off the light. As a result, the room was pitch dark.
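For annotation purposes, examples (1a) to (2b) can be written down as small records. The sketch below is only an illustration: the class name DiscourseRelation, its field names, and the helper explicit_ids are hypothetical and do not follow the schema of RST, SDRT, PDTB, or CCR. It assumes that each relation links exactly two segments and that a missing connective means the relation is implicit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DiscourseRelation:
    """One relation between two discourse segments (hypothetical schema)."""
    arg1: str                         # first segment
    arg2: str                         # second segment
    label: str                        # relation name, e.g. "Background"
    connective: Optional[str] = None  # None means the relation is implicit

    @property
    def is_explicit(self) -> bool:
        # A relation counts as explicit when a connective marks it.
        return self.connective is not None


# The four examples from the text, keyed by their example numbers.
EXAMPLES: Dict[str, DiscourseRelation] = {
    "1a": DiscourseRelation("Max opened the door.",
                            "The room was pitch dark.", "Background"),
    "1b": DiscourseRelation("Max opened the door,",
                            "the room was pitch dark.", "Background",
                            connective="When"),
    "2a": DiscourseRelation("Max switched off the light.",
                            "The room was pitch dark.", "Result"),
    "2b": DiscourseRelation("Max switched off the light.",
                            "the room was pitch dark.", "Result",
                            connective="As a result"),
}


def explicit_ids(examples: Dict[str, DiscourseRelation]) -> List[str]:
    """Return the sorted ids of the examples whose relation is explicitly marked."""
    return sorted(key for key, rel in examples.items() if rel.is_explicit)
```

Calling explicit_ids(EXAMPLES) picks out (1b) and (2b). The second segment is identical in (1a) and (2a), yet the labels differ, which is the point of the examples: the relation is not a property of either segment alone.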
Terms
Communication - the transmission of information.
Conversation - interactive communication between two or more people.
Discourse - a generalization of the notion of a conversation to any form of communication. It is a "sequence of written or oral utterances, arranged into a coherent whole" (Zufferey & Moeschler 2012: 143).
Information - the amount of news encoded in a message, verbal or otherwise. It is a relative quantity: the more strongly the receiver anticipates (assigns a high probability to) a known (recognizable) code at a particular position in the message, the less information that code carries. If I say «եւ-ը վերջածանց է» (roughly, "եւ is a suffix") to someone who does not understand Armenian, the amount of information transmitted by the message will be 0, because no code is known. For a layperson who knows Armenian, the message carries much more information than it does for a linguist. Ellipsis occurs in discourse when the speaker is sure that the probability of recovering the omitted phrase is 1; the phrase therefore carries no information for the listener.
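One common way to make this relation between anticipation and information precise is Shannon's self-information (surprisal), -log2(p), measured in bits. The text does not name this formula, so treating it as the intended measure is an assumption. The function name surprisal_bits and the probabilities used below are made up for illustration. The sketch only covers a receiver who knows the code; the case of a receiver who does not know the code at all is outside this formula.

```python
import math


def surprisal_bits(p: float) -> float:
    """Self-information log2(1/p), in bits, of a code the receiver expects with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return math.log2(1.0 / p)


# Hypothetical probabilities, chosen only to illustrate the ordering in the text.
linguist_bits = surprisal_bits(0.5)     # a linguist largely anticipates the statement
layperson_bits = surprisal_bits(0.125)  # a layperson finds the same statement more surprising
ellipsis_bits = surprisal_bits(1.0)     # an omitted phrase that is certain to be recovered
```

With these made-up values the linguist receives 1 bit, the layperson 3 bits, and the elided phrase 0 bits, which matches the ordering described in the definition.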
Bulletin board
Call for Contributions: Physical Commonsense Reasoning Datasets for 40+ Languages - The Multilingual Representation Learning (MRL) workshop is looking for volunteers to contribute to a multilingual physical commonsense reasoning dataset, to be used to evaluate large language model capabilities across languages. Contributors would submit a manually-written dataset in their native language(s), or with the help of a native speaker of a non-English language (even just 100 examples is ok). We will provide guidance on how to do this! Interest form. More details - MRL 2025 Shared Task on Multilingual Physical Reasoning Datasets
References
#7 in Լեզվաբանություն, ևն
A. Hayrapetyan (2025). Conjunctions in Eastern Armenian.
H. Jivanyan (2025). Coherence and Discourse relations: a discourse-analytical perspective, AUA, YALP 2025.
A. Lưu, S. A. Malamud (2020). Annotating Coherence Relations for Studying Topic Transitions in Social Talk. The 14th Linguistic Annotation Workshop, pages 174–179. Barcelona, Spain (Online), December 12, 2020.
Intro to RST - Rhetorical Structure Theory (Mann and Thompson 1988; and its annotation guidelines: Carlson and Marcu 2001)
L. Carlson, D. Marcu (2001). Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory.
L. Carlson, D. Marcu, M. E. Okurowski. Discourse Tagging Reference Manual.
SDRT: Segmented Discourse Representation Theory
A. Lascarides, N. Asher. Segmented Discourse Representation Theory: Dynamic Semantics with Discourse Structure
PDTB: Penn Discourse Treebank (Miltsakaki et al., 2004, annotation guidelines (Prasad et al., 2007))
E. Miltsakaki, A. Joshi, R. Prasad, B. Webber (2004). Annotating Discourse Connectives and Their Arguments.
E. Miltsakaki, A. Joshi, R. Prasad, B. Webber (2004). The Penn Discourse Treebank.
B. Webber, R. Prasad, A. Lee, A. Joshi (2017). The Penn Discourse Treebank 3.0 Annotation Manual.
CCR: Cognitive Approach to Coherence Relations (Sanders, Spooren & Noordman 1992)
Sanders, T. J., Spooren, W. P., & Noordman, L. G. (1992). Toward a taxonomy of coherence relations. Discourse Processes, 15(1), 1–35.