LEXCONN: A French Lexicon of Discourse Connectives
- aramhayr
- May 30
Updated: Jun 2
C. Roze, L. Danlos, P. Muller. LEXCONN: A French Lexicon of Discourse Connectives, 2012
Co-pilot (Microsoft) Summary
LEXCONN is a comprehensive lexicon of 328 French discourse connectives, designed to assist researchers, particularly in Natural Language Processing (NLP). It associates each connective with its syntactic category and the discourse relations it conveys, using the Segmented Discourse Representation Theory (SDRT) as a framework.
The methodology applies systematic identification criteria, based on syntactic and semantic features, to define connectives and distinguish them from non-discourse elements. The lexicon also addresses ambiguity in connective usage: 23.7% of connectives have multiple discourse uses. It groups connectives under relations such as causality, contrast, and elaboration.
Additionally, the study highlights limitations in SDRT’s predefined relations, introducing six new ones, such as Concession and Digression, to accommodate French connectives. For 6% of connectives, no existing relation could be associated, reflecting the need for further exploration.
LEXCONN is expected to be a valuable resource for enhancing discourse annotation in NLP and for linguistic analysis, providing insights into the roles and frequencies of connectives in French discourse.
Details (by human)
Steps
The first step of our methodology was to gather a set of candidate discourse connectives (about 600). To do that, we used various lists of subordinating conjunctions and prepositions provided by Éric Laporte (Université Paris-Est, LIGM, CNRS) and Benoît Sagot (INRIA Paris-Rocquencourt, ALPAGE, Université Paris VII), the list of French discourse markers from the ANNODIS project, and the corpus of English discourse connectives built by Knott (1996), which we translated manually.
Applied various syntactic, semantic, and discourse criteria to the candidates to build the list of French discourse connectives:
connectives are not integrated into propositional content (Cleft Criterion),
they cannot be referential expressions (Substitutability Criterion),
their meaning is not compositional (Compositionality Criterion).
Contextual Criterion
Forced Relation Criterion
Coherence Criterion
For each connective, determined which discourse relation it expresses by observing the contexts in which it appears in texts from the FRANTEXT corpus.
To identify the discourse relation conveyed by a connective, we tested:
Attachment
Substitution
Semantic effects
Ambiguity
Frequency of relations
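The outcome of these steps, a connective paired with a syntactic category and the relation(s) it conveys, could be held in a small record type. The sketch below is only an illustration: the class name, field names, category labels, and sample entries are assumptions made for this post and are not copied from LEXCONN or its file format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectiveEntry:
    """One hypothetical lexicon entry (not the actual LEXCONN schema)."""

    form: str                        # surface form of the connective
    category: str                    # syntactic category (label is assumed)
    relations: tuple[str, ...] = ()  # discourse relations it can convey

    @property
    def is_ambiguous(self) -> bool:
        # Ambiguous here means: more than one discourse relation.
        return len(self.relations) > 1

    @property
    def is_unclassified(self) -> bool:
        # No relation could be associated with the connective.
        return len(self.relations) == 0


# Illustrative entries only; the relation assignments are invented examples.
EXAMPLES = [
    ConnectiveEntry("parce que", "subordinating conjunction", ("Explanation",)),
    ConnectiveEntry("alors", "adverb", ("Result", "Narration")),
    ConnectiveEntry("exemple_sans_relation", "adverb"),
]
```

Keeping the relations as a tuple makes both special cases discussed in the paper (several relations, or none) visible from the same field.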
Summary
Developed a systematic methodology to identify discourse connectives and associate discourse relations to them, resting on various studies concerning connectives and corpus-collected examples.
Connectives remain to be studied in detail (especially connectives whose function is “unknown” so far). A statistical analysis of the resulting lexicon allowed us to quantify several points, such as the importance of the various discourse relations in terms of the number of connectives associated with them, and a count of ambiguous connectives.
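The kind of statistical analysis described here (connectives per relation, share of ambiguous connectives) can be sketched in a few lines. This is a minimal illustration, assuming the lexicon is available as a mapping from connective to a list of relation names; the toy data and function names are hypothetical and do not reproduce LEXCONN's figures.

```python
from collections import Counter

# Toy data for illustration only; not taken from LEXCONN.
TOY_LEXICON = {
    "parce que": ["Explanation"],
    "mais": ["Contrast"],
    "donc": ["Result"],
    "alors": ["Result", "Narration"],
    "exemple_sans_relation": [],  # stands for a connective with unknown function
}


def relation_counts(lexicon):
    """Number of connectives associated with each discourse relation."""
    return Counter(rel for rels in lexicon.values() for rel in set(rels))


def ambiguity_rate(lexicon):
    """Share of connectives that convey more than one relation."""
    if not lexicon:
        return 0.0
    ambiguous = sum(1 for rels in lexicon.values() if len(set(rels)) > 1)
    return ambiguous / len(lexicon)


def unknown_rate(lexicon):
    """Share of connectives with no associated relation."""
    if not lexicon:
        return 0.0
    unknown = sum(1 for rels in lexicon.values() if not rels)
    return unknown / len(lexicon)
```

On the toy data, `relation_counts(TOY_LEXICON)["Result"]` is 2 and `ambiguity_rate(TOY_LEXICON)` is 0.2; run on the real lexicon, the same functions would yield the kind of percentages quoted in the summary.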
Some information still has to be added to LEXCONN, in particular about ambiguity between discourse and non-discourse usage. This will be possible with further linguistic analysis, but also with automatic analysis of the ANNODIS corpus: for adverbials, the link between position in the host clause and discourse/non-discourse role must be studied.
LEXCONN already constitutes a valuable resource for NLP. It might help with discourse marker annotation in ANNODIS, in which connectives are not yet marked.
A statistical analysis of the connectives in a corpus can also be useful, for example concerning a connective’s frequency. Such an analysis could help to answer the following question: are ambiguous connectives the most frequent ones?
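A first approximation of that question could be computed by counting surface occurrences of each connective in a text and comparing average counts for ambiguous and unambiguous connectives. The sketch below is an assumption-laden illustration: the function names are hypothetical, and plain string matching cannot tell discourse uses from non-discourse uses, so the counts are upper bounds.

```python
import re
from statistics import mean


def count_occurrences(text, forms):
    """Count whole-word, case-insensitive matches of each connective form."""
    lowered = text.lower()
    counts = {}
    for form in forms:
        # Lookarounds keep "mais" from matching inside "jamais".
        pattern = r"(?<!\w)" + re.escape(form.lower()) + r"(?!\w)"
        counts[form] = len(re.findall(pattern, lowered))
    return counts


def mean_frequency_by_ambiguity(counts, ambiguous_forms):
    """Return (mean count of ambiguous forms, mean count of the others)."""
    amb = [n for form, n in counts.items() if form in ambiguous_forms]
    other = [n for form, n in counts.items() if form not in ambiguous_forms]
    return (mean(amb) if amb else 0.0, mean(other) if other else 0.0)


# Invented sample text, for illustration only.
SAMPLE = (
    "Il pleut, mais je sors. Alors je prends un parapluie, "
    "parce que je sors à pied. Mais alors, quel temps !"
)
SAMPLE_COUNTS = count_occurrences(SAMPLE, ["mais", "alors", "parce que"])
```

Passing `SAMPLE_COUNTS` and a set of ambiguous forms to `mean_frequency_by_ambiguity` gives the two averages to compare; a real study would need a large corpus and a way to filter out non-discourse uses.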
Notes:
FRANTEXT is a textual base of French literature. It is available at https://www.frantext.fr/
SDRT - Segmented Discourse Representation Theory (Asher & Lascarides, 2003), which inherits from Discourse Representation Theory or DRT (Kamp, 1981) and discourse analysis (Grosz & Sidner, 1986; Mann & Thompson, 1988).
ANNODIS - A corpus of French texts enriched with manual annotation of discourse structures. http://redac.univ-tlse2.fr/corpus/annodis/annodis_en.html