Tag Archives: Corpus linguistics

A new paper by Hemmatian, Sloman, Cohen Priva, and Sloman

Babak Hemmatian and colleagues just published their paper "Think of the consequences: A decade of discourse about same-sex marriage" in Behavior Research Methods. The paper uses topic models and a large corpus of Reddit posts to study how discourse about same-sex marriage changed over the course of 10 years.

Approaching issues through the lens of nonnegotiable values increases the perceived intractability of debate (Baron & Spranca in Organizational Behavior and Human Decision Processes, 70, 1–16, 1997), while focusing on the concrete consequences of policies instead results in the moderation of extreme opinions (Fernbach, Rogers, Fox, & Sloman in Psychological Science, 24, 939–946, 2013) and a greater likelihood of conflict resolution (Baron & Leshner in Journal of Experimental Psychology: Applied, 6, 183–194, 2000). Using comments on the popular social media platform Reddit from January 2006 until September 2017, we showed how changes in the framing of same-sex marriage in public discourse relate to changes in public opinion. We used a topic model to show that the contributions of certain protected-values-based topics to the debate (religious arguments and freedom of opinion) increased prior to the emergence of a public consensus in support of same-sex marriage (Gallup, 2017), and declined afterward. In contrast, the discussion of certain consequentialist topics (the impact of politicians’ stance and same-sex marriage as a matter of policy) showed the opposite pattern. Our results reinforce the meaningfulness of protected values and consequentialism as relevant dimensions for describing public discourse and highlight the usefulness of unsupervised machine-learning methods in tackling questions about social attitude change.

LingLang Lunch (9/26/2018): Mirjam Fried (Charles University in Prague)

Mirjam Fried is an Associate Professor in the Department of Linguistics at Charles University in Prague (CUNI). She is interested in the cognitive and functional aspects of language description and analysis. She investigates various aspects of morphology and morphosyntax from both synchronic and diachronic perspectives. More information is available on her website.

When main clauses go AWOL: a constructional account of polarity shifts in insubordination

The language of spontaneous dialog is an indispensable resource for elucidating the complex patterns of language production and reception (Levinson & Holler 2014). Moreover, the natural state of spoken language is its permanent variability, which makes a systematic description of its properties a real challenge, but at the same time offers an informative window into the ways new patterns and new categories may develop in interactional practice. The process of forming a new linguistic device is also the main concern of this talk, addressing the general question of how language users may recruit existing grammatical resources in order to create new linguistic patterns with new functions. I pursue the hypothesis that grammatical change originates in the interplay between a specific item and a concrete environment in which it is used and that the interaction helps shape the kind of change that eventually results.

Using material from the spoken corpora of the Czech National Corpus, I will illustrate these issues through a particular case so far largely untouched in relevant research: the usage of the word jestli ‘if/whether’ not in its etymologically motivated function as a syntactic complementizer (as in Nikdo neví, jestli to Martin udělá ‘Nobody knows if Martin will do it’) but in one of its non-propositional functions of expressing a subjective guess about something being likely (1) or unlikely (2); note also that the lexeme (in bold) tends to be phonetically reduced, sometimes quite drastically (1):

(1) esi vona nečekala na telefon

‘[I don’t know for sure but I think] she may’ve been waiting for a phone call.’

(lit. ‘if/whether she didn’t wait for a phone call’)

(2) jesi vůbec tam maj ňáký dřevo na topení

‘[I don’t know for sure but I don’t think] they may not have any wood to burn.’

(lit. ‘if/whether they have any wood at all for burning’)

These patterns exemplify one type of a cross-linguistically widespread and well-attested phenomenon known as insubordination (Evans 2007, 2009; Evans & Watanabe 2016), whereby an erstwhile subordinate clause introduced by a dedicated subordinating complementizer retains its form but loses its main clause and develops new conventional meanings. In this talk, I will concentrate on the cluster of questions concerning the gradual loss of the main clause (full clause > lexically fixed reduced clause > discourse particle > 0), specifically zeroing in on the resulting polarity patterns in the free-standing jestli-clauses; the use of negation is observably different from the regular syntactic counterparts. I suggest that the origins and development of insubordination must be analyzed primarily as an issue of discourse organization rather than from a purely syntactic perspective (such as loss of a paratactic structure or simple ellipsis of the main clause), but with consequences for their syntactic behavior as well.

The analysis speaks to both typological and theoretical concerns. (i) It confirms that this subset of jestli-insubordination in conversational Czech can be related to the typology proposed by Evans in two of the three general categories: expressing a broad spectrum of modal meanings (here, subjective epistemic assessment, as in 1-2) and signaling presupposed material (negation and disagreement in 2). And (ii) from a broader theoretical perspective, insubordination makes a case for a particular approach to grammatical description, namely, one that takes into account both internal features of linguistic units and a ‘holistic’ perspective on specific conventionalized constellations of linguistic units. This multi-dimensional view is the basic conceptual tenet of constructional approaches and allows naturally for integrating both compositional and non-compositional properties of linguistic patterns.

LingLang Lunch (10/22/2014): Masako Fidler (Brown University)

Mining reader receptions of text with keyword analysis

“Keyness” is a property attributed to words extracted from statistical tests (e.g., chi-square and log-likelihood tests), which contrast word frequencies in the target text (Ttxt) against the background of the word frequencies in a larger corpus (the reference corpus, RefC) (Scott 1996, Baker and Ellece 2011). Words with keyness (keywords, KWs) are said to point to what the text is about (“aboutness”), and/or the structural characteristics of the text (Bondi 2010), although what exactly constitutes “aboutness” is still under debate. It is also noted in existing literature that KWs differ when different reference corpora are used as the background.
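As a rough illustration of the idea, the sketch below scores words with the log-likelihood statistic as it is commonly formulated in corpus linguistics, comparing each word's frequency in a target text with its frequency in a reference corpus. It is not the speaker's implementation: the function names (`log_likelihood`, `keywords`) and the toy token lists are invented for this example, and real analyses would add tokenization, frequency thresholds, and significance cut-offs.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) keyness score.

    a: frequency of the word in the target text
    b: frequency of the word in the reference corpus
    c: total number of tokens in the target text
    d: total number of tokens in the reference corpus
    """
    # Expected frequencies if the word were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(target_tokens, ref_tokens, top_n=5):
    """Return up to top_n words overrepresented in the target, highest G2 first."""
    target_counts = Counter(target_tokens)
    ref_counts = Counter(ref_tokens)
    c, d = len(target_tokens), len(ref_tokens)
    scored = []
    for word, a in target_counts.items():
        b = ref_counts.get(word, 0)
        # Keep only words relatively more frequent in the target text.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy data (hypothetical): "the" is equally frequent in both, "czech" is not.
target = ["czech"] * 5 + ["the"] * 5
reference = ["the"] * 50 + ["a"] * 50
print(keywords(target, reference))
```

Because the score depends on the reference counts, swapping in a different reference corpus can change which words come out as key, which is the property the talk exploits.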

This presentation will show one application of such keyword analysis (KWA). It attempts to demonstrate that KWA can be sensitive to political shifts in a society/region to varying degrees when RefCs from two distinct historical periods are used to extract data. KWA, then, can point not only to genre-specific properties of a text, but also to what readers, whose usage patterns are reflected in the reference corpus, consider prominent or surprising in a text. KWA can help us motivate different reader receptions of a text.

LingLang Lunch (3/18/2015): Václav Cvrček (Institute of the Czech National Corpus)

Descriptive vs prescriptive approach. The case of Czech grammar

The sociolinguistic situation of Czech is usually described as being close to diglossia: there are two competing varieties, one of which is expected in formal situations, while the other, a real vernacular, is the mother tongue of the vast majority of speakers. This situation has historical causes, the most important being the prescriptive approach to language regulation, which has been applied to the description of Czech since the beginning of the 19th century and still prevails (cf. Starý 1993). In my talk I will focus on the problem of descriptive and prescriptive approaches to language regulation. I will illustrate these contrasting points of view using the example of the Grammar of Contemporary Czech (Cvrček et al., 2010), which is the first corpus-based description of Czech and which was designed to form a counterpart to prescriptive reference books.