Quantitative Approaches (C. Forstall, L. Galli Milić, D. Nelis)

This contribution examines intertextuality in Latin epics of the Flavian period, and in particular the ways in which thematic correspondences between two passages affect the allusive significance of specific verbal echoes. It has long been understood that Greek and Roman epic poems partake of a shared repertory of stock themes and typical scenes: the catalogue of heroes, the warrior arming for battle, the tempest, etc. (Edwards 1992).

These themes originally evolved under circumstances peculiar to oral-formulaic composition in archaic Greece (Rubin 1995; Minchin 2001), but for the literate successors of the epic tradition in later Greece and Rome, the use of type themes was no longer an aid to memory but a complex intertextual gesture essential to the genre (De Jong 2014; Nünlist 2009). Existing tools for digital intertextual analysis, such as Tesserae, Musisque Deoque’s co-occurrence search, and eTRAP’s forthcoming TRACER framework, can automatically generate an exhaustive list of all verbal repetitions between two texts. However, text reuse alone does not make an interesting allusion, and tests of such exhaustive searches suggest that sensitivity to scene-level thematic parallelism would increase both the recall and the precision of allusion detection (Coffee et al. 2012). Early work on semantic search (e.g. Scheirer et al. forthcoming) shows that related passages in Latin poems can be identified automatically even without repeated text, but such searches cannot currently be combined easily with word-based results. The “fusion” model proposed by Bamman and Crane (2008), a comprehensive search procedure simultaneously sensitive to similarities in different aspects of the text (e.g., words, syntax, and meaning), remains an elusive goal.

Our objective is to develop automated detection of thematic similarity that is compatible with the phrase-based results of existing tools, so that, for example, verbal correspondences can be promoted where they occur within similar thematic contexts. Our corpus includes the three more or less complete epics of the Flavian period (Valerius Flaccus’ Argonautica, Statius’ Thebaid, and Silius Italicus’ Punica) as well as three earlier poems to which these works respond: Lucan’s Civil War, Ovid’s Metamorphoses, and Vergil’s Aeneid. These works are initially subdivided into samples of 50 consecutive verse lines. After lemmatization and removal of very frequent words, i.e. words occurring in more than half of all samples, each passage is represented as a set of weighted word frequencies. We use term frequency-inverse document frequency (tf-idf) weights, which emphasize words that are frequent in the passage at hand but rare overall (cf. Amini 2015).
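The sampling and weighting steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the project's actual pipeline: lemmatization is assumed to have been done already (each verse line arrives as a list of lemmata), the helper names `make_samples`, `tfidf_vectors` and `cosine` are hypothetical, and the weighting shown is the plain raw-count × log(N/df) variant of tf-idf, one of several common formulations. The `cosine` helper illustrates one possible way two passage vectors could be compared when deciding whether a verbal echo occurs in a thematically similar context.

```python
import math
from collections import Counter


def make_samples(lines, size=50):
    """Split a poem, given as a list of lemmatized verse lines (each a list of
    lemmata), into samples of `size` consecutive lines. The final sample may
    be shorter if the poem's length is not a multiple of `size`."""
    return [
        [lemma for line in lines[i:i + size] for lemma in line]
        for i in range(0, len(lines), size)
    ]


def tfidf_vectors(samples, max_df=0.5):
    """Return one sparse {lemma: weight} dict per sample.

    Lemmata occurring in more than `max_df` (a fraction) of all samples are
    discarded; the remaining raw counts are weighted by log(N / df)."""
    n = len(samples)
    # Document frequency: the number of samples containing each lemma.
    df = Counter(lemma for sample in samples for lemma in set(sample))
    # Keep only lemmata present in at most half of the samples (by default).
    vocab = {lemma for lemma, count in df.items() if count / n <= max_df}
    vectors = []
    for sample in samples:
        tf = Counter(lemma for lemma in sample if lemma in vocab)
        vectors.append({lemma: freq * math.log(n / df[lemma])
                        for lemma, freq in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * v.get(lemma, 0.0) for lemma, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

For instance, with six toy lines and `size=2`, the poem yields three samples; lemmata such as `arma` or `cano` that occur in two of the three samples exceed the 50% threshold and are dropped, while a lemma appearing twice in a single sample receives weight 2 × log 3.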

In the unsupervised approach, k-means clustering identifies groups of passages with similar weighted word-frequency profiles. We systematically vary k, the expected number of clusters, and for each value of k perform repeated clusterings with different random seeds. Areas of the poems showing high levels of agreement between repeated clusterings are identified as thematically more stable, and thus as candidates for further study.
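The repeated-clustering procedure can be illustrated with the following sketch. It assumes the tf-idf vectors have already been converted to dense lists of floats over a shared vocabulary, uses a plain Lloyd's k-means written with the standard library, and measures agreement with a co-assignment score. That score (`stability`) is a hypothetical measure chosen for illustration; the abstract does not say how agreement between clusterings is quantified. Because cluster labels are arbitrary from one run to the next, the sketch compares which passages end up together rather than the labels themselves.

```python
import random


def _nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))


def kmeans(points, k, seed, max_iter=100):
    """Plain Lloyd's k-means on dense vectors; returns one label per point.
    Initial centroids are k points drawn without replacement using `seed`."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [_nearest(p, centroids) for p in points]
        if new_labels == labels:  # converged: assignments unchanged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def stability(runs):
    """Per-passage agreement across repeated clusterings (hypothetical measure).

    f[i][j] is the fraction of runs placing passages i and j in the same
    cluster. A passage scores 1.0 when every pair involving it is either
    always or never co-clustered, and lower when its pairings fluctuate."""
    n = len(runs[0])
    f = [[sum(r[i] == r[j] for r in runs) / len(runs) for j in range(n)]
         for i in range(n)]
    return [sum(max(f[i][j], 1 - f[i][j]) for j in range(n) if j != i) / (n - 1)
            for i in range(n)]


def stability_over_k(points, ks, seeds):
    """Re-cluster with every seed for each k; return {k: per-passage stability}."""
    return {k: stability([kmeans(points, k, s) for s in seeds]) for k in ks}
```

On two well-separated toy groups, every seed recovers the same partition for k = 2, so each point scores 1.0. In practice a library implementation such as scikit-learn's `KMeans` (with `random_state` varied per run) would replace the hand-written loop, and a whole-run agreement statistic such as the adjusted Rand index could complement the per-passage score.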

In the supervised approach, on the other hand, we begin by selecting scenes of interest using a set of a priori thematic categories (cf. Ciotti et al. 2015). We then attempt to train a classifier that can distinguish these scene types. Early trials have shown that certain thematic contexts can be separated at a coarse scale (e.g., love versus war, generally) using principal components analysis; here, however, we will attempt to achieve finer discrimination using support vector machines, a popular approach in stylometric tasks and one that has previously been applied to intertextuality in Latin in particular (Forstall et al. 2011).
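To make the supervised step concrete, here is a minimal binary linear SVM in standard-library Python. It is a sketch under assumptions, not the classifier used in the project: the solver is Pegasos (stochastic sub-gradient descent on the regularized hinge loss, without the optional projection step), swapped in here only because it fits in a few lines; the inputs are assumed to be dense tf-idf vectors labelled +1 or -1 for two scene types (say, hypothetically, "war" versus "love"); and the names `train_linear_svm` and `predict` are illustrative.

```python
import random


def train_linear_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Linear soft-margin SVM trained with Pegasos, i.e. stochastic
    sub-gradient descent on lam/2 * ||w||^2 + mean hinge loss.
    Labels must be +1 or -1. A constant feature 1.0 is appended so the last
    weight acts as a bias (note: this bias is therefore also regularized)."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Sub-gradient step: shrink w for the regularizer and, if the
            # example violates the margin, move w towards y * x.
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 according to the side of the separating hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

The regularization weight `lam` trades a wide margin against training errors. With more than two a priori categories, a common extension is one-vs-rest: train one such classifier per scene type and assign a passage to the type with the highest score. For real use, a library implementation such as scikit-learn's `LinearSVC` would be the more practical choice.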