Current Solo Projects
Word embedding models, which represent semantic relationships between words as “vectors,” have surged in popularity in the Machine Learning and Natural Language Processing communities, in part because they can represent and predict semantic relationships as complex as analogy. “Man is to woman as king is to queen” is represented as a mathematical relation among word vectors: V(Queen) ≈ V(King) – V(Man) + V(Woman). Or, “riches are to virtue as learning is to genius” (as Edward Young argued in his 1759 treatise against classical imitation) is represented as V(Genius) ≈ V(Learning) – V(Riches) + V(Virtue). This project explores the potential of such vector-based representations of semantic relationships as analytical tools for a conceptually oriented distant reading of eighteenth-century literature.
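As a rough illustration of the analogy arithmetic described above, here is a minimal Python sketch. The three-dimensional vectors are hypothetical values chosen by hand so that the example works; they are not drawn from any trained model or from this project's data, and the function name solve_analogy is likewise only illustrative. Real embeddings are learned from a corpus and typically have hundreds of dimensions.

```python
import math

# Hypothetical toy vectors, hand-picked for illustration only.
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, vectors):
    """Answer "a is to b as c is to ?" via V(c) - V(a) + V(b).

    The three query words are excluded from the candidates, and the
    remaining word with the highest cosine similarity is returned.
    """
    target = [vc - va + vb
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# "Man is to woman as king is to ?"
print(solve_analogy("man", "woman", "king", TOY_VECTORS))
```

With these toy values the nearest remaining word to V(King) – V(Man) + V(Woman) is "queen". On a trained model the same query would run over the full vocabulary, and the answer would depend on the corpus the model was trained on.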
This project argues that virtual literature (texts made by computers) can productively reframe our understanding of actual literature. It began tongue-in-cheek, with the playful exercise of my first blog post on a “found” virtual poem. But the more I thought about virtual texts, the more seriously I found myself considering what they might have to tell us. In particular, I think they raise some interesting questions for the theory and methodology of DH.
Current Collaborative Projects
Collaborator: Marissa Gemma
This project analyzes the role of lexical bundles—extremely common collocations of three or more words, such as “the ways in which,” “there was a,” or “in the world”—in a corpus of several thousand novels published in Great Britain and America between 1700 and 1900. The two key goals of the project are: 1) to provide a taxonomy of the discourse functions of lexical bundles in eighteenth- and nineteenth-century fiction; and 2) to historicize that usage by tracking changes in our corpus over the course of the nineteenth century.
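To make the notion of a lexical bundle concrete, here is a minimal Python sketch that counts three-word sequences and keeps those that recur across texts. It is an illustration only, not the project's actual pipeline: the function names, the simple tokenizer, and the thresholds min_count and min_texts are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def find_bundles(texts, n=3, min_count=2, min_texts=2):
    """Return n-word sequences that recur often and in several texts.

    min_count and min_texts are hypothetical thresholds chosen for this
    sketch; a real study would set them relative to corpus size.
    """
    total = Counter()   # occurrences of each n-gram across all texts
    spread = Counter()  # number of distinct texts containing each n-gram
    for text in texts:
        tokens = tokenize(text)
        grams = [" ".join(tokens[i:i + n])
                 for i in range(len(tokens) - n + 1)]
        total.update(grams)
        spread.update(set(grams))
    return {g: c for g, c in total.items()
            if c >= min_count and spread[g] >= min_texts}


sample = [
    "There was a man in the world.",
    "In the world there was a dog.",
]
print(find_bundles(sample))
```

On this two-sentence sample, only "there was a" and "in the world" appear in both texts, so they are the only sequences returned. Requiring a sequence to occur in several texts, and not just many times overall, keeps a single repetitive novel from dominating the list.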
Collaborators: Mark Algee-Hewitt, J.D. Porter, Jonathan Sensenbaugh, Justin Tackett
What is the history of meter in English poetry? Computational models of metrical scansion allow us to explore this question with greater nuance and at a larger scale than was previously possible. Building on linguistic models of metrical discourse drawn from the field of generative metrics, we develop a computational method for identifying poetic meter, enabling us to move beyond generalizations and into the vast archive of poetic history. By running our metrical parser on a sample of hand-scanned poems drawn from a large corpus of poetry, we were able to refine our algorithm to the point that we felt confident in its accuracy and predictive capability. We then ran our parser on a much larger collection of poems, upon which the interpretive results of our study are based.
“Strange and Surprizing Adventures”: A Digital Disarticulation of Eighteenth-Century Fictional Genres
Collaborators: Mark Algee-Hewitt, Laura Eidem, Anita Law, Tanya Llewellyn
This project investigates the relationship between title and text in eighteenth-century fiction. Is it merely a convention of the literary marketplace that certain books are labeled as “novel,” “romance,” or “tale” – or do these terms point to formal and thematic features of the texts themselves? And how do these self-applied eighteenth-century genre labels relate to the categories of contemporary criticism? Our use of Eighteenth Century Collections Online has given us access to a much wider range of texts than was previously available, allowing us to trace the emergence of fictional genres from the milieu of eighteenth-century writing. Our study of the shifts in meaning and function of the individual labels seeks both to question the “rise of the novel” narratives still current in literary history and to problematize the idea of eighteenth-century “genres” of writing.
Modeling Dramatic Networks
Collaborators: Mark Algee-Hewitt, Zephyr Frank, Franco Moretti
This project arises from a systematic comparison of hundreds of dramatic networks from a dozen different literatures and historical periods. We develop generative models, built on four parameters regulating stagecraft, to produce thousands of virtual dramatic networks. These models allow us to identify fundamental properties of dramatic networks – with particular attention to the correspondence between genres and patterns of growth – and reflect on the relationship between modeling parameters and aesthetico-critical categories.
The Emotions of London
Collaborators: Mark Algee-Hewitt, Annalise Lockhart, Franco Moretti, Erik Steiner, Van Tran
Eighteenth- and nineteenth-century novels are notoriously brimming with emotions of all kinds. But where, exactly, do their characters feel anger, sadness, fear, surprise, and so on? Combining the resources of literary geography with the potential of digital crowdsourcing, “The Emotions of London” has created an emotional map of the English metropolis, charting the affective significance of the thousands of place-names mentioned in eighteenth- and nineteenth-century novels. The project was published as “Mapping the Emotions of London in Fiction: A Crowdsourcing Experiment” in Literary Mapping in the Digital Age (2016). It is also forthcoming as Literary Lab Pamphlet 13.
Canon/Archive: Large-scale Dynamics in the Literary Field
Collaborators: Mark Algee-Hewitt, Sarah Allison, Marissa Gemma, Franco Moretti, Hannah Walser
Published as Literary Lab Pamphlet 11: “Of the novelties introduced by digitization in the study of literature, the size of the archive is probably the most dramatic: we used to work on a couple of hundred nineteenth-century novels, and now we can analyze thousands of them, tens of thousands, tomorrow hundreds of thousands. It’s a moment of euphoria, for quantitative literary history: like having a telescope that makes you see entirely new galaxies. And it’s a moment of truth: so, have the digital skies revealed anything that changes our knowledge of literature?”
On Paragraphs: Scale, Themes, and Narrative Form
Collaborators: Mark Algee-Hewitt, Franco Moretti
Published as Literary Lab Pamphlet 10: “Criticism has traditionally worked with the middle of the scale: a text, a scene, a stanza, an episode, an excerpt… An anthropocentric scale, where readers are truly ‘the measure of things.’ But the digital humanities, Alan Liu has written, have changed the coordinates of our work, by ‘focusing on microlevel linguistic features […] that map directly over macrolevel phenomena.’ Exactly. And how does one study literature, in this new situation?”
Style at the Scale of the Sentence
Collaborators: Sarah Allison, Marissa Gemma, Franco Moretti, Amir Tevel, Irena Yamboliev
Published as Literary Lab Pamphlet 5: “… But could the different frequencies of ‘she’ and ‘you’ and ‘the’ really be called ‘style’? On this, we disagreed. Some of us claimed that, though all styles do indeed entail linguistic choices, not all linguistic choices are however enough to speak of a style; others countered this argument by stating that, once an author or a genre opts for a certain linguistic choice, this is really all we need for our analysis, as a style follows necessarily from this fundamental level. This was the genuinely reductionist position – style as nothing but its components – and the more logically consistent one; the other position admitted that it couldn’t specify the exact difference, or the precise moment when a ‘linguistic choice’ turned into a ‘style,’ but it insisted nonetheless that reducing style to a strictly functional dimension missed the very point of the concept, which lay in its capacity to hint, however hazily, at something that went beyond functionality. Our job should consist in removing the haze, not in disregarding the hint.”
A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method
Collaborator: Long Le-Khac
Published as Literary Lab Pamphlet 4: “The nineteenth century in Britain saw tumultuous changes that reshaped the fabric of society and altered the course of modernization. It also saw the rise of the novel to the height of its cultural power as the most important literary form of the period. This paper reports on a long-term experiment in tracing such macroscopic changes in the novel during this crucial period. Specifically, we present findings on two interrelated transformations in novelistic language that reveal a systemic concretization in language and fundamental change in the social spaces of the novel. We show how these shifts have consequences for setting, characterization, and narration as well as implications for the responsiveness of the novel to the dramatic changes in British society. This paper has a second strand as well. This project was simultaneously an experiment in developing quantitative and computational methods for tracing changes in literary language. We wanted to see how far quantifiable features such as word usage could be pushed toward the investigation of literary history. Could we leverage quantitative methods in ways that respect the nuance and complexity we value in the humanities? To this end, we present a second set of results, the techniques and methodological lessons gained in the course of designing and running this project.”