Ongoing Projects

Word Vectors in the Eighteenth Century

Word embedding models, which represent semantic relationships between words as "vectors," have recently exploded in popularity in the Machine Learning and Natural Language Processing communities, in part because these models can represent and predict semantic relationships as complex as analogy. "Man is to woman as king is to queen" is represented as a mathematical formula obtaining between word vectors: V(Queen) ≈ V(King) - V(Man) + V(Woman). Or, "riches are to virtue as learning is to genius" (as Edward Young argued in his 1759 treatise against classical imitation) is represented as V(Genius) ≈ V(Learning) - V(Riches) + V(Virtue). This project explores the possibilities that such vector-based representations of semantic relationships offer as analytical tools for a conceptually oriented distant reading of eighteenth-century literature.
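As a rough illustration of the analogy formula, here is a minimal sketch in Python. The three-dimensional vectors are invented toy values, not output from any trained model, and the helper names `cosine` and `analogy` are hypothetical. The sketch only shows the arithmetic: compute V(King) - V(Man) + V(Woman), then pick the remaining word whose vector lies closest to the result by cosine similarity.

```python
import numpy as np

# Hypothetical 3-dimensional toy vectors, invented for illustration only.
# Real embeddings are learned from a corpus and have hundreds of dimensions.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.1]),
    "man":   np.array([0.1, 0.9, 0.2]),
    "woman": np.array([0.1, 0.1, 0.2]),
    "apple": np.array([0.0, 0.5, 0.9]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, vectors):
    """Solve 'a is to b as c is to ?' via the nearest word to V(c) - V(a) + V(b)."""
    target = vectors[c] - vectors[a] + vectors[b]
    # Exclude the three query words so the answer is a new word.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("man", "woman", "king", vectors))  # prints "queen"
```

With vectors trained on an eighteenth-century corpus, the same function could in principle be called as `analogy("riches", "virtue", "learning", vectors)` to test whether Young's analogy holds in the model. Whether it returns "genius" depends entirely on the corpus and the training settings.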

Virtual Poetics

This project is an attempt to argue that virtual literature (texts made by computers) can productively reframe our understanding of actual literature. It began tongue-in-cheek, producing the playful exercise of my first blog post on a "found" virtual poem. But the more I thought about virtual texts, the more seriously I found myself considering what they might have to tell us. In particular, I think they raise some interesting questions for the theory and methodology of DH.

Historical Prosody Project

Collaborators: Mark Algee-Hewitt, J.D. Porter, Jonathan Sensenbaugh, Justin Tackett

What is the history of meter in English poetry? Computational models of metrical scansion allow us to explore this question with greater nuance and at a larger scale than was previously possible. Building on linguistic models of metrical discourse drawn from the field of generative metrics, we developed a computational method for identifying poetic meter, enabling us to move beyond generalizations and into the vast archive of poetic history. By running our metrical parser on a sample of hand-scanned poems drawn from a large corpus of poetry, we were able to refine our algorithm to the point that we felt confident in its accuracy and predictive capability. We then set out to run our parser on a much larger collection of poems, upon which the interpretive results of our study are based.

Completed Projects

The Emotions of London

Collaborators: Mark Algee-Hewitt, Annalise Lockhart, Franco Moretti, Erik Steiner, Van Tran

Eighteenth- and nineteenth-century novels are notoriously brimming with emotions of all kinds. But where, exactly, do their characters feel anger, sadness, fear, surprise, and so on? Combining the resources of literary geography and the potentialities of digital crowdsourcing, "The Emotions of London" has created an emotional map of the English metropolis, charting the affective significance of the thousands of place-names mentioned in eighteenth- and nineteenth-century novels. The project was published as "Mapping the Emotions of London in Fiction: A Crowdsourcing Experiment" in Literary Mapping in the Digital Age (2016). It is also forthcoming as Literary Lab Pamphlet 13.

Canon/Archive: Large-scale Dynamics in the Literary Field

Collaborators: Mark Algee-Hewitt, Sarah Allison, Marissa Gemma, Franco Moretti, Hannah Walser

Published as Literary Lab Pamphlet 11: "Of the novelties introduced by digitization in the study of literature, the size of the archive is probably the most dramatic: we used to work on a couple of hundred nineteenth-century novels, and now we can analyze thousands of them, tens of thousands, tomorrow hundreds of thousands. It's a moment of euphoria, for quantitative literary history: like having a telescope that makes you see entirely new galaxies. And it's a moment of truth: so, have the digital skies revealed anything that changes our knowledge of literature?"

On Paragraphs: Scale, Themes, and Narrative Form

Collaborators: Mark Algee-Hewitt, Franco Moretti

Published as Literary Lab Pamphlet 10: "Criticism has traditionally worked with the middle of the scale: a text, a scene, a stanza, an episode, an excerpt... An anthropocentric scale, where readers are truly 'the measure of things.' But the digital humanities, Alan Liu has written, have changed the coordinates of our work, by 'focusing on microlevel linguistic features [...] that map directly over macrolevel phenomena.' Exactly. And how does one study literature, in this new situation?"

Style at the Scale of the Sentence

Collaborators: Sarah Allison, Marissa Gemma, Franco Moretti, Amir Tevel, Irena Yamboliev

Published as Literary Lab Pamphlet 5: "... But could the different frequencies of 'she' and 'you' and 'the' really be called 'style'? On this, we disagreed. Some of us claimed that, though all styles do indeed entail linguistic choices, not all linguistic choices are however enough to speak of a style; others countered this argument by stating that, once an author or a genre opts for a certain linguistic choice, this is really all we need for our analysis, as a style follows necessarily from this fundamental level. This was the genuinely reductionist position - style as nothing but its components - and the more logically consistent one; the other position admitted that it couldn't specify the exact difference, or the precise moment when a 'linguistic choice' turned into a 'style,' but it insisted nonetheless that reducing style to a strictly functional dimension missed the very point of the concept, which lay in its capacity to hint, however hazily, at something that went beyond functionality. Our job should consist in removing the haze, not in disregarding the hint."

A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method

Collaborators: Long Le-Khac

Published as Literary Lab Pamphlet 4: "The nineteenth century in Britain saw tumultuous changes that reshaped the fabric of society and altered the course of modernization. It also saw the rise of the novel to the height of its cultural power as the most important literary form of the period. This paper reports on a long-term experiment in tracing such macroscopic changes in the novel during this crucial period. Specifically, we present findings on two interrelated transformations in novelistic language that reveal a systemic concretization in language and fundamental change in the social spaces of the novel. We show how these shifts have consequences for setting, characterization, and narration as well as implications for the responsiveness of the novel to the dramatic changes in British society. This paper has a second strand as well. This project was simultaneously an experiment in developing quantitative and computational methods for tracing changes in literary language. We wanted to see how far quantifiable features such as word usage could be pushed toward the investigation of literary history. Could we leverage quantitative methods in ways that respect the nuance and complexity we value in the humanities? To this end, we present a second set of results, the techniques and methodological lessons gained in the course of designing and running this project."