Word Vectors in the Eighteenth Century, Episode 3: From Fields to Vectors

0. Preface

Unlike the previous two posts in the series, this post is presented in slideshow form. It’s an attempt to rein in my hobby-horse of verbosity; to accommodate our shrinking attention spans; and (more somberly) to experiment with DH-inspired forms of visual rhetoric. To advance, push the right arrow key, or click the right arrow that appears when you hover over this text. To expand an image, click the title of the slide, which should appear in blue.

In any case: ahoy! …

1. Introduction

Last time on Word Vectors in the Eighteenth Century, we looked at how word vectors work under the hood. And before that, we close-read how word vectors might model Edward Young’s influential analogy that “riches are to virtue as learning is to genius.”

But how can we distant-read word vectors? Surprisingly, this is not an easy question. Unlike topic modeling and other unsupervised methods, it’s not immediately clear how to use word vectors for large-scale text analysis. All word vectors give us is a multidimensional semantic space, into which there is no particular or privileged point of entry. The matrix, as it were, won’t tell us what questions to ask it.

2A. Back to semantic fields

Anyway, I didn’t know where to begin. So I thought I’d piggy-back on work that Long Le-Khac and I have done in the past on semantic fields, or “cohorts.”

For us, a semantic cohort is a group of words that are both semantically similar (i.e. a semantic field); and historically similar, in that they rise or fall together across time (i.e. an historical cohort). I won’t go here into how we made these semantic fields/cohorts, but if you’re interested, check out the pamphlet.

Instead, this post asks the questions: How do semantic fields relate to semantic vectors? Do vector-based approaches to semantics corroborate field-based approaches? More importantly, how can vectors help us think in new ways about semantics in DH?

2B. Semantic cohort #1: “Abstract Values”

Long and I discovered what we believe are two large semantic cohorts. The first we called the “abstract values.” It contained words like the following, categorized into sub-fields:

— Moral Valuation: character, honour, conduct, respect, worthy …
— Social Restraint: gentle, pride, proud, proper, agreeable, …
— Sentiment: heart, feeling, passion, bosom, emotion, …
— Partiality: correct, prejudice, partial, disinterested, …

Note that most of these words are Latinate abstractions, and that as a cohort they fall in frequency across the nineteenth century.

Click the blue title above to see the graph more clearly.

2C. Semantic cohort #2: “Hard Seed”

Our second semantic cohort we called “hard seed,” after its seed word, “hard.” It contained words like the following, categorized into sub-fields:

— Action Verbs: see, come, go, came, look, let, looked, …
— Body Parts: eyes, hand, face, head, hands, eye, arms, …
— Physical Adjectives: round, hard, low, clear, heavy, hot, straight, …
— Colors: white, black, red, blue, green, gold, grey, …
— Locative Prepositions: out, up, over, down, away, back, through, …
— Numbers: two, three, ten, thousand, four, five, hundred, …

Note that most of these words are concrete and Anglo-Saxon, and that they rise in frequency across the nineteenth century.

3A. Semantic fields in vector space

So, what would happen if we located each semantic field’s words in the vector space? Would words from the same field appear closer to each other than to words from other fields?

3B. Semantic fields in vector space: All fields

It appears so. Words here are colored by their semantic field. If they are closer together in this image, then they are closer together in the vector space. The image is made by a t-SNE dimensionality reduction of the cosine distances between each pair of words (where the distances come from the ECCO-TCP word2vec model). In effect, t-SNE tries to flatten the multidimensional geography of the data onto two dimensions with as little information loss as possible.
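
For the curious, here is a minimal sketch of how such a plot might be produced in Python with gensim, scikit-learn, and matplotlib. The names "model" (a word2vec model trained on ECCO-TCP) and "field_of" (a word-to-field mapping) are hypothetical, and this is a sketch of the general technique rather than the code behind the actual figure.

```python
# A minimal sketch, assuming a gensim word2vec model ("model") and a dict
# mapping each word to its semantic field ("field_of"); both names are hypothetical.
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import TSNE

words = [w for w in field_of if w in model.wv]     # keep only words the model knows
vectors = np.array([model.wv[w] for w in words])

# Pairwise cosine distances between every pair of words...
distances = squareform(pdist(vectors, metric='cosine'))

# ...which t-SNE flattens onto two dimensions, working from the precomputed distances.
xy = TSNE(n_components=2, metric='precomputed', init='random').fit_transform(distances)

# Color each point by its semantic field.
fields = sorted(set(field_of.values()))
palette = {f: plt.cm.tab10(i % 10) for i, f in enumerate(fields)}
plt.figure(figsize=(10, 10))
for (x, y), w in zip(xy, words):
    plt.scatter(x, y, color=palette[field_of[w]], s=10)
    plt.annotate(w, (x, y), fontsize=6)
plt.show()
```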

3C. Semantic fields in vector space: Hard Seed

The separation between the semantic fields is even more obvious if we look only at the words in each large field separately: displayed here is only the “hard seed” field. As a whole, “hard seed” tends to occupy the southwestern quadrant of the graph, and not to occupy the northeastern. Moreover, its sub-fields occupy their own distinct regions of the vector space: Action Verbs in purple, Body Parts in brown, Colors in pink, Numbers in gray, Physical Adjectives in yellow, and Locative Prepositions in light blue.

3D. Semantic cohorts in vector space: Abstract Values

Conversely, almost all of the “abstract values” occupy the northeastern quadrant of the graph. Its sub-fields, however, are less tightly organized than in “hard seed”: Social Restraint (red) and Moral Valuation (blue) are stretched together across the northeast, intermixing also with the Sentiment field (green).

3E. Semantic cohorts in vector space: Abstract Values [Zoom]

On closer inspection, however, it’s clear that the model has reorganized the “abstract values” according to its own logic. In general, the more easterly words here are positively valued (faultless, refined, admiration) and the westerly ones negatively valued (vulgar, sinful, reckless). This reorganization is one reason the fields look mixed-up: the distinction between abstractions of social vs. moral behavior has been subordinated to the distinction between positive vs. negative abstractions.

4A. From fields…

So, semantic vectors corroborate semantic fields, while also nuancing them. Within a vector space of semantics, words from the same “semantic field”—made from a totally different and independent process—cluster together in meaningful ways.

But corroboration is a bittersweet moment in DH: impressive, empowering even, but unsatisfying, boring. How, then, can word vectors allow us to approach these semantic questions differently? How can they help us ask new questions?

Perhaps we could think less in terms of discrete semantic units, like semantic fields or cohorts…

4B. To vectors

…and instead, we could think more in terms of vectors or axes of meaning.

For example, instead of thinking of concrete and abstract words as belonging to distinct semantic fields, we could think of them as lying at the extreme ends of a semantic spectrum—a vector—that points from one to the other, or from the semantics of concreteness to the semantics of abstractness.

Vectors make it easy to define this new kind of semantic unit: the semantic vector, V(Abstract-Concrete). But how?

4C. Defining V(Abstract) and V(Concrete)

One way would be to build on the abstract and concrete semantic fields we’ve been looking at.

We could define a generalized abstract word, V(Abstract), as the centroid of the vector positions for all words in the “abstract values” field. Because what words in this field most share is their abstractness, we would expect an artificial word vector pointing there [i.e. V(Abstract)] to primarily capture the semantics of abstractness.

Likewise, we can define a generalized concrete word, V(Concrete), as the centroid of the vector positions for all words in the “hard seed” field, since what these words most share is their concreteness.
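
A short sketch of how these centroids might be computed with gensim. The names "model", "abstract_words", and "concrete_words" are hypothetical, and length-normalizing the vectors before averaging is my own assumption rather than something specified above.

```python
import numpy as np

def centroid(words, model):
    """Average the vectors of all words the model knows.
    Length-normalizing first (an assumption) keeps any one word from dominating."""
    vecs = [model.wv.get_vector(w, norm=True) for w in words if w in model.wv]
    return np.mean(vecs, axis=0)

V_abstract = centroid(abstract_words, model)   # generalized "abstract" word
V_concrete = centroid(concrete_words, model)   # generalized "concrete" word
```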

4D. Defining V(Abstract-Concrete)

Finally, we can define a semantic vector pointing from the concrete to the abstract as V(Abstract-Concrete). This is the vector subtraction of V(Concrete) from V(Abstract). By the logic of subtraction, this vector points from the semantics only concrete words have to the semantics only abstract words have.

In effect, V(Abstract-Concrete) expresses the difference between concrete and abstract words, not as two distinct semantic fields, but rather as a single semantic axis of difference: that is, as a vector.
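
Continuing the sketch above, the axis itself is simply a difference of the two centroids:

```python
# V(Abstract-Concrete): the vector subtraction of V(Concrete) from V(Abstract),
# pointing from the concrete end of the space toward the abstract end.
V_abstract_concrete = V_abstract - V_concrete
```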

5A. Measuring abstractness everywhere

Our vector of concreteness-vs.-abstractness, V(Abstract-Concrete), affords a whole range of interesting distant readings. For instance, now we can measure the relative abstractness of any word by taking the cosine similarity between its vector and V(Abstract-Concrete). If above 0, the word points toward abstractness; if below, it points toward concreteness; and if around 0, it points orthogonally, neutral with respect to the contrast. Here are the most frequent 1,000 words in the corpus by part-of-speech. That there are more abstract than concrete adjectives, and more concrete than abstract verbs, is not an artifact of our vector, but reappears in contemporary measures of linguistic abstractness.
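
Continuing the sketch, scoring any individual word is a single cosine similarity. The example words below are illustrative guesses about the sign of the score, not reported results.

```python
import numpy as np

def abstractness(word, model, axis):
    """Cosine similarity between a word's vector and V(Abstract-Concrete):
    above 0 leans abstract, below 0 leans concrete, near 0 is roughly neutral."""
    v = model.wv.get_vector(word, norm=True)
    return float(np.dot(v, axis) / (np.linalg.norm(v) * np.linalg.norm(axis)))

abstractness('virtue', model, V_abstract_concrete)   # presumably > 0 (abstract)
abstractness('hand', model, V_abstract_concrete)     # presumably < 0 (concrete)
```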

5B. Comparing to contemporary measures of abstractness

We can also compare V(Abstract-Concrete), our measure of abstractness specific to eighteenth-century semantics, with contemporary measures of abstractness. The y-axis here is V(Abstract-Concrete); along the x-axis is a contemporary measure of concreteness, drawn from a Mechanical Turk study (Brysbaert et al.). As we expect, they negatively correlate: the linear regression explains about a third of the variance (R² = 0.32). But the variations from the norm are even more interesting…
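
A sketch of how that comparison might be set up, assuming the Brysbaert et al. norms are available as a CSV with "Word" and "Conc.M" columns (an assumption about the file's layout and name), and reusing the abstractness() function from the sketch above.

```python
import pandas as pd
from scipy import stats

norms = pd.read_csv('brysbaert_concreteness.csv')            # hypothetical filename
norms = norms[norms['Word'].isin(model.wv.key_to_index)]
norms['abstractness_18c'] = norms['Word'].map(
    lambda w: abstractness(w, model, V_abstract_concrete))

# Contemporary concreteness vs. 18C abstractness: a negative slope, explaining
# roughly a third of the variance, would match the figure described above.
fit = stats.linregress(norms['Conc.M'], norms['abstractness_18c'])
print(fit.slope, fit.rvalue ** 2)
```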

5C. Concrete words can sublimate

Take the word “discovered.” The word is more abstract today, and more concrete in the eighteenth century, than we would expect from the linear model. Why? There may be a simple historical-linguistic explanation. The concrete usage of “discover” (to un-cover and make visible to the eye), now marked “rare” by the OED, was common in the 18C. As a random example, from Burney’s Evelina (1778): “Just then our attention was attracted by a pine-apple; which, suddenly opening, discovered a nest of birds, which immediately began to sing.”

5D. Abstract words can ossify

Conversely, the word “human” is highly concrete today (almost maximally), but was highly abstract in the 18C—both much more so than we would expect. Why? My guess is that today we think of a human more than the human: I’m a human, you’re a human, this human, that human: it’s human with an indefinite article, it’s plural, it’s concrete. But in the 18C, “human” operated as a sacred, top-level abstraction: as in human nature, or the contrast between the human and animal worlds, or between the human and the divine.

Is “human” an abstraction, then? In the 18C, yes; in the 21C, no. Results such as these provoke further questions. For example, is abstraction best understood along a timeline? Just as metaphors “die”, hardening into a new literal meaning, perhaps abstractions also pass away, drifting into concrete meanings.

6. Conclusion

In sum, we saw how semantic fields cluster together in a semantic vector space in interesting ways: vector-based approaches to semantics corroborate, and even nuance, field-based ones. But they also build on and reframe them: by redefining the relationship between abstract and concrete words as the semantic vector between them, we can measure the relative abstractness of any given word. With this measure, we can do any number of things. Here, we compared it with a contemporary measure of concreteness, and interpreted a couple of outliers.

Next time, on Word Vectors in the Eighteenth Century, we’ll make use of this vector-based measure of abstractness to construct “semantic transportation networks” between abstract nouns. Until then, … stay tuned!


One Comment

  1. Interesting stuff, Ryan. You might be amused at the following brief email exchange I had with the late Walter Freeman, a Berkeley neuroscientist who routinely worked with high-dimensional spaces:

    Walter,

    I’ve had another crazy idea. I’ve been thinking about (Herman) Haken’s remark that the trick to dealing with dynamical systems is to find phenomena of low dimensionality in them. What I think is that that is what poetic form does for language. The meaning of any reasonable hunk of language is a trajectory in a space of very high dimensionality. Poetic form “carves out” a few dimensions of that space and makes them “sharable” so that “I” and “Thou” can meet in aesthetic contemplation.

    So, what does this mean? One standard analytic technique is to discover binary oppositions in the text and see how they are treated. In KK Coleridge has a pile of them, human vs. natural, male vs. female, auditory vs. visual, expressive vs. volitional, etc. So, I’m thinking of making a table with one column for each line of the poem and then other columns for each of these “induced” dimensions. I then score the content of each line on each dimension, say +, – and 0. That set of scores, taken in order from first to last line, is the poem’s trajectory through a low dimensional projection or compression of the brain’s state space.

    The trick, of course, is to pull those dimensions out of the EEG data. Having a sound recording of the reading might be useful. What happens if you use the amplitude envelope of the sound recording to “filter” the EEG data?

    Later,

    Bill B

    Not crazy, Bill, but technologically challenging!
    Will keep on file and get back to you.

    Walter
