Word Vectors in the Eighteenth Century, Episode 4: Semantic Networks

0. Introduction

This is a “slideshow essay.” To advance, use the arrow keys, or click the arrows that appear when hovering over this slide.

Last time on Word Vectors in the Eighteenth Century, we saw how semantic fields (i.e., meaningful groups of words) cluster together in the semantic vector-space. But how can we turn the tables on that process, and instead find meaningful groups of words from their clustering in the vector-space? In particular, I want to ask (selfishly, in the interests of my dissertation): how can we use word vectors to find meaningful groups of abstract words?

1A. t-SNE or not t-SNE?

One way to find meaningful groups of words from their vector positions would be to visually explore them when plotted onto two dimensions. This is what the t-SNE dimensionality reduction algorithm does; its output on the most frequent 2,000 words in the corpus is displayed here. But, as we start exploring this graph of semantic distances between words, we immediately run into some problems…

1B. Vector-distance is not necessarily semantic distance

The first problem is visualized here. Words are colored by part of speech: nouns in blue, verbs in orange, adjectives in green, and adverbs in red. As you can see, the vector-distances between words are strongly influenced by part of speech. Distance here is primarily grammatical or syntactic, and only secondarily semantic.

1C. Vector-space of singular nouns

If we control for part of speech, and instead plot the vector-distances between the 2,000 most frequent singular nouns, distance becomes more semantic. Here the words are colored by the vector of abstractness we made last time, V(Abstract-Concrete), ranging from dark red (the most concrete words) to dark green (the most abstract words). We can see how this semantic distinction is in large part responsible for the distances in the figure: although t-SNE knows nothing about our vector, the 2,000 most frequent singular nouns arrange themselves along a semantic spectrum from concreteness to abstractness.

1D. Vector-space of abstract singular nouns

So, it’s definitely helpful to know that our vector of abstractness plays a big role in organizing the vector-positions of singular nouns. But, since I’m interested in finding meaningful groups of abstract words, here I’ve redrawn the t-SNE figure for the 2,000 most frequent abstract singular nouns, in order to allow their own semantic geography to emerge.¹ However, even though these distances are largely semantic and interesting, there are still problems with using these t-SNE visualizations to find groups of meaningful words…

1. A word is “abstract” if its vector points toward abstractness along V(Abstract-Concrete); that is, if its cosine similarity with V(Abstract-Concrete) is greater than zero.
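
The snippet below is a rough sketch of the footnote's test. It assumes the word vectors sit in a plain Python dict and that V(Abstract-Concrete) has already been computed; how that vector was built in Episode 3 is not reproduced here. The helper names `cosine` and `abstract_words` are hypothetical, as are the toy three-dimensional vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def abstract_words(vectors, v_abstract, candidates, limit=2000):
    """Keep candidates whose vectors point toward abstractness, i.e. whose
    cosine similarity with V(Abstract-Concrete) is greater than zero.
    Assumes `candidates` is already ordered by corpus frequency, so the
    first `limit` survivors are the most frequent abstract words."""
    kept = [w for w in candidates if cosine(vectors[w], v_abstract) > 0]
    return kept[:limit]


# Toy three-dimensional vectors, made up for illustration only.
vectors = {
    "virtue": [0.9, 0.1, 0.2],
    "table": [-0.8, 0.3, 0.1],
    "passion": [0.5, 0.6, -0.1],
}
v_abstract = [1.0, 0.0, 0.0]  # hypothetical stand-in for V(Abstract-Concrete)
print(abstract_words(vectors, v_abstract, ["virtue", "table", "passion"]))
# ['virtue', 'passion']
```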

1E. Distance from “sensibility” in t-SNE

Take, for example, the word “sensibility.” In this figure, words are colored by their distance from “sensibility” in this graph—closest to farthest, red to blue. Labeled are some of the words surrounding “sensibility.” This all looks fine, except…

1F. Distance from “sensibility” in the vector-space

…if we recolor the words by their distance from “sensibility” in the vector-space, a different set of related words emerges. The words “anguish,” “weakness,” “feeling,” and “sentiment” are actually quite close to “sensibility” in the vector-space, but quite far from it in this graph. Why? Because t-SNE has an impossible task: to compress the high dimensionality of the vector-space onto two dimensions without distorting the distances between words.
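
To make the contrast concrete, here is a minimal sketch of ranking words by their cosine distance from “sensibility” in the full vector-space, with no two-dimensional projection involved. The helper names are hypothetical and the three-dimensional vectors are made up.

```python
import math


def cosine_distance(u, v):
    """1 minus cosine similarity: 0 for vectors pointing the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def rank_by_distance(vectors, target):
    """Rank every other word by its cosine distance from `target`,
    measured in the full vector-space rather than in a 2-D projection."""
    t = vectors[target]
    others = [w for w in vectors if w != target]
    return sorted(others, key=lambda w: cosine_distance(vectors[w], t))


# Made-up vectors, for illustration only.
vectors = {
    "sensibility": [0.9, 0.4, 0.1],
    "feeling": [0.8, 0.5, 0.0],
    "anguish": [0.7, 0.3, 0.4],
    "government": [0.1, -0.2, 0.9],
}
print(rank_by_distance(vectors, "sensibility"))
# ['feeling', 'anguish', 'government']
```

With a trained gensim model, `KeyedVectors.most_similar` performs this kind of ranking directly.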

2A. Why semantic networks?

This is one reason that semantic networks become interesting. Here is “sensibility” in a network, connected to the words close to it in the vector-space; every other word in the network is also connected to its own vector-neighbors. Networks usefully approach distance from a “monadic” perspective, measuring distance relative to each node or word. This liberates them from t-SNE’s impossible task of projecting all the distances between the words onto a single two-dimensional plane. For example, unlike before, the words “vivacity” and “anguish” are now connected to “sensibility.” At the same time, their position in different regions of the network encodes the fact that these words are more typically connected to words in other regions of the semantic space.

2B. Semantic network of abstract singular nouns

For the remainder of this post, I’ll be talking about a single network. Its nodes are the most frequent 2,000 abstract singular nouns in the corpus.¹ The edges are the strongest 4,000 associations between those 2,000 words—that is, the shortest 4,000 word-to-word distances in the vector-space.² Finally, displayed here is only the “giant component” of the network: effectively, the largest island of connections in the network. We’ll focus on this big island, which has 1,432 nodes and 3,818 edges.

1. Eighteenth-Century Collections Online, “Literature and Language,” 1700-1799 (1.9 billion words).
2. After playing with many ways to make these networks (a particular cosine similarity cut-off, etc.), I think defining a semantic network’s “size” (# of edges) as a fixed multiple of its “order” (# of nodes), here twice as many edges as nodes, is an elegant way to make these networks comparable with one another. Although 4,000 connections may seem like a lot, they are only 0.2% of the roughly 2 million connections that are possible between the 2,000 words, and the cosine similarities are never less than 0.5.
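
The construction described above can be sketched in plain Python, assuming the vectors are stored in a dict. The sketch scores every pair by cosine similarity, keeps the strongest pairs, and then extracts the giant component. All names are hypothetical. For 2,000 words this loops over about two million pairs, so it is slow but transparent.

```python
import heapq
import math
from collections import defaultdict, deque
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def strongest_edges(vectors, words, n_edges):
    """The n_edges pairs with the highest cosine similarity (i.e. the shortest
    cosine distances), returned as (word_a, word_b, similarity) triples."""
    scored = ((cosine(vectors[a], vectors[b]), a, b) for a, b in combinations(words, 2))
    return [(a, b, sim) for sim, a, b in heapq.nlargest(n_edges, scored)]


def giant_component(edges):
    """Node set of the largest connected component of an edge list."""
    adj = defaultdict(set)
    for a, b, _ in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    queue.append(nb)
        if len(component) > len(best):
            best = component
    return best


# Toy vectors (made up). With the post's settings the call would be
# strongest_edges(vectors, words, 2 * len(words)); here we keep just 2 edges.
vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
print(sorted((a, b) for a, b, _ in strongest_edges(vectors, list(vectors), 2)))
# [('a', 'b'), ('c', 'd')]
print(sorted(giant_component([("a", "b", 1.0), ("b", "c", 1.0), ("d", "e", 1.0)])))
# ['a', 'b', 'c']
```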

2C. Network communities

Let’s return to our original question: how can we find meaningful groups of abstract words? Networks provide a solution: community detection. This is the network colored by community, where communities were demarcated by a “modularity” network algorithm. The algorithm is similar to k-means clustering, but instead of being told the number of clusters in advance, it determines that number itself. It does so by searching for the partition that maximizes modularity: roughly, the share of edges falling within clusters, compared with the share we would expect in a random network whose nodes have the same degrees. We’ll start looking at and interpreting these communities in a moment.
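
The post does not specify which modularity algorithm produced the communities. The sketch below therefore only shows what such algorithms optimize: it scores a given partition with Newman's modularity Q, using a hypothetical toy graph of two triangles joined by one edge. Searching over partitions for the highest Q (for example with the Louvain method) is left to a library.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman's modularity Q for a partition of an unweighted, undirected graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2,
    where m is the number of edges, L_c the number of edges with both ends
    in c, and d_c the total degree of c's nodes. Higher Q means more edges
    fall inside communities than a degree-preserving random graph would give.
    """
    m = len(edges)
    inside = defaultdict(int)
    degree = defaultdict(int)
    for a, b in edges:
        degree[community[a]] += 1
        degree[community[b]] += 1
        if community[a] == community[b]:
            inside[community[a]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph: two triangles joined by the single edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
lumped = dict.fromkeys(split, "A")
print(round(modularity(edges, split), 3))   # 0.357
print(round(modularity(edges, lumped), 3))  # 0.0
```

The natural two-community split scores higher than lumping everything together. In networkx, the `networkx.algorithms.community` module provides `greedy_modularity_communities` and, in recent versions, `louvain_communities` to perform the search.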

2D. Zooming in

But first, let’s look at how semantic networks uniquely afford two ways of describing a word’s relationship to its community—which, for me, make networks more interesting than clustering algorithms as a framework for studying semantic groupings. For example, let’s zoom into this region of the network for a second, and focus on three network communities: in Miami blue-green, in maroon, and in light green.

2E. “Hub” abstractions

Let’s look at the word “hatred.” In the network, it’s primarily connected to words within its network community—that is, within its blue-green cluster of other negative-aggressive affects, like “indignation,” “disgust,” and “contempt.” In a sense, then, “hatred” acts as a “hub” with respect to its community. As a hub, “hatred” stitches its own community together more than it connects that community to others. We can operationalize a word’s hub-ness as the number of intra-cluster edges it has, minus the number of its inter-cluster edges.
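
The hub-ness measure just described is a simple count. In the sketch below, the graph is assumed to be a dict mapping each word to the set of its neighbors, and all names are hypothetical. The toy example rebuilds the counts reported later for “folly” (30 neighbors inside its community, 2 outside) using placeholder neighbor names.

```python
def hub_score(node, adj, community):
    """Intra-community edges minus inter-community edges for one node.
    `adj` maps each node to the set of its neighbors; `community` maps
    each node to its community label."""
    intra = sum(1 for nb in adj[node] if community[nb] == community[node])
    inter = len(adj[node]) - intra
    return intra - inter


def top_hubs(adj, community, label, k=5):
    """The k members of community `label` with the highest hub scores."""
    members = [n for n in adj if community[n] == label]
    scored = [(n, hub_score(n, adj, community)) for n in members]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical placeholder neighbors: 30 inside the community, 2 outside.
inside = {f"vice_{i}" for i in range(30)}
outside = {"other_0", "other_1"}
adj = {"folly": inside | outside}
community = {"folly": "Vice"}
community.update({w: "Vice" for w in inside})
community.update({w: "Other" for w in outside})
print(hub_score("folly", adj, community))  # 28
print(top_hubs(adj, community, "Vice"))    # [('folly', 28)]
```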

2F. “Bridge” abstractions

By contrast, “passion” acts as a “bridge” with respect to its community. “Passion” bridges its community of negative affects (e.g. “resentment”) to a community of positive affects (e.g. “tenderness”). One can have “tenderness and passion”—or be “prone to anger, passion, and resentment.” The polysemy of “passion” pivots between, and bridges, these semantic contexts.

2G. Operationalizing bridge-ness as betweenness centrality

To measure the bridge-ness of a node, we can measure its betweenness centrality. The betweenness centrality (BC) of a node is the number of shortest paths between pairs of other nodes that pass through it. When a pair is joined by several equally short paths, each counts fractionally. In this figure, node #4 has the highest BC: 15 shortest-paths between nodes pass through it. Why? Because it acts as a bridge between the community on the left of the graph, and the community on the right.
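
Betweenness centrality is usually computed with Brandes' algorithm. Below is a plain-Python sketch for an unweighted, undirected graph stored as a dict of neighbor sets; the helper names are hypothetical.

The `percent_of_pairs` helper is an assumption about how percentages like those reported below could be derived. It divides each count by the number of pairs of other nodes, (n-1)(n-2)/2. This should match networkx's `betweenness_centrality` with `normalized=True` for undirected graphs.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph given as
    {node: set_of_neighbors}. For each node, returns the number of shortest
    paths between other pairs that pass through it (split fractionally
    when a pair has several equally short paths)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


def percent_of_pairs(bc, n):
    """Express betweenness as a percentage of the (n-1)(n-2)/2 pairs of other nodes."""
    pairs = (n - 1) * (n - 2) / 2
    return {v: 100 * c / pairs for v, c in bc.items()}


# Toy path graph 1-2-3-4-5: the middle node lies between the most pairs.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
bc = betweenness(path)
print(bc)  # {1: 0.0, 2: 3.0, 3: 4.0, 4: 3.0, 5: 0.0}
print(round(percent_of_pairs(bc, len(path))[3], 1))  # 66.7
```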

3A. Interpreting communities of abstract words

So, we’ve seen how networks allow us to represent vector-distances relative to each word, find communities of words, and locate words that act as hubs and bridges with respect to their community. With that information, let’s now look at the four most centrally located communities of abstract words in the network. In this and the following graphs, nodes are sized by their betweenness centrality and colored by their network community.

3B. Community #1: “Vice and Injustice”

With 150 words, the community I’ve labeled “Vice and Injustice” is the largest in the network. Its hubs (top five words by number of intra-community minus inter-community edges) and bridges (top five words by betweenness centrality) are below.

Hubs: folly (28), impiety (25), perfidy (19), wickedness (17), debauchery (17)
Bridges (share of node-to-node shortest paths passing through each word): despotism (4.0%), injustice (3.8%), obstinacy (3.6%), folly (2.6%), weakness (2.3%)

As a hub, “folly” connects to 2 words outside its community, but 30 within it, including words from “cowardice” to “temerity,” “dullness” to “levity.” “Folly” and the other hubs reveal how this community is internally constituted as a general cluster of negatively-valued behaviors—or, effectively, “vices.” But, as bridges, “injustice,” “tyranny,” and “despotism” bring this cluster of vices into contact with a community I’ve labeled “Political Systems,” with words like “government,” “establishment,” and “power.”

3C. “Vice and Injustice” and John Locke

Interestingly, articulating both moral-personal and social-political vices is exactly the work Locke thought abstract words did best. For Locke, human actions can be articulated only by abstract words: crucially, words “without which, Laws could be but ill made, or Vice and Disorder repressed.” What does it mean, then, that the largest community of abstract nouns is a collection of behaviors that ought to be repressed? Is this a neo-empiricist demonstration of Locke’s empiricist theory of abstraction? Locke thought that, because abstract words are made arbitrarily by the mind, the fact that they exist to describe certain behaviors and not others reflects the degree to which a culture is invested in those behaviors, and not others. To rephrase the question: what does it mean that the most invested-in and specified semantic community in this network articulates, even selects, behaviors to socially repress?

3D. Community #2: “Ugly Feelings toward Other”

Hubs: hatred (20), resentment (17), reproach (14), aversion (12), rancour (10)
Bridges: partiality (4.6%), resentment (2.2%), passion (2.1%), censure (2.0%), prejudice (1.7%)

Moving slightly to the southeast, we stay within the semantics of negative value, but move from behavior to affect. I’ve labeled this community of 79 words “Ugly Feelings” after Sianne Ngai’s classic study in affect theory, Ugly Feelings (2005). But these words are not just ugly feelings: many of them direct that negative affect toward another person. “Resentment,” “hatred,” “reproach,” and “prejudice” are all quasi-affective states, but they are also affective reactions to others’ behavior. “Partiality” and “passion” bridge this cluster to positive affects, connecting to words like “fondness” and “kindness.”

3E. Community #3: “Ugly Feelings toward Self”

Hubs: anxiety (14), confusion (13), anguish (13), consternation (12), disturbance (11)
Bridges: anxiety (3.7%), tumult (2.1%), anguish (2.1%), disappointment (2.1%), rapture (2.1%)

Moving slightly east, we find another community of negative affects. How do these words form a distinct community? My guess is that these are self-directed negative affects. For Freud, anxiety is fear without an object. Unlike “hatred” or “resentment,” which as states of anger often have a person for their object, affects like “anxiety” and “anguish” are less specified in their direction. However, this non-specificity allows them to connect to other affective communities: as we’ll see later on, the bridge between “anxiety” and “tenderness” is structurally important to the overall network.

3F. Why two communities of “Ugly Feelings”?

The cleaving of negative emotions into two separate communities raises an interesting question. Does this bifurcation arise from anything more than an inherent semantic contrast? For instance, might it also arise from contrasting gendered associations? As personified by Lady Louisa in Burney’s Evelina (1778), women were thought to have “weak nerves” in the period, experiencing seemingly unaccountable anxiety; Wollstonecraft critiqued women who “feign a sickly delicacy” in order to “gain the affections of a virtuous man.” We’ll return to this question later on.

3G. Community #4: “Virtue and Sensibility”

Hubs: generosity (34), benevolence (31), probity (29), sincerity (27), humanity (27)
Bridges: sensibility (5.1%), tenderness (4.8%), ingenuity (4.0%), understanding (3.5%), wisdom (2.6%)

Moving to the northeast, we reach a community of 124 words I’ve labeled “Virtue and Sensibility.” It is, effectively, a huge cluster of virtues: “generosity,” “benevolence,” “civility,” “kindness,” “humanity,” etc. However, another community of “virtues” is just to the north of this one, with words like “propriety,” “regularity,” and “correctness.” The difference? The key word for this community here is “sensibility.” The buzz-word of the mid-to-late 18C, “sensibility” framed morality in terms of an affective sensitivity, a “moral sentiment,” as Adam Smith would call it. So too do many of the words in this community. Tenderness, kindness, sincerity, and goodness are not just moral behaviors, but behaviors with a shared affective overtone: what we might call the affective aura of sensibility.

4A. The centrality of “sensibility” to the semantic network

In the overall semantic network, the word with the highest betweenness centrality (the word most necessary to pass through when traversing the entire network) is “sensibility.” This result is remarkable because it is immediately legible. In addition to conflating morality and sentiment, “sensibility” was polysemous in the period in other ways as well, meaning “power of sensation or perception”; “mental perception”; “emotional consciousness”; “quickness and acuteness of apprehension or feeling”; and (in the 18C and 19C) “capacity for refined emotion” (OED). It was also routinely parodied, critiqued, and associated with women and femininity.

4B. Bridges from “sensibility”

In this and the subsequent networks, the edges are sized by the number of shortest-paths passing along them.

The fact that “sensibility” acts as the single most structurally important bridge for traversing the network is an index of its semantic and cultural polysemy. The association between “sensibility” and “understanding,” for instance, is a bridge often traveled when traversing the network, as is the association between “sensibility” and “weakness.” But why?
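
Edge sizes of this kind correspond to edge betweenness: for each edge, the number of node pairs whose shortest path runs along it. Below is a plain-Python sketch using a Brandes-style accumulation. The names are hypothetical, and the toy graph is two triangles joined through a single bridge node.

```python
from collections import deque


def edge_betweenness(adj):
    """For an unweighted, undirected graph {node: set_of_neighbors}, count,
    for each edge, the node pairs whose shortest paths run along it
    (split fractionally over ties). Keys are frozensets {u, v}."""
    eb = {}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate each edge's share, from the farthest nodes back to s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                edge = frozenset((v, w))
                eb[edge] = eb.get(edge, 0.0) + share
                delta[v] += share
    # Each unordered pair was counted once from each endpoint.
    return {e: c / 2 for e, c in eb.items()}


# Two triangles {0,1,2} and {4,5,6} joined through node 3.
barbell = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4},
           4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
eb = edge_betweenness(barbell)
print(eb[frozenset({2, 3})], eb[frozenset({0, 1})])  # 12.0 1.0
```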

4C. Paths through “sensibility”

This network was created from all the shortest-paths that pass through “sensibility.” The edges are, again, sized by the number of shortest-paths traversing them. From any given node in this graph, its path back up to “sensibility” at the center is its shortest path to “sensibility.”

Although this network was made from the shortest-paths passing through “sensibility,” not ending at it, we can’t tell from this figure where any given path continues after it passes through “sensibility.” Instead, what we can see are the ways in which whole communities of abstractions route themselves through “sensibility” toward some destination.
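
One way to build such a subnetwork is sketched below; it is not necessarily how the post's figure was made. A pair of words (s, t) counts when the node x lies on a shortest path between them, i.e. when d(s, x) + d(x, t) = d(s, t). For each such pair, a single shortest route through x, chosen via breadth-first-search parents, adds 1 to every edge it uses. Names and the toy graph are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs(adj, source):
    """Distances and one BFS parent per reachable node, from `source`."""
    dist, parent = {source: 0}, {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in sorted(adj[v]):  # sorted for reproducible tie-breaking
            if w not in dist:
                dist[w] = dist[v] + 1
                parent[w] = v
                queue.append(w)
    return dist, parent


def paths_through(adj, x):
    """Edge weights of the subnetwork made of shortest paths through x.
    Assumption: one route per qualifying pair, not all tied routes."""
    dist = {v: bfs(adj, v)[0] for v in adj}  # all-pairs distances
    _, parent = bfs(adj, x)  # BFS tree rooted at x
    weights = {}
    for s, t in combinations([v for v in adj if v != x], 2):
        if t not in dist[s] or s not in dist[x] or t not in dist[x]:
            continue  # disconnected pair
        if dist[s][x] + dist[x][t] != dist[s][t]:
            continue  # x is not on a shortest s-t path
        for end in (s, t):  # climb from each endpoint back up to x
            node = end
            while parent[node] is not None:
                edge = frozenset((node, parent[node]))
                weights[edge] = weights.get(edge, 0) + 1
                node = parent[node]
    return weights


barbell = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4},
           4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
weights = paths_through(barbell, 3)
print(weights[frozenset({2, 3})], weights[frozenset({0, 2})])  # 9 3
```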

4D. Paths from “Morality and System”

If we highlight all the words from a community I’ve called “Morality and System,” we can see how a large cluster of them route themselves through “sensibility” by way of its association with “understanding.” These tools of Enlightenment thought (“reasoning,” “hypothesis,” “principle,” etc.) are connected, at a distance, to the discourse of sensibility through the bridge provided by one of sensibility’s many senses, related to understanding: “acuteness of apprehension” (OED). In one passage from the corpus, a character is “certain that quick sensibility is inseparable from a ready understanding.” Inseparable, indeed; but they are not only inseparable: their association is instrumental to the semantic network of eighteenth-century abstractions.

4E. Paths from “Vice and Injustice”

Meanwhile, on the other side of the moral spectrum, we can see how a large cluster of words from the “Vice and Injustice” community route themselves through “sensibility” by way of its association with “weakness.” To me this suggests that the (betweenness) centrality of “sensibility” is not simply owing to its semantic polysemy. “Weakness,” after all, is not one of the meanings of “sensibility.” What we see instead is sensibility’s cultural polysemy, the way in which its moral valence toggled in the period. Through its association with “weakness”—particularly with the “weaker” sex—sensibility was as morally suspect as it was associated with “humanity” and “morality.” As Wollstonecraft writes, exposing that association: “despising that weak elegancy of mind, exquisite sensibility, and sweet docility of manners, supposed to be the sexual characteristics of the weaker vessel, I wish to show that elegance is inferior to virtue” (emphasis mine).

4F. Bridges from “anxiety”

Arguably, however, we already knew that the discourse of sensibility was a kind of meeting-point for a range of other cultural/semantic domains (emotions, moral behaviors, mental capacities, even Smithian economic models, etc.). But this network’s operationalization of the concept of the “meeting-point” reveals that such a phenomenon was not at all unique to the discourse of sensibility. The word “anxiety,” for instance, is the ninth most central word in the network. Which paths through the network does it make possible?

4G. Paths through “anxiety”

Looking at this network made up of the shortest-paths passing through “anxiety,” we can see how central the bridge between “anxiety” and “tenderness” is to the network. Through “tenderness,” a whole range of abstractions from the “Virtue and Sensibility” community reach “anxiety.” Likewise, a whole range of “Ugly Feelings toward Self” are routed to each other, and to “tenderness,” through “anxiety.” What are the literary manifestations of this bridge?

4H. “Tenderness” and “anxiety” as literary bridge

If we look at passages where tenderness and anxiety appear together, we tend to find passages like: “Emily gazed long on the plane-tree, and … at the remembrance of [Valancourt] … a mingled sensation of esteem, tenderness and anxiety rose in her breast” (Radcliffe, Mysteries of Udolpho, 1794). “Mingled” sensations, in the late-century Gothic romance: paradoxical mixtures of tenderness (a sympathetic reaching-toward) and anxiety (a fearful reaching-back). Perhaps, then, the bridge between tenderness and anxiety that is so instrumental to the network might also prove literarily instrumental to the means by which the Gothic heroine comes to represent the affective paradoxes of late-century female subjectivity.

5A. Review

To review: we saw how semantic networks allow us to represent and explore the vector-distances between individual words. We used the vector between concreteness and abstractness developed in Episode 3 to find the 2,000 most frequent abstract singular nouns. We chose singular nouns because, when part of speech is held constant, vector-distance is more likely to be semantic than syntactic. With these 2,000 words as nodes, we drew edges for the 4,000 strongest associations between them, and then used a community detection algorithm to find communities of abstract nouns in the network. Exploring these clusters as a network allowed us to see how some words, like “hatred,” act as hubs with respect to their community; others, like “passion,” as bridges to other communities. We interpreted some of these communities and the bridges between them.

5B. Discussion

What are semantic networks? Are they more than a convenient way of exploring semantic relationships in the vector-space? I think so. Networks capture something intuitive about the associative way in which we relate ideas: an associationism that, incidentally, derives historically from the empiricism of Berkeley and Hume. Two words, even if not quite synonyms of each other, need only be associated for them to be “stored” in a similar space in the brain, whether human or artificial. These networks, built from such associations, might be thought of as networks of “slant synonymy.” The metaphor is apt, given that words are linked only if the angle between their semantic vectors is small: in this network, never more than 60 degrees, since no linked pair has a cosine similarity below 0.5. Our experiments here, with hubs and bridges and betweenness centrality, also bring out the way in which certain words might be central or crucial to that process of association, especially within specific syntactic-semantic domains like abstract singular nouns. Like word vectors themselves, the methodological framework of semantic networks makes visible historically-situated structures and configurations of words and meanings.

5C. Next time

Next time, on Word Vectors in the Eighteenth Century, we’ll move away from semantic networks to a different method of exploring word vectors, one inspired by David Hume’s style of argumentation through analogy. It will focus on correlating different vector-contrasts. In effect, this is a method for distant-reading, and discovering, the analogies implicit within eighteenth-century literature. Until then… stay tuned!

5 Comments

  1. Thank you for such detailed posts; I have found them really interesting. I have to say I really like the slideshow essay format.

    I’d be really grateful if you could explain how you calculate the shortest word-to-word distances in the vector space? I can’t get my head around how to go from a VSM, for example 2000 words each with 300 dimensions, to something that can be used as an edge list.
    Thank you

    • Ryan Heuser

      Hi Sara! Thanks for the kind words. What I did was to find the cosine distance between each pair of words in the set of 2,000 words. Then, I took the 4,000 shortest cosine distances. So, in gensim, you can do this (code as a github gist). Anyway, hope this helps! Let me know if this makes sense. And thanks again for your comments!

  2. I’m curious, Ryan. How did you come across the notion of semantic nets and what did it mean in that context?

    Out of curiosity I did an Ngram query on “cognitive network,semantic network”. “Cognitive network” gets minor action. But “semantic network” started on the rise at about 1970, peaked in the late 1980s (when WordNet came out), and then dropped back to late 70s level by about 2000, where it is now.

    I encountered the idea back in the mid-1970s when it meant two things: 1) a model of the mind with roots in associationist psychology going back to Locke, and 2) some flavor of a model implemented in code. Many interested primarily in the first used the second as a tool for investigation, but you could do the second as pure engineering, without much concern for the mind. I got my degree in English at SUNY Buffalo and was also part of David Hays’s (Linguistics) computational linguistics research group, though I didn’t do any coding (nor did Hays for that matter). It’s in that context that I immersed myself in semantic nets.

    What you’re doing here, of course, is quite different. But off there, way way out there on the horizon, I wonder? It’s long been clear to me that the next BIG STEP (in whatever you want to call it) will be to hook up current machine learning/statistical techniques with the old hand-coded models. Don’t know how that would be done or how far it’ll get us. As you know, there are people who dream of a full-on AGI (artificial general intelligence); I’m skeptical for various reasons. More immediately though, there’s the business of analyzing your empirically derived nets and making sense of them. Fascinating stuff.
