Polysemous words “collapse” into specific meanings through different semantic contexts
Posted on February 1, 2025 by rayhu
Stanford professor Christopher Manning, a renowned researcher in natural language processing (NLP), once suggested a striking way of representing word vectors: when a word has more than one meaning, its vector is expressed as a linear superposition of multiple vectors, each representing one sense in one kind of context.
For example, “pike” can be a weapon, a fish, or a toll, so its mathematical expression, its vector, is a weighted sum of the vectors for these three senses, and this sum can be used directly in computation.
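Written out loosely, the pike vector is a weighted sum of its sense vectors, with weights α:

$$
v_{\text{pike}} \;\approx\; \alpha_1\, v_{\text{pike(weapon)}} + \alpha_2\, v_{\text{pike(fish)}} + \alpha_3\, v_{\text{pike(toll)}}
$$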
The different α coefficients depend on how frequently each sense appears in the corpus. So a word can effectively appear several times in the multi-dimensional space: each occurrence represents one sense and clusters together with the words associated with that sense.
Projected into two dimensions, the picture looks like this: the word Jaguar appears three times, once for the automobile, once for computer hardware, and once for computer software. Each occurrence co-occurs with words related to its own context, so we can determine mathematically, from the surrounding words, exactly which sense is intended.
This is not the current mainstream approach to word embeddings, which aggregates the multiple meanings of a word into a single location in the multidimensional space and disambiguates them by projecting against different combinations of dimensions. Still, Manning’s formulation has some interesting properties.
First, its formulation looks strikingly similar to that of particle states in quantum mechanics, where the position of an electron can be described as a linear superposition of different states, i.e., probability waves of the electron at different positions:
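Schematically, in standard textbook notation, the electron’s state before measurement is a weighted sum of position states, where the squared magnitudes of the weights give the probabilities of each outcome:

$$
|\psi\rangle \;=\; c_1\,|x_1\rangle + c_2\,|x_2\rangle + \cdots + c_n\,|x_n\rangle, \qquad \sum_i |c_i|^2 = 1
$$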
Accordingly, word vectors and quantum mechanics exhibit analogous Observation and Collapse phenomena:
In word vectors, when you use “pike” in context, you “extract” a specific meaning from the word vector’s superposition of states. For example (a toy sketch of this “collapse” follows the examples):
- “He caught a pike” → here “pike” may refer to a fish 🐟
- “He stepped on a pike” → here “pike” may refer to a spear 🏹
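A minimal sketch of this “collapse” step, using made-up sense and context vectors purely for illustration (a real system would use learned embeddings), might look like the following:

```python
import numpy as np

# Toy sense vectors for "pike" (made-up 4-d embeddings, for illustration only).
sense_vectors = {
    "pike(fish)":   np.array([0.9, 0.1, 0.0, 0.1]),
    "pike(weapon)": np.array([0.1, 0.9, 0.2, 0.0]),
    "pike(toll)":   np.array([0.0, 0.2, 0.9, 0.3]),
}

# Toy vectors for context words, in the same made-up space.
context_vectors = {
    "caught":  np.array([0.8, 0.0, 0.1, 0.2]),
    "river":   np.array([0.9, 0.1, 0.0, 0.0]),
    "stepped": np.array([0.1, 0.7, 0.1, 0.1]),
    "sharp":   np.array([0.2, 0.9, 0.0, 0.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def collapse(context_words):
    """Pick the sense whose vector best matches the averaged context."""
    ctx = np.mean([context_vectors[w] for w in context_words], axis=0)
    return max(sense_vectors, key=lambda s: cosine(sense_vectors[s], ctx))

print(collapse(["caught", "river"]))   # -> pike(fish)
print(collapse(["stepped", "sharp"]))  # -> pike(weapon)
```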
In quantum mechanics, the position of an electron only “collapses” to a definite position when it is measured, whereas, before the measurement, it is a superposition of several possible positions.
Second, there is a similarity between Sparse Coding and the measurement problem:
- In word vectors, the idea of Sparse Coding suggests that the different meanings of a word can be decomposed in just this way, as if a particular meaning had been singled out by a measurement (a toy sparse-coding sketch follows this comparison).
- In quantum measurement, Heisenberg’s uncertainty principle shows that a particle’s position and momentum cannot both be measured precisely at the same time, a trade-off analogous to the one between the different meanings packed into a single word vector.
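Here is a toy illustration of the sparse-coding view, assuming a made-up dictionary of sense “atoms” and scikit-learn’s SparseCoder; this is only a sketch of the decomposition idea, not the actual pipeline used in the research:

```python
import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.default_rng(0)

# A made-up "dictionary" of 6 sense atoms in a 20-d space (one atom per row).
atoms = rng.normal(size=(6, 20))
atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)

# Pretend the vector for a polysemous word mixes atoms 0 and 3,
# with alpha_1 = 0.7 and alpha_2 = 0.3.
word_vec = 0.7 * atoms[0] + 0.3 * atoms[3]

# Sparse coding: explain word_vec using as few atoms as possible.
coder = SparseCoder(dictionary=atoms,
                    transform_algorithm="lasso_lars",
                    transform_alpha=0.01)
coefficients = coder.transform(word_vec.reshape(1, -1))[0]
print(np.round(coefficients, 2))  # mostly zeros, with weight on atoms 0 and 3
```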
The difference:
Word vectors are probabilistic but observable, whereas quantum states are probabilistic but not directly observable:
- In NLP (Natural Language Processing), we can compute word vectors directly and use dimensionality reduction or clustering to find the different meanings (a short sketch follows this list).
- In quantum mechanics, a particle’s state can only be accessed through measurement, and the measurement disturbs the original state.
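As a sketch of the NLP side, one might cluster context embeddings of a polysemous word and project them to two dimensions; the data below is synthetic, but with real embeddings the clusters would correspond to senses, as in the Jaguar picture described earlier:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Synthetic 50-d context embeddings for one polysemous word:
# 30 contexts around a "car" sense centre, 30 around an "animal" sense centre.
centre_car = rng.normal(size=50)
centre_animal = rng.normal(size=50)
contexts = np.vstack([
    centre_car + 0.3 * rng.normal(size=(30, 50)),
    centre_animal + 0.3 * rng.normal(size=(30, 50)),
])

# Clustering recovers the two senses; PCA gives the two-dimensional picture.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(contexts)
points_2d = PCA(n_components=2).fit_transform(contexts)
print(labels[:5], labels[-5:])  # the two halves land in different clusters
print(points_2d.shape)          # (60, 2)
```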
There is a mathematical similarity between this word-vector superposition and quantum-state superposition, but the word-vector superposition is grounded in statistical relationships found in large-scale text data, whereas quantum superposition is grounded in the probabilistic laws of physics. Nevertheless, the analogy can help inspire linguists and computer scientists to create brand-new mathematical methods and achieve remarkable scientific results.
The choice of mathematical expression often determines whether a fundamental breakthrough can be achieved in a field. From physics to natural language processing (NLP), many significant advances have resulted from finding more precise and generalized mathematical tools to describe complex phenomena.
Richard Feynman’s Path Integral revolutionized the understanding of Quantum Electrodynamics (QED). Feynman discovered that the motion of an electron can be described as the sum of all possible paths from one point to another, each with a corresponding amplitude. This mathematical framework not only simplified calculations, but also gave quantum mechanics a more intuitive geometric interpretation, laying the groundwork for what would become the Standard Model.
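Schematically, in standard notation, the amplitude to go from point $a$ to point $b$ is an integral over all paths $x(t)$, each contributing a phase determined by its action $S[x]$:

$$
K(b, a) \;=\; \int \mathcal{D}[x(t)]\; e^{\,i S[x]/\hbar}
$$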
Similarly, in NLP, finding the right mathematical expression is crucial, as exemplified by Tomas Mikolov’s Word2Vec in 2013. Mikolov and his team at Google Brain devised a highly efficient neural network model that embeds words in a high-dimensional vector space (Word Embeddings), where semantically similar words are mapped to nearby locations. This discovery revolutionized the field of NLP, allowing computers to measure relationships between words by geometric distance rather than relying on traditional hand-crafted rules or statistical methods. What’s more, Mikolov also discovered semantic regularities expressible as vector arithmetic, for example:
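The best-known of these relations is

$$
\text{vec}(\text{king}) - \text{vec}(\text{man}) + \text{vec}(\text{woman}) \;\approx\; \text{vec}(\text{queen})
$$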
This phenomenon showed that simple mathematical operations can express relationships between words, laying the foundation for the later development of NLP in the deep-learning era. How magical it is!
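The arithmetic is easy to try out with gensim; the sketch below assumes gensim is installed and can download pretrained vectors, and uses a small GloVe model as a convenient stand-in for Mikolov’s original word2vec vectors (which gensim can also load):

```python
import gensim.downloader as api

# Small pretrained GloVe vectors (downloaded on first use).
vectors = api.load("glove-wiki-gigaword-50")

# king - man + woman ~= queen
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3)
print(result)  # "queen" is expected near the top of the list
```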
This mathematical idea is also reflected in Christopher Manning’s proposal for the linear superposition of Word Senses. Just as Feynman’s path integral method allows for the superposition of all possible paths, Manning argued that a word's different meanings can be represented as a weighted sum of multiple vectors, which naturally “collapses” to the correct meaning in different contexts. This approach is not only theoretically elegant but also shows stronger performance in Context-Aware NLP Tasks.
Math is not just a tool. It is a way of thinking. Whether it’s parsing the motion of electrons in physics or capturing the structure of a language in computer science, finding the proper mathematical expression often means having a deeper understanding of the world. In the future, in AI and NLP, we may be able to borrow more mathematical frameworks from physics, such as Group Theory, Topology, or Measure Theory, to further advance the evolution of AI language processing capabilities.