The dot product is often presented as both an algebraic and a geometric operation. The relationship between these two ideas may not be immediately obvious. I prove that they are equivalent and explain why the relationship makes sense.
Published
26 June 2018
Two formulations
The dot product is an operation for multiplying two vectors to get a scalar value. Consider two vectors a = [a_1, …, a_N] and b = [b_1, …, b_N].¹ Their dot product is denoted a⋅b, and it has two definitions, an algebraic definition and a geometric definition. The algebraic formulation is the sum of the elements after an element-wise multiplication of the two vectors:
a⋅b = a_1 b_1 + ⋯ + a_N b_N = ∑_{n=1}^{N} a_n b_n.  (1)
The geometric formulation is the length of a multiplied by the length of b times the cosine of the angle between the two vectors:
a⋅b = ∥a∥ ∥b∥ cos θ,  (2)
where ∥v∥ denotes the length (two-norm) of the vector v. The geometric version can be easily visualized (Figure 1) since
cos θ = adjacent / hypotenuse = adjacent / ∥a∥  ⟹  ∥a∥ cos θ = adjacent.  (3)
By the geometric definition, the dot product is the product of the lengths of two vectors after one of them (a in Figure 1) has been projected onto the other (b in Figure 1).
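Before relating the two formulations, we can check numerically that they agree. Below is a small Python sketch (the lengths and angles are arbitrary illustrative values, not from the post): we build two 2-D vectors from known lengths and directions, so the angle between them is known without using the dot product itself, then compare Equation 1 against Equation 2.

```python
import math

# Construct two 2-D vectors from explicit lengths and directions.
# These particular values are arbitrary, chosen only for illustration.
r_a, alpha = 3.0, 0.4   # length and direction angle of a
r_b, beta = 2.0, 1.5    # length and direction angle of b

a = [r_a * math.cos(alpha), r_a * math.sin(alpha)]
b = [r_b * math.cos(beta), r_b * math.sin(beta)]

# Algebraic definition (Equation 1): sum of element-wise products.
algebraic = sum(x * y for x, y in zip(a, b))

# Geometric definition (Equation 2): ||a|| ||b|| cos(theta),
# where theta is the angle between the two vectors.
theta = beta - alpha
geometric = r_a * r_b * math.cos(theta)

# True: the two definitions agree to floating-point precision.
print(abs(algebraic - geometric) < 1e-12)
```

Changing the lengths or angles changes both values identically, which is exactly the equivalence proven below.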
Figure 1: The standard diagram of the dot product between vectors a and b. The implication is that the dot product is the multiplication of the lengths of both vectors after one vector has been projected onto the other.
But how are these definitions related? (This StackOverflow question asks the same thing, with this amazing diagram.) How is the area in Figure 1 related to a sum of element-wise products? The key is to realize that the summation in the algebraic definition is actually a linear projection: the dot product a⋅b can be viewed as projecting the vector b∈R^N by the 1×N matrix a⊤. As we explore this idea, I hope the geometric definition makes more sense as well.
Proof of equivalence
Before talking about the dot product as a linear projection, let’s quickly prove that the two definitions of the dot product are equivalent. This is based on Heidi Burgiel’s proof. Let’s assume the algebraic definition in Equation 1. Then we want to prove Equation 2. Define c := a − b (Figure 2). Now notice that

∥c∥² = c⋅c = (a − b)⋅(a − b) = a⋅a − 2 a⋅b + b⋅b = ∥a∥² + ∥b∥² − 2 a⋅b.  (4)
Here, we used the commutative and distributive properties of the dot product (see A1 for proofs of these properties). This looks a lot like the law of cosines for the same triangle (see A2 for a proof of this law). We can set these two expressions for ∥c∥² equal to each other and simplify:
∥a∥² + ∥b∥² − 2 a⋅b = ∥c∥² = ∥a∥² + ∥b∥² − 2 ∥a∥ ∥b∥ cos θ   (Equation 4 and the law of cosines)
⇓
a⋅b = ∥a∥ ∥b∥ cos θ.  (5)
And we’re done. We have just proven that
∑_{n=1}^{N} a_n b_n = ∥a∥ ∥b∥ cos θ.  (6)
However, this result is fairly counterintuitive to me, and my confusion as to how this could be true was the inspiration for this post. The geometric definition of the dot product is related to the law of cosines, which generalizes the Pythagorean theorem. But how is this definition related to the algebraic definition? Why are they both correct?
Figure 2: A geometric representation of vector subtraction, where c = a − b.
Matrices as projections
The key to understanding the algebraic dot product is to understand how a matrix M∈RP×N can be viewed as a linear transformation of N-dimensional vectors into a P-dimensional space. This is easiest to see with a 2×2 matrix, where we project a 2-dimensional vector into another 2-dimensional space. For example, consider the following vector v and linear transformation M:
[ −7  1 ] [ 1 ]   [ −5 ]
[ −5  3 ] [ 2 ] = [  1 ].  (7)
    M       v        Mv
Notice that the column vectors of M are actually the transformed standard basis vectors e1 and e2:

Me1 = [−7, −5],   Me2 = [1, 3].  (8)
And the projected vector [−5,1] is just the original vector [1,2] using this new coordinate system specified by the transformed standard basis vectors, i.e.
[−5, 1] = 1·[−7, −5] + 2·[1, 3] = 1·(Me1) + 2·(Me2).  (9)
We can visualize this (Figure 3) for some intuition. However, I highly recommend Grant Sanderson’s two beautiful explanations of linear transformations and the dot product for interactive visualizations.
Figure 3:(Left) The vector [1,2] plotted in 2-dimensional space with the standard basis vectors e1=[1,0] and e2=[0,1].
(Right) The vector [1,2] transformed by the columns of M, i.e. with the transformed standard basis vectors Me1 and Me2.
We can write this more generally. Consider the fact that we can represent any vector v=[v1,…,vN] as a linear combination of the standard basis vectors e1,…,eN:
v = v_1 e_1 + ⋯ + v_N e_N.  (10)
This means we can apply our linear transformation M to get
Mv = v_1 (Me_1) + ⋯ + v_N (Me_N).  (11)
So any P-vector Mv can be represented as a linear combination of the projected standard basis vectors Me1,…,MeN.
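The claim in Equation 11 can be checked directly in a few lines of Python, using the concrete M and v from Equation 7 (no matrix library needed for a 2×2 example):

```python
# A minimal sketch of Equation 11: applying M to v gives the same result
# as taking a linear combination of M's columns, the transformed basis
# vectors. M and v are the example values from the post.
M = [[-7, 1],
     [-5, 3]]  # columns are M e1 = [-7, -5] and M e2 = [1, 3]
v = [1, 2]

# Direct matrix-vector product: (Mv)_i = sum_n M[i][n] * v[n].
Mv = [sum(M[i][n] * v[n] for n in range(2)) for i in range(2)]

# Linear combination of columns: v1 * (M e1) + v2 * (M e2).
combo = [v[0] * M[i][0] + v[1] * M[i][1] for i in range(2)]

print(Mv, combo)  # both give [-5, 1]
```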
The dot product as a projection
We can think of the dot product as a matrix-vector multiplication where the left term is a 1×N matrix, i.e.,

a⋅b = a⊤b = [a_1 ⋯ a_N] [b_1, …, b_N]⊤.  (12)

This is why the dot product is sometimes denoted as a⊤b rather than as a⋅b. So what does a 1×N matrix represent as a transformation? It is a transformation of N-vectors into a 1-dimensional space, i.e. a line. So the dot product between vectors a and b can be thought of as projecting b∈R^N onto a number line defined by a∈R^N (or vice versa, since the dot product is commutative). Our standard basis vectors are transformed such that they lie on a number line, and any vector projected into this new space must also lie on the number line. Each column of the 1×N matrix a⊤, i.e. each scalar a_n, is the 1-dimensional analog of the transformed standard basis vectors that we used in R^N.
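We can sketch this in Python: a 1×N matrix sends each standard basis vector to a scalar on the number line. The vectors used here are chosen for illustration (the same values as the worked example that follows):

```python
# Sketch: a 1x2 matrix (a row vector) maps each standard basis vector
# of R^2 to a single scalar, so every input lands on a number line.
a = [3, 1]  # the row matrix is a-transpose = [3 1]

def project(v):
    """Apply the 1x2 matrix [a1 a2] to a 2-vector, yielding a scalar."""
    return a[0] * v[0] + a[1] * v[1]

e1, e2 = [1, 0], [0, 1]
print(project(e1), project(e2))  # basis vectors land at 3 and 1
print(project([2, 4]))           # b = [2, 4] lands at 3*2 + 1*4 = 10
```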
Let’s see a concrete example in R2. Let a=[3,1] and b=[2,4]. Then the dot product between these two vectors can be written as a matrix-vector multiplication:
[3, 1]⋅[2, 4] = [3 1] [2, 4]⊤ = 3·2 + 1·4 = 10.  (13)
If we think about the dot product as a projection from R² to R, we can visualize the operation (Figure 4). The scalar 10 is represented as a linear combination of the columns of the 1×2 matrix a⊤ = [3 1], namely 10 = 2·3 + 4·1.
Figure 4: A vector b=[2,4] projected onto a number line defined by a vector a=[3,1].
As a sanity check, consider that points that are evenly spaced in R² and then projected onto a will still be evenly spaced in R. We can see this quantitatively. As we linearly scale either vector, we linearly scale the resultant value a⋅b = α:

(βa)⋅b = ∑_{n=1}^{N} (β a_n) b_n = β ∑_{n=1}^{N} a_n b_n = βα.  (14)
While this is not a complete proof, this is the hallmark of a linear relationship.
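The scaling behavior is easy to confirm numerically. A short sketch using the same example vectors (the scale factor 2.5 is an arbitrary illustrative choice):

```python
# Scaling either vector by beta scales the projected value a.b = alpha
# by the same factor: this is the hallmark of a linear map.
a = [3, 1]
b = [2, 4]

def dot(u, w):
    return sum(x * y for x, y in zip(u, w))

alpha = dot(a, b)  # 3*2 + 1*4 = 10
beta = 2.5
scaled = dot([beta * x for x in a], b)

print(alpha, scaled)  # 10 and beta * alpha = 25.0
```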
Conclusion
How are the two definitions of the dot product related? The intuition is that both can be viewed as geometric operations. There are many other good explanations of this idea online, but the one that really made the idea click for me was realizing that the sum in the algebraic definition represents a linear projection.
Acknowledgments
I thank Stephen Talabac for pointing out some typos.
Appendix
A1. Proofs of commutative and distributive properties
The dot product is commutative because scalar multiplication is commutative:

a⋅b = ∑_{n=1}^{N} a_n b_n = ∑_{n=1}^{N} b_n a_n = b⋅a.

It is distributive over vector addition because scalar multiplication distributes over scalar addition:

a⋅(b + c) = ∑_{n=1}^{N} a_n (b_n + c_n) = ∑_{n=1}^{N} a_n b_n + ∑_{n=1}^{N} a_n c_n = a⋅b + a⋅c.

A2. Proof of the law of cosines

Consider a triangle with sides of lengths ∥a∥, ∥b∥, and ∥c∥, where θ is the angle between the sides with lengths ∥a∥ and ∥b∥. Place the vertex at θ at the origin with the side of length ∥b∥ along the x-axis. The side of length ∥a∥ then ends at the point (∥a∥ cos θ, ∥a∥ sin θ), and the squared length of the remaining side is

∥c∥² = (∥b∥ − ∥a∥ cos θ)² + (∥a∥ sin θ)²
     = ∥b∥² − 2 ∥a∥ ∥b∥ cos θ + ∥a∥² cos²(θ) + ∥a∥² sin²(θ)
     = ∥a∥² + ∥b∥² − 2 ∥a∥ ∥b∥ cos θ.

Note that the identity sin²(θ) + cos²(θ) = 1 is just the Pythagorean theorem applied to a right triangle inscribed inside a unit circle. Therefore, we have proven the law of cosines for an arbitrary triangle.
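As a quick numeric spot-check of the law of cosines, here is a Python sketch for one arbitrary triangle (the vertex coordinates are illustrative values, not from the post):

```python
import math

# Numeric spot-check of the law of cosines for one triangle.
# Vertices chosen arbitrarily for illustration.
A = (0.0, 0.0)
B = (4.0, 0.0)
C = (1.0, 3.0)

def length(p, q):
    return math.hypot(q[0] - p[0], q[1] - p[1])

a_len = length(A, C)  # side adjacent to the angle at A
b_len = length(A, B)  # other side adjacent to the angle at A
c_len = length(B, C)  # side opposite the angle at A

# Angle at vertex A, computed from the two edge directions.
theta = math.atan2(C[1], C[0]) - math.atan2(B[1], B[0])

lhs = c_len ** 2
rhs = a_len ** 2 + b_len ** 2 - 2 * a_len * b_len * math.cos(theta)
print(abs(lhs - rhs) < 1e-9)  # True
```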
This is an old post, and I was sloppy about the distinction between column and row vectors. I don’t want to remake the figures, but I think the distinction can always be inferred from context. ↩