Two Forms of the Dot Product

The dot product is often presented as both an algebraic and a geometric operation. The relationship between these two ideas may not be immediately obvious. I prove that they are equivalent and explain why the relationship makes sense.

Two formulations

The dot product is an operation for multiplying two vectors to get a scalar value. Consider two vectors $\mathbf{a} = [ a_1, \dots, a_N ]$ and $\mathbf{b} = [ b_1, \dots, b_N ]$.¹ Their dot product is denoted $\mathbf{a} \cdot \mathbf{b}$, and it has two definitions, an algebraic one and a geometric one. The algebraic formulation is the sum of the elements after an element-wise multiplication of the two vectors:

$$
\mathbf{a} \cdot \mathbf{b} = a_1 b_1 + \dots + a_N b_N = \sum_{n=1}^{N} a_n b_n. \tag{1}
$$

The geometric formulation is the length of $\mathbf{a}$ multiplied by the length of $\mathbf{b}$ times the cosine of the angle between the two vectors:

$$
\mathbf{a} \cdot \mathbf{b} = \lVert \mathbf{a} \rVert \lVert \mathbf{b} \rVert \cos \theta, \tag{2}
$$

where $\lVert\mathbf{v}\rVert$ denotes the length (two-norm) of the vector $\mathbf{v}$. The geometric version can be easily visualized (Figure 1) since

$$
\cos \theta = \frac{\text{adjacent}}{\text{hypotenuse } \lVert \mathbf{a} \rVert} \implies \lVert\mathbf{a}\rVert \cos \theta = \text{adjacent}. \tag{3}
$$

By the geometric definition, the dot product is the multiplication of the lengths of the two vectors after one of them ($\mathbf{a}$ in Figure 1) has been projected onto the other ($\mathbf{b}$ in Figure 1).

Figure 1: The standard diagram of the dot product between vectors $\mathbf{a}$ and $\mathbf{b}$. The implication is that the dot product is the multiplication of the lengths of both vectors after one vector has been projected onto the other.
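Before asking why the two formulations agree, we can at least check numerically that they do. Below is a minimal sketch, assuming NumPy and two small $2$-dimensional example vectors of my own choosing; the angle is computed independently with `arctan2`, so the check is not circular:

```python
import numpy as np

a = np.array([3.0, 1.0])
b = np.array([2.0, 4.0])

# Algebraic definition (Equation 1): sum of element-wise products.
algebraic = np.sum(a * b)

# Geometric definition (Equation 2): ||a|| ||b|| cos(theta), with theta
# computed independently from each vector's polar angle via arctan2.
theta = np.arctan2(b[1], b[0]) - np.arctan2(a[1], a[0])
geometric = np.linalg.norm(a) * np.linalg.norm(b) * np.cos(theta)

print(algebraic, geometric)  # Both are 10.0, up to floating-point error.
```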

But how are these definitions related? (This StackOverflow question asks the same thing, with an amazing diagram.) How is the area in Figure 1 related to a sum of element-wise products? The key is to realize that the summation in the algebraic definition is actually a linear projection. In this case, the dot product can be viewed as projecting a vector $\mathbf{a} \in \mathbb{R}^{N}$ with a matrix $\mathbf{b}^{\top} \in \mathbb{R}^{1 \times N}$. As we explore this idea, I hope the geometric definition makes more sense as well.

Proof of equivalence

Before talking about the dot product as a linear projection, let’s quickly prove that the two definitions of the dot product are equivalent. This is based on Heidi Burgiel’s proof. Let’s assume the algebraic definition in Equation 1. Then we want to prove Equation 2. Define $\mathbf{c} := \mathbf{a} - \mathbf{b}$ (Figure 2). Now notice that

$$
\begin{aligned}
\lVert \mathbf{c} \rVert^2 &= \mathbf{c} \cdot \mathbf{c} \\
&= (\mathbf{a} - \mathbf{b}) \cdot (\mathbf{a} - \mathbf{b}) \\
&= \mathbf{a} \cdot \mathbf{a} - \mathbf{b} \cdot \mathbf{a} - \mathbf{a} \cdot \mathbf{b} + \mathbf{b} \cdot \mathbf{b} \\
&= \lVert \mathbf{a} \rVert^2 + \lVert \mathbf{b} \rVert^2 - 2 (\mathbf{a} \cdot \mathbf{b}).
\end{aligned} \tag{4}
$$

Here, we used the commutative and distributive properties of the dot product (see A1 for proofs of these properties). This looks a lot like the law of cosines for the same triangle (see A2 for a proof of this law). We can set the two expressions equal to each other and simplify:

$$
\begin{aligned}
\lVert \mathbf{c} \rVert^2 &= \lVert\mathbf{c}\rVert^2 \\
\underbrace{\lVert\mathbf{a}\rVert^2 + \lVert\mathbf{b}\rVert^2 - 2 \mathbf{a} \cdot \mathbf{b}}_{\text{Equation 4}} &= \underbrace{\lVert\mathbf{a}\rVert^2 + \lVert\mathbf{b}\rVert^2 - 2 \lVert\mathbf{a}\rVert \lVert\mathbf{b}\rVert \cos \theta}_{\text{Law of cosines}} \\
&\Downarrow \\
\mathbf{a} \cdot \mathbf{b} &= \lVert\mathbf{a}\rVert \lVert\mathbf{b}\rVert \cos \theta.
\end{aligned} \tag{5}
$$

And we’re done. We have just proven that

$$
\sum_{n=1}^{N} a_n b_n = \lVert\mathbf{a}\rVert \lVert\mathbf{b}\rVert \cos \theta. \tag{6}
$$
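Here is a quick numerical spot-check of the key identity in Equation 4; a minimal sketch, assuming NumPy and arbitrary random example vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=5)
b = rng.normal(size=5)
c = a - b

# Equation 4: ||c||^2 = ||a||^2 + ||b||^2 - 2 (a . b).
lhs = np.linalg.norm(c) ** 2
rhs = np.linalg.norm(a) ** 2 + np.linalg.norm(b) ** 2 - 2 * np.dot(a, b)
assert np.isclose(lhs, rhs)
```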

However, this result is fairly counterintuitive to me, and my confusion as to how this could be true was the inspiration for this post. The geometric definition of the dot product is related to the law of cosines, which generalizes the Pythagorean theorem. But how is this definition related to the algebraic definition? Why are they both correct?

Figure 2: A geometric representation of vector subtraction, where $\mathbf{c} = \mathbf{a} - \mathbf{b}$.

Matrices as projections

The key to understanding the algebraic dot product is to understand how a matrix $\mathbf{M} \in \mathbb{R}^{P \times N}$ can be viewed as a linear transformation of $N$-dimensional vectors into a $P$-dimensional space. This is easiest to see with a $2 \times 2$ matrix, where we project a $2$-dimensional vector into another $2$-dimensional space. For example, consider the following vector $\mathbf{v}$ and linear transformation $\mathbf{M}$:

$$
\overbrace{\begin{bmatrix} \textcolor{#bc2612}{-7} & \textcolor{#11accd}{1} \\ \textcolor{#bc2612}{-5} & \textcolor{#11accd}{3} \end{bmatrix}}^{\mathbf{M}} \overbrace{\begin{bmatrix} 1 \\ 2 \end{bmatrix}}^{\mathbf{v}} = \overbrace{\begin{bmatrix} -5 \\ 1 \end{bmatrix}}^{\mathbf{Mv}}. \tag{7}
$$

Notice that the column vectors of $\mathbf{M}$ are actually the transformed standard basis vectors $\mathbf{e}_1$ and $\mathbf{e}_2$:

$$
\begin{bmatrix} \textcolor{#bc2612}{-7} & \textcolor{#11accd}{1} \\ \textcolor{#bc2612}{-5} & \textcolor{#11accd}{3} \end{bmatrix} \overbrace{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}^{\mathbf{e}_1} = \begin{bmatrix} \textcolor{#bc2612}{-7} \\ \textcolor{#bc2612}{-5} \end{bmatrix}, \qquad \begin{bmatrix} \textcolor{#bc2612}{-7} & \textcolor{#11accd}{1} \\ \textcolor{#bc2612}{-5} & \textcolor{#11accd}{3} \end{bmatrix} \overbrace{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}^{\mathbf{e}_2} = \begin{bmatrix} \textcolor{#11accd}{1} \\ \textcolor{#11accd}{3} \end{bmatrix}. \tag{8}
$$

And the projected vector $[-5, 1]$ is just the original vector $[1, 2]$ using this new coordinate system specified by the transformed standard basis vectors, i.e.

$$
\begin{bmatrix} -5 \\ 1 \end{bmatrix} = 1 \overbrace{\begin{bmatrix} \textcolor{#bc2612}{-7} \\ \textcolor{#bc2612}{-5} \end{bmatrix}}^{\mathbf{M} \mathbf{e}_1} + 2 \overbrace{\begin{bmatrix} \textcolor{#11accd}{1} \\ \textcolor{#11accd}{3} \end{bmatrix}}^{\mathbf{M} \mathbf{e}_2}. \tag{9}
$$
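Here is the same worked example as a minimal sketch, assuming NumPy, confirming that $\mathbf{Mv}$ equals this linear combination of the transformed basis vectors:

```python
import numpy as np

M = np.array([[-7, 1],
              [-5, 3]])
v = np.array([1, 2])
e1, e2 = np.array([1, 0]), np.array([0, 1])

print(M @ v)                               # [-5  1]
print(M @ e1, M @ e2)                      # the columns of M: [-7 -5] and [1 3]
print(v[0] * (M @ e1) + v[1] * (M @ e2))   # [-5  1], same as M @ v
```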

We can visualize this (Figure 3) for some intuition. However, I highly recommend Grant Sanderson’s two beautiful explanations of linear projections and the dot product for interactive visualizations.

Figure 3: (Left) The vector $[1, 2]$ plotted in $2$-dimensional space with the standard basis vectors $\mathbf{e}_1 = [1, 0]$ and $\mathbf{e}_2 = [0, 1]$. (Right) The vector $[1, 2]$ transformed by the columns of $\mathbf{M}$, i.e. with the transformed standard basis vectors $\mathbf{Me}_1$ and $\mathbf{Me}_2$.

We can write this more generally. Consider the fact that we can represent any vector $\mathbf{v} = [v_1, \dots, v_N]$ as a linear combination of the standard basis vectors $\mathbf{e}_1, \dots, \mathbf{e}_N$:

$$
\mathbf{v} = v_1 \mathbf{e}_1 + \dots + v_N \mathbf{e}_N. \tag{10}
$$

This means we can apply our linear transformation $\mathbf{M}$ to get

$$
\mathbf{M}\mathbf{v} = v_1 (\mathbf{M} \mathbf{e}_1) + \dots + v_N (\mathbf{M} \mathbf{e}_N). \tag{11}
$$

So any $P$-vector $\mathbf{M} \mathbf{v}$ can be represented as a linear combination of the projected standard basis vectors $\mathbf{M}\mathbf{e}_1, \dots, \mathbf{M}\mathbf{e}_N$.
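This fact is easy to spot-check for an arbitrary $P \times N$ matrix; a short sketch, again assuming NumPy and randomly drawn values:

```python
import numpy as np

rng = np.random.default_rng(1)
P, N = 3, 4
M = rng.normal(size=(P, N))
v = rng.normal(size=N)

# Mv as a linear combination of the projected standard basis vectors
# (Equation 11). Note that M @ e_n is just the n-th column of M.
combo = sum(v[n] * M[:, n] for n in range(N))
assert np.allclose(M @ v, combo)
```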

The dot product as a projection

We can think of the dot product as a matrix-vector multiplication where the left term is a $1 \times N$ matrix, i.e.,

$$
\mathbf{a} \cdot \mathbf{b} = \sum_{n=1}^{N} a_n b_n = \begin{bmatrix} a_1 & \dots & a_N \end{bmatrix} \begin{bmatrix} b_1 \\ \vdots \\ b_N \end{bmatrix}. \tag{12}
$$

This is why the dot product is sometimes denoted as $\mathbf{a}^{\top} \mathbf{b}$ rather than as $\mathbf{a} \cdot \mathbf{b}$. So what does a $1 \times N$ matrix represent as a transformation? It is a transformation of $N$-vectors into a $1$-dimensional space, i.e. a line. So the dot product between vectors $\mathbf{a}$ and $\mathbf{b}$ can be thought of as projecting $\mathbf{a} \in \mathbb{R}^N$ onto a number line defined by $\mathbf{b} \in \mathbb{R}^N$ (or vice versa). Our standard basis vectors are transformed such that they lie on a number line, and any vector projected into this new space must also lie on the number line. Each column of this $1 \times N$ matrix is the $1$-dimensional analog of the standard basis vectors that we used in $\mathbb{R}^{N}$.

Let’s see a concrete example in $\mathbb{R}^2$. Let $\mathbf{a} = [3, 1]$ and $\mathbf{b} = [2, 4]$. Then the dot product between these two vectors can be written as a matrix-vector multiplication:

$$
\begin{bmatrix} 3 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 3 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 10 \end{bmatrix}. \tag{13}
$$

If we think about the dot product as a projection from $\mathbb{R}^2$ to $\mathbb{R}$, we can visualize the operation (Figure 4). The $1$-vector $[10]$ is represented as a linear combination of the columns of the $1 \times 2$ matrix $\mathbf{a}^{\top} = [3 \;\; 1]$, weighted by the entries of $\mathbf{b}$: $2(3) + 4(1) = 10$.

Figure 4: A vector $\mathbf{b} = [2, 4]$ projected onto a number line defined by a vector $\mathbf{a} = [3, 1]$.
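Here is the same example as a minimal sketch, assuming NumPy, with $\mathbf{a}^{\top}$ treated explicitly as a $1 \times 2$ matrix:

```python
import numpy as np

a = np.array([3, 1])
b = np.array([2, 4])

# a^T as a 1x2 matrix acting on b (Equation 13).
A = a.reshape(1, 2)
print(A @ b)                       # [10]

# The same number as a linear combination of the columns of a^T,
# weighted by the entries of b: 2 * 3 + 4 * 1 = 10.
print(b[0] * a[0] + b[1] * a[1])   # 10
```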

As a sanity check, consider that points that are evenly spaced in $\mathbb{R}^2$ and then projected onto $\mathbf{a}$ will still be evenly spaced in $\mathbb{R}$. We can see this quantitatively. As we linearly scale either vector, we linearly scale the resultant value $\mathbf{a} \cdot \mathbf{b} = \alpha$:

$$
\begin{aligned}
(k \mathbf{a}) \cdot \mathbf{b} &= (k a_1) b_1 + \dots + (k a_N) b_N \\
&= k \left( a_1 b_1 + \dots + a_N b_N \right) \\
&= k\alpha.
\end{aligned} \tag{14}
$$

While this is not a complete proof, this scaling behavior is the hallmark of a linear relationship.
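Here is a small numerical illustration of the evenly-spaced-points claim; a sketch assuming NumPy and an arbitrary line of my own choosing:

```python
import numpy as np

a = np.array([3.0, 1.0])

# Evenly spaced points along an arbitrary line in R^2.
t = np.arange(5)                                           # 0, 1, 2, 3, 4
points = np.array([1.0, 2.0]) + np.outer(t, np.array([0.5, -1.0]))

projections = points @ a   # each point dotted with a
print(np.diff(projections))  # constant spacing (here, all 0.5)
```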

Conclusion

How are the two definitions of the dot product related? The intuition is that both can be viewed as geometric operations. There are a lot of other good explanations of this idea online, but for me, what really made the idea click was realizing that the sum represents a linear projection.

   

Acknowledgments

I thank Stephen Talabac for pointing out some typos.

   

Appendix

A1. Proofs of commutative and distributive properties

The dot product is commutative because scalar multiplication is commutative:

$$
\begin{aligned}
\mathbf{a} \cdot \mathbf{b} &= a_1 b_1 + a_2 b_2 + \dots + a_N b_N \\
&= b_1 a_1 + b_2 a_2 + \dots + b_N a_N \\
&= \mathbf{b} \cdot \mathbf{a}.
\end{aligned} \tag{A.1}
$$

And the dot product is distributive because scalar multiplication is distributive:

$$
\begin{aligned}
\mathbf{a} \cdot (\mathbf{b} + \mathbf{c}) &= \mathbf{a} \cdot [(b_1 + c_1), (b_2 + c_2), \dots, (b_N + c_N)]^{\top} \\
&= a_1 (b_1 + c_1) + a_2 (b_2 + c_2) + \dots + a_N (b_N + c_N) \\
&= a_1 b_1 + a_1 c_1 + a_2 b_2 + a_2 c_2 + \dots + a_N b_N + a_N c_N \\
&= (a_1 b_1 + a_2 b_2 + \dots + a_N b_N) + (a_1 c_1 + a_2 c_2 + \dots + a_N c_N) \\
&= \mathbf{a} \cdot \mathbf{b} + \mathbf{a} \cdot \mathbf{c}.
\end{aligned} \tag{A.2}
$$
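Both properties are also easy to verify numerically; a minimal sketch, assuming NumPy and random example vectors:

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, c = rng.normal(size=(3, 4))   # three random 4-dimensional vectors

assert np.isclose(np.dot(a, b), np.dot(b, a))                     # commutative
assert np.isclose(np.dot(a, b + c), np.dot(a, b) + np.dot(a, c))  # distributive
```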

A2. Proof of the law of cosines

This is based on Sal Khan’s proof. The law of cosines is:

$$
\lVert\mathbf{c}\rVert^2 = \lVert\mathbf{a}\rVert^2 + \lVert \mathbf{b} \rVert^2 - 2 \lVert \mathbf{a} \rVert \lVert \mathbf{b} \rVert \cos \theta. \tag{A.3}
$$

Consider the triangle in Figure 5.

Figure 5: An arbitrary triangle.

Let’s first write $d$, $\lVert \mathbf{m} \rVert$, and $e$ in terms of cosine and sine:

$$
d := \lVert \mathbf{b} \rVert \cos(\theta), \qquad \lVert \mathbf{m} \rVert := \lVert \mathbf{b} \rVert \sin(\theta), \qquad e := \lVert \mathbf{c} \rVert - \underbrace{\lVert \mathbf{b} \rVert \cos(\theta)}_{d}. \tag{A.4}
$$

Now let’s use the Pythagorean theorem to construct a relationship between the sides:

$$
\begin{aligned}
\lVert \mathbf{a} \rVert^2 &= \lVert \mathbf{m} \rVert^2 + e^2 \\
&= (\lVert \mathbf{b} \rVert \sin(\theta))^2 + (\lVert \mathbf{c} \rVert - \lVert \mathbf{b} \rVert \cos(\theta))^2 \\
&= \lVert \mathbf{b} \rVert^2 \sin^2(\theta) + \lVert \mathbf{c} \rVert^2 - 2 \lVert \mathbf{c} \rVert \lVert \mathbf{b} \rVert \cos(\theta) + \lVert \mathbf{b} \rVert^2 \cos^2(\theta) \\
&= \lVert \mathbf{b} \rVert^2 (\sin^2(\theta) + \cos^2(\theta)) + \lVert \mathbf{c} \rVert^2 - 2 \lVert \mathbf{c} \rVert \lVert \mathbf{b} \rVert \cos(\theta) \\
&= \lVert \mathbf{b} \rVert^2 + \lVert \mathbf{c} \rVert^2 - 2 \lVert \mathbf{c} \rVert \lVert \mathbf{b} \rVert \cos(\theta).
\end{aligned} \tag{A.5}
$$

Note that the identity $\sin^2(\theta) + \cos^2(\theta) = 1$ is just the Pythagorean theorem applied to a right triangle inscribed inside a unit circle. Therefore, we have proven the law of cosines for an arbitrary triangle.
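As a final check, here is a minimal numerical sketch, assuming NumPy, of the relationship derived in Equation A.5, with two sides of the triangle built as 2-dimensional vectors and the angle between them computed with `arctan2`:

```python
import numpy as np

# Two sides of a triangle as 2D vectors from a common vertex; the third
# side is their difference, and theta is the angle between them.
b = np.array([2.0, 4.0])
c = np.array([5.0, 1.0])
a = c - b
theta = np.arctan2(c[1], c[0]) - np.arctan2(b[1], b[0])

lhs = np.linalg.norm(a) ** 2
rhs = (np.linalg.norm(b) ** 2 + np.linalg.norm(c) ** 2
       - 2 * np.linalg.norm(b) * np.linalg.norm(c) * np.cos(theta))
assert np.isclose(lhs, rhs)
```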

  1. This is an old post, and I was sloppy about the distinction between column and row vectors. I don’t want to remake the figures, but I think the distinction can always be inferred from context.