Time and again, there are moments when mathematics just feels magical.
For me, one example for this is given by generating functions (and that is why they can be found on this blog).
Today, I want to talk about another such example: involutions. We will look at how they are used to prove in one sentence that primes of the form can be written as a sum of squares, in the proof of the wonderful Lindström-Gessel-Viennot lemma, and in the proof of Euler’s pentagonal number theorem.
Let us start by defining what an involution is:
An involution is simply a map on a set
such that
, i.e., applying
twice amounts to the identity.
Well-designed involutions can tell us a lot about the set , for example, when the set of fixed points
is understood.
As a first example let us look at the set of divisors of a number
. The map
is obviously an involution and it has a fixed point if and only if
is a square. This shows that the number of divisors is odd if and only if
is a square.
Zagier’s one-sentence proof
Don Zagier has used a clever involution to give a proof in one sentence of the classical result that every prime of the form is a sum of two squares.
To appreciate the nature of Zagier’s proof, let’s first look at a different way to prove this result (following Jürgen Neukirch’s Algebraic Number Theory). First, if ,
cannot be a sum of squares because squares are either
or
.
Then, if , one of the more lucid ways of seeing that it has to be a sum of two squares is by noting that
cannot remain prime in the ring
of Gaussian integers.
As a first step, we show that there is an integer such that
divides
. This can be found by Wilson’s theorem saying that a number
is prime if and only if
. Applying this to a prime of the form
, we get:
So for , we get that
and this factors as
in
.
As does not divide any of the factors, it is not a prime in
and has a decomposition
for elements
which are not units.
Now taking the norm in , i.e., the map
, we get:
The last equation being in ,
is equal to
and hence of the form
.
Now, for comparison, here is Zagier’s proof in one sentence; short enough to quote it in full:
The involution on the set
defined by
has exactly one fixed point, so
is odd and the involution defined by
also has a fixed point.
D. Zagier: A One-Sentence Proof That Every Prime Is a Sum of Two Squares, The American Mathematical Monthly, 97(2), p. 144. (fulltext)
I found this proof surprising and beautiful but also baffling. (“From the book”, really, and so thought Aigner and Ziegler.) Where does it come from? How to come up with this involution?
Following a modern reflex, I have tried to find an answer on the net and find an answer I did:
https://mathoverflow.net/questions/31113/zagiers-one-sentence-proof-of-a-theorem-of-fermat
One of the answers on the MathOverflow page shows how to understand the involution in way I would expect Euclid himself to think. Let us look at the equation like an old Greek would.
is as square, obviously. To this, we add four rectangles with side lengths
and
:
This is drawn in a way that . In this case, we can extend the
-rectangles as follows and obtain the same figure in a different way, recovering the third case of Zagier’s involution:
Relabelling gives us the first line of Zagier’s involution and also shows that these two lines are actually inverses of each other:
This leaves the second line which comes out of the following picture:
Now, you can draw this for and think about the fixed point (both graphically and by setting
in
).
Finally, let me mention that the proof by involution, beautiful as it is, is not constructive. Fortunately, the same issue of the American Mathematical Monthly also contains a wonderfully constructive way of proving the same theorem via continued fractions and hence the Euclidean algorithm:
Stan Wagon: The Euclidean Algorithm Strikes Again, The American Mathematical Monthly, 97(2), pp. 125-129
(The same MathOverflow page linked above will also tell you about more efficient ways to find the sum of two squares.)
The Lindström-Gessel-Viennot lemma
Our next example comes from combinatorics and Martin Aigner has great expositions of it, both in his book A Course in Enumeration and in his article Lattice Paths and Determinants.
We start with a combinatorial application that fits nicely to the Lindström-Gessel-Viennot lemma that can be proved beautifully using an involution. Its starting point is a triangulated hexagon of side length .
The triangulation of such a hexagon can be turned into a rhombic tiling by joining two adjacent triangles at a time. Here is an example of such a tiling:
Our motivating question is now, how many different tilings are there for a given
? As we will see below, this question can be turned into a question about paths in a graph and about determinants which can then be answered using the lemma.
As a first step towards this lemma, recall the definition of the determinant of an -matrix
:
In this, is the symmetric group on
symbols and the sign of a permutation is either +1 or -1 depending on whether it is a product of an even or an odd number of transpositions.
If we understand the matrix entries as weights in a complete bipartite graph, then the determinant gives us a weighted sum of the weights of all perfect matchings in this graph when the weight of a matching is given by the product of the weights of its edges.
Now, the Lindström-Gessel-Viennot lemma generalises this to weighted directed acyclic graphs. For a directed path in such a graph, we can define its weight
as the product of the weights of its edges. Then, if we select two sets of vertices
and
of the same size, we can define the path matrix with respect to
and
as the
matrix with entries
Here, denotes a directed path connecting nodes
and
. We will call a set of directed paths connecting each node
to a different node
a path system. Hence, a path system consists of paths
for a permutation
. This allows us to define a signum for each path system
by
.
Let be the set of all path systems with respect to
and
and
be the set of vertex disjoint path systems, i.e., those in which no pair of paths shares a common vertex.
Lemma: Let be a weighted directed acyclic graph and
and
subsets of its vertex set. Then, the determinant of the path matrix
can be computed as
We can prove the lemma by first observing that the determinant is a sum over all path systems and then define an involution that eliminates those path systems which are not vertex disjoint from the sum.
For the first step, note that a term in the sum for the determinant is of the form . The definition of the path matrix then gives us expanded terms of the form
Adding all these terms gives us the following sum over all path systems :
Now, for the second step, we will bring in the magic of an involution on . Let
be a path system. If this is a vertex disjoint path system, our involution will keep it unchanged, so that these are exactly the fixed points of the involution. If
is not vertex disjoint, choose the first path (i.e., of smallest index
)
that has a vertex in common with another path in
, and call the other such path with the smallest index
. If
is the first common vertex of these paths, then our involution will modify
by replacing
and
by paths
and
where
equals
from
until
and then follows
until its end point
. The new path
is constructed in the same way:
This is obviously an involution, its fixed points are the vertex disjoint path systems, and it is swapping two path systems in our sum that have the same weight but opposite signs because changing from to
amounts to multiplying
by a transposition. Thus, the path systems which are not vertex disjoint cancel.
Equipped with our wonderfully versatile lemma, we can now answer our motivating question about rhombic tilings.
First, we transform our tiled hexagon into a directed graph with distinguished node sets as follows:
Now the crucial step is to see that the vertex disjoint path systems in this graph are in one-to-one correspondence to the rhombic tiling. The following figure illustrates this idea:
Note how the paths follow the way triangles are joined into rhombi and thus avoid the dark rhombi.
Now, the path matrix can be computed by noting that the number of paths from to
is counted by ordinary lattice paths and hence by binomial coefficients (because lattice paths follow the same recurrence as the binomial coefficients). If the lattice positions are
and
, the path matrix has entries
, its determinant, and hence the number of tilings, is
Euler’s pentagonal number theorem
A trick that is used at least twice becomes a method. This is particularly true of the involution method when it is used for showing partition identities. A nice overview of this is given by Igor Pak in his Partition Bijections, a Survey.
Recall that a partition of a number is a sequence of positive integers
such
. Partitions can be visualised very fruitfully by Young diagrams, as in the following example for 12=5+3+3+1:
In the following, we will make good use of the set of partitions into distinct parts and the fact that it splits into the union
of partitions of even and odd length.
With the help of an involution on that is interchanging
and
, we will prove Euler’s pentagonal theorem:
Let us have a closer look at the coefficients of on each side of the equation:
On the right hand side, we have non-zero coefficients for exponents and
depending on whether
is positive or negative. These are the pentagonal numbers and hence the name of the theorem.
Let us expand some terms on the left hand side to see how it relates to partitions:
Note that the coefficients are either +1 or -1 depending on whether the partition in the exponent has an even or an odd number of terms. Also, because each exponent occurs only once in the product, the partitions are partitions into distinct parts.
Now we introduce the following set of partitions related to the pentagonal numbers:
Let be the set of all partitions corresponding to one of the diagrams above (i.e., for any value of
) and
.
Using these, proving Euler’s pentagonal theorem becomes equivalent to showing that
This can be shown to be a consequence of Franklin’s involution, an involution on that has
as fixed points and is sign reversing, i.e., interchanging
and
.
The involution is most easily understood using the following figure:
The squares in red are simply those in the last row of the Young diagram. The squares in blue are the right-most squares as long as they form a diagonal pattern. Now, if we call the number of squares in the last row and the number of squares in the other group
, Franklin’s involution works as follows:
If or
, the involution lets the Young diagram fixed. This corresponds exactly to the diagrams in
.
If , the squares along the diagonal are fewer than those in the last row and can hence be moved below the last row as a new row. In the remaining case, the involution works just the other way round, i.e., the last row now becomes a new diagonal ending at the upper right corner of the Young diagram.
Obviously, this is an involution, it leaves the diagrams in fixed and turns partitions with an odd number of parts into partitions with an even number of parts and vice versa. Thus, the difference in the number of partitions with an even or an odd number of parts is either +1 or -1 depending on whether the diagram in
has an even or an odd number of parts.
More to come…
My next post will be on famous elliptic curves and there will also be an involution in it. The following quotation should serve as a cliffhanger:
The clue to the Darboux porism is the fact that the space of congruence classes of quadrilaterals with fixed side lengths is an elliptic curve, and the foldings FC and FD act on it as involutions. — https://arxiv.org/pdf/1501.07157.pdf