Time and again, there are moments when mathematics just feels magical.
For me, one example for this is given by generating functions (and that is why they can be found on this blog).
Today, I want to talk about another such example: involutions. We will look at how they are used to prove in one sentence that primes of the form can be written as a sum of squares, in the proof of the wonderful Lindström-Gessel-Viennot lemma, and in the proof of Euler’s pentagonal number theorem.
Let us start by defining what an involution is:
An involution is simply a map on a set such that , i.e., applying twice amounts to the identity.
Well-designed involutions can tell us a lot about the set , for example, when the set of fixed points is understood.
As a first example let us look at the set of divisors of a number . The map is obviously an involution and it has a fixed point if and only if is a square. This shows that the number of divisors is odd if and only if is a square.
Zagier’s one-sentence proof
To appreciate the nature of Zagier’s proof, let’s first look at a different way to prove this result (following Jürgen Neukirch’s Algebraic Number Theory). First, if , cannot be a sum of squares because squares are either or .
Then, if , one of the more lucid ways of seeing that it has to be a sum of two squares is by noting that cannot remain prime in the ring of Gaussian integers.
As a first step, we show that there is an integer such that divides . This can be found by Wilson’s theorem saying that a number is prime if and only if . Applying this to a prime of the form , we get:
So for , we get that and this factors as in .
As does not divide any of the factors, it is not a prime in and has a decomposition for elements which are not units.
Now taking the norm in , i.e., the map , we get:
The last equation being in , is equal to and hence of the form .
Now, for comparison, here is Zagier’s proof in one sentence; short enough to quote it in full:
The involution on the set defined by
has exactly one fixed point, so is odd and the involution defined by also has a fixed point.
D. Zagier: A One-Sentence Proof That Every Prime Is a Sum of Two Squares, The American Mathematical Monthly, 97(2), p. 144. (fulltext)
I found this proof surprising and beautiful but also baffling. (“From the book”, really, and so thought Aigner and Ziegler.) Where does it come from? How to come up with this involution?
Following a modern reflex, I have tried to find an answer on the net and find an answer I did:
One of the answers on the MathOverflow page shows how to understand the involution in way I would expect Euclid himself to think. Let us look at the equation like an old Greek would. is as square, obviously. To this, we add four rectangles with side lengths and :
This is drawn in a way that . In this case, we can extend the -rectangles as follows and obtain the same figure in a different way, recovering the third case of Zagier’s involution:
Relabelling gives us the first line of Zagier’s involution and also shows that these two lines are actually inverses of each other:
This leaves the second line which comes out of the following picture:
Now, you can draw this for and think about the fixed point (both graphically and by setting in ).
Finally, let me mention that the proof by involution, beautiful as it is, is not constructive. Fortunately, the same issue of the American Mathematical Monthly also contains a wonderfully constructive way of proving the same theorem via continued fractions and hence the Euclidean algorithm:
Stan Wagon: The Euclidean Algorithm Strikes Again, The American Mathematical Monthly, 97(2), pp. 125-129
(The same MathOverflow page linked above will also tell you about more efficient ways to find the sum of two squares.)
The Lindström-Gessel-Viennot lemma
We start with a combinatorial application that fits nicely to the Lindström-Gessel-Viennot lemma that can be proved beautifully using an involution. Its starting point is a triangulated hexagon of side length .
The triangulation of such a hexagon can be turned into a rhombic tiling by joining two adjacent triangles at a time. Here is an example of such a tiling:
Our motivating question is now, how many different tilings are there for a given ? As we will see below, this question can be turned into a question about paths in a graph and about determinants which can then be answered using the lemma.
As a first step towards this lemma, recall the definition of the determinant of an -matrix :
In this, is the symmetric group on symbols and the sign of a permutation is either +1 or -1 depending on whether it is a product of an even or an odd number of transpositions.
If we understand the matrix entries as weights in a complete bipartite graph, then the determinant gives us a weighted sum of the weights of all perfect matchings in this graph when the weight of a matching is given by the product of the weights of its edges.
Now, the Lindström-Gessel-Viennot lemma generalises this to weighted directed acyclic graphs. For a directed path in such a graph, we can define its weight as the product of the weights of its edges. Then, if we select two sets of vertices and of the same size, we can define the path matrix with respect to and as the matrix with entries
Here, denotes a directed path connecting nodes and . We will call a set of directed paths connecting each node to a different node a path system. Hence, a path system consists of paths for a permutation . This allows us to define a signum for each path system by .
Let be the set of all path systems with respect to and and be the set of vertex disjoint path systems, i.e., those in which no pair of paths shares a common vertex.
Lemma: Let be a weighted directed acyclic graph and and subsets of its vertex set. Then, the determinant of the path matrix can be computed as
We can prove the lemma by first observing that the determinant is a sum over all path systems and then define an involution that eliminates those path systems which are not vertex disjoint from the sum.
For the first step, note that a term in the sum for the determinant is of the form . The definition of the path matrix then gives us expanded terms of the form
Adding all these terms gives us the following sum over all path systems :
Now, for the second step, we will bring in the magic of an involution on . Let be a path system. If this is a vertex disjoint path system, our involution will keep it unchanged, so that these are exactly the fixed points of the involution. If is not vertex disjoint, choose the first path (i.e., of smallest index ) that has a vertex in common with another path in , and call the other such path with the smallest index . If is the first common vertex of these paths, then our involution will modify by replacing and by paths and where equals from until and then follows until its end point . The new path is constructed in the same way:
This is obviously an involution, its fixed points are the vertex disjoint path systems, and it is swapping two path systems in our sum that have the same weight but opposite signs because changing from to amounts to multiplying by a transposition. Thus, the path systems which are not vertex disjoint cancel.
Equipped with our wonderfully versatile lemma, we can now answer our motivating question about rhombic tilings.
First, we transform our tiled hexagon into a directed graph with distinguished node sets as follows:
Now the crucial step is to see that the vertex disjoint path systems in this graph are in one-to-one correspondence to the rhombic tiling. The following figure illustrates this idea:
Note how the paths follow the way triangles are joined into rhombi and thus avoid the dark rhombi.
Now, the path matrix can be computed by noting that the number of paths from to is counted by ordinary lattice paths and hence by binomial coefficients (because lattice paths follow the same recurrence as the binomial coefficients). If the lattice positions are and , the path matrix has entries
, its determinant, and hence the number of tilings, is
Euler’s pentagonal number theorem
A trick that is used at least twice becomes a method. This is particularly true of the involution method when it is used for showing partition identities. A nice overview of this is given by Igor Pak in his Partition Bijections, a Survey.
In the following, we will make good use of the set of partitions into distinct parts and the fact that it splits into the union of partitions of even and odd length.
With the help of an involution on that is interchanging and , we will prove Euler’s pentagonal theorem:
Let us have a closer look at the coefficients of on each side of the equation:
On the right hand side, we have non-zero coefficients for exponents and depending on whether is positive or negative. These are the pentagonal numbers and hence the name of the theorem.
Let us expand some terms on the left hand side to see how it relates to partitions:
Note that the coefficients are either +1 or -1 depending on whether the partition in the exponent has an even or an odd number of terms. Also, because each exponent occurs only once in the product, the partitions are partitions into distinct parts.
Now we introduce the following set of partitions related to the pentagonal numbers:
Let be the set of all partitions corresponding to one of the diagrams above (i.e., for any value of ) and .
Using these, proving Euler’s pentagonal theorem becomes equivalent to showing that
This can be shown to be a consequence of Franklin’s involution, an involution on that has as fixed points and is sign reversing, i.e., interchanging and .
The involution is most easily understood using the following figure:
The squares in red are simply those in the last row of the Young diagram. The squares in blue are the right-most squares as long as they form a diagonal pattern. Now, if we call the number of squares in the last row and the number of squares in the other group , Franklin’s involution works as follows:
If or , the involution lets the Young diagram fixed. This corresponds exactly to the diagrams in .
If , the squares along the diagonal are fewer than those in the last row and can hence be moved below the last row as a new row. In the remaining case, the involution works just the other way round, i.e., the last row now becomes a new diagonal ending at the upper right corner of the Young diagram.
Obviously, this is an involution, it leaves the diagrams in fixed and turns partitions with an odd number of parts into partitions with an even number of parts and vice versa. Thus, the difference in the number of partitions with an even or an odd number of parts is either +1 or -1 depending on whether the diagram in has an even or an odd number of parts.
More to come…
My next post will be on famous elliptic curves and there will also be an involution in it. The following quotation should serve as a cliffhanger:
The clue to the Darboux porism is the fact that the space of congruence classes of quadrilaterals with fixed side lengths is an elliptic curve, and the foldings FC and FD act on it as involutions. — https://arxiv.org/pdf/1501.07157.pdf