The Power of Involutions

zagier2.pngTime and again, there are moments when mathematics just feels magical.
For me, one example for this is given by generating functions (and that is why they can be found on this blog).
Today, I want to talk about another such example: involutions. We will look at how they are used to prove in one sentence that primes of the form p=4n+1 can be written as a sum of squares, in the proof of the wonderful Lindström-Gessel-Viennot lemma, and in the proof of Euler’s pentagonal number theorem.

Let us start by defining what an involution is:
An involution is simply a map f : S \rightarrow S on a set S such that f \circ f = \mathrm{id}_S, i.e., applying f twice amounts to the identity.

Well-designed involutions can tell us a lot about the set S, for example, when the set of fixed points \{s \in S \mid f(s) = s\} is understood.

As a first example let us look at the set D_n := \{d \in \mathbb{N} \mid d | n\} of divisors of a number n. The map d \mapsto n/d is obviously an involution and it has a fixed point if and only if n is a square. This shows that the number of divisors is odd if and only if n is a square.

Zagier’s one-sentence proof

Don Zagier has used a clever involution to give a proof in one sentence of the classical result that every prime of the form p=4n+1 is a sum of two squares.

To appreciate the nature of Zagier’s proof, let’s first look at a different way to prove this result (following Jürgen Neukirch’s Algebraic Number Theory). First, if p \equiv 3\mod 4, p cannot be a sum of squares because squares are either \equiv 0\mod 4 or \equiv 1\mod 4.

Then, if p \equiv 1\mod 4, one of the more lucid ways of seeing that it has to be a sum of two squares is by noting that p cannot remain prime in the ring \mathbb{Z}[i] of Gaussian integers.

As a first step, we show that there is an integer x such that p divides x^2+1. This can be found by Wilson’s theorem saying that a number p is prime if and only if -1 \equiv (p-1)! \mod p. Applying this to a prime of the form p = 4n+1, we get:

-1 \equiv (p-1)! \mod p\\= (1\cdot 2 \cdot \ldots \cdot (2n)) \cdot (\underbrace{(p-1)}_{4n}(p-2)\cdot\ldots\cdot \underbrace{(p-2n)}_{2n+1})\\\equiv (2n)!(-1)^{2n}(2n)!\\=((2n)!)^2\mod p

So for x := (2n)!, we get that p | x^2 + 1 and this factors as (x+i)(x-i) in \mathbb{Z}[i].

As p does not divide any of the factors, it is not a prime in \mathbb{Z}[i] and has a decomposition p = \alpha\beta for elements \alpha, \beta \in \mathbb{Z}[i] which are not units.

Now taking the norm in \mathbb{Z}[i], i.e., the map N: \mathbb{Z}[i] \ni a + bi \mapsto a^2 + b^2 \in \mathbb{Z}, we get:

p^2 =  N(\alpha)N(\beta)

The last equation being in \mathbb{Z}, p is equal to N(\alpha) and hence of the form a^2 + b^2.

Now, for comparison, here is Zagier’s proof in one sentence; short enough to quote it in full:

The involution on the set S = \{(x,y,z)\in\mathbb{N}^3 \mid x^2 + 4yz = p\} defined by

(x,y,z) \mapsto \begin{cases} (x+2z,z,y-x-z) & \text{if } x<y-z\\(2y-x,y,x-y+z)& \text{if } y-z<x<2y\\(x-2y,x-y+z,y)& \text{if } x>2y\end{cases}

has exactly one fixed point, so |S| is odd and the involution defined by (x,y,z) \mapsto (x,z,y) also has a fixed point.

D. Zagier: A One-Sentence Proof That Every Prime p \equiv 1 (\mod 4) Is a Sum of Two Squares, The American Mathematical Monthly, 97(2), p. 144. (fulltext)

I found this proof surprising and beautiful but also baffling. (“From the book”, really, and so thought Aigner and Ziegler.) Where does it come from? How to come up with this involution?

Following a modern reflex, I have tried to find an answer on the net and find an answer I did:

https://mathoverflow.net/questions/31113/zagiers-one-sentence-proof-of-a-theorem-of-fermat

One of the answers on the MathOverflow page shows how to understand the involution in way I would expect Euclid himself to think. Let us look at the equation x^2+4yz = p like an old Greek would. x^2 is as square, obviously. To this, we add four rectangles with side lengths y and z:

zagier1.png

This is drawn in a way that x > 2y. In this case, we can extend the yz-rectangles as follows and obtain the same figure in a different way, recovering the third case of Zagier’s involution:

zagier2(x,y,z) \mapsto (x-2y,x-y+z,y)

Relabelling gives us the first line of Zagier’s involution and also shows that these two lines are actually inverses of each other:

zagier3.png

This leaves the second line which comes out of the following picture:

zagier4.png

Now, you can draw this for x=y and think about the fixed point (both graphically and by setting x=y in x^2 + 4yz = p = 4n+1).

Finally, let me mention that the proof by involution, beautiful as it is, is not constructive. Fortunately, the same issue of the American Mathematical Monthly also contains a wonderfully constructive way of proving the same theorem via continued fractions and hence the Euclidean algorithm:
Stan Wagon: The Euclidean Algorithm Strikes AgainThe American Mathematical Monthly, 97(2), pp. 125-129

(The same MathOverflow page linked above will also tell you about more efficient ways to find the sum of two squares.)

The Lindström-Gessel-Viennot lemma

Our next example comes from combinatorics and Martin Aigner has great expositions of it, both in his book A Course in Enumeration and in his article Lattice Paths and Determinants.

We start with a combinatorial application that fits nicely to the Lindström-Gessel-Viennot lemma that can be proved beautifully using an involution. Its starting point is a triangulated hexagon of side length n.

gessel1.png

The triangulation of such a hexagon can be turned into a rhombic tiling by joining two adjacent triangles at a time. Here is an example of such a tiling:

gessel2.png

Our motivating question is now, how many different tilings h(n) are there for a given n? As we will see below, this question can be turned into a question about paths in a graph and about determinants which can then be answered using the lemma.

As a first step towards this lemma, recall the definition of the determinant of an n\times n-matrix M:

\mathrm{det}(M) = \sum_{\sigma \in S_n} \mathrm{sign} \sigma \prod_{i=1}^n m_{i\sigma(i)}

In this, S_n is the symmetric group on n symbols and the sign of a permutation is either +1 or -1 depending on whether it is a product of an even or an odd number of transpositions.

If we understand the matrix entries m_{ij} as weights in a complete bipartite graph, then the determinant gives us a weighted sum of the weights of all perfect matchings in this graph when the weight of a matching is given by the product of the weights of its edges.

Now, the Lindström-Gessel-Viennot lemma generalises this to weighted directed acyclic graphs. For a directed path p in such a graph, we can define its weight w(P) as the product of the weights of its edges. Then, if we select two sets of vertices A = \{A_1, \ldots, A_n\} and B = \{B_1, \ldots, B_n\} of the same size, we can define the path matrix with respect to A and B as the n\times n matrix with entries

m_{ij} = \sum_{p:A_i\rightarrow B_j} w(p).

Here, p:A_i\rightarrow B_j denotes a directed path connecting nodes A_i and B_j. We will call a set of directed paths connecting each node A_i \in A to a different node B_j \in B a path system. Hence, a path system consists of paths p:A_i\rightarrow B_{\sigma(i)} for a permutation \sigma \in S_n. This allows us to define a signum for each path system P by \mathrm{sign} P := \mathrm{sign} \sigma.

Let \mathcal{P} be the set of all path systems with respect to A and B and \mathcal{P}' be the set of vertex disjoint path systems, i.e., those in which no pair of paths shares a common vertex.

Lemma: Let G be a weighted directed acyclic graph and A = \{A_1, \ldots, A_n\} and B = \{B_1, \ldots, B_n\} subsets of its vertex set. Then, the determinant of the path matrix M can be computed as

\mathrm{det} M = \sum_{P\in \mathcal{P}'} \mathrm{sign} P w(P).

We can prove the lemma by first observing that the determinant is a sum over all path systems and then define an involution that eliminates those path systems which are not vertex disjoint from the sum.

For the first step, note that a term in the sum for the determinant is of the form \mathrm{sign}\sigma\prod_{i=1}^nm_{i\sigma(i)}. The definition of the path matrix then gives us expanded terms of the form

\mathrm{sign}\sigma\prod_{i=1}^n\sum_{p:A_i\rightarrow B_{\sigma(i)}} w(p)

Adding all these terms gives us the following sum over all path systems \mathcal{P}:

\mathrm{det}M = \sum_{P \in \mathcal{P}} \mathrm{sign}P w(P)

Now, for the second step, we will bring in the magic of an involution on \mathcal{P}. Let P \in \mathcal{P} be a path system. If this is a vertex disjoint path system, our involution will keep it unchanged, so that these are exactly the fixed points of the involution. If P is not vertex disjoint, choose the first path (i.e., of smallest index i) p_i : A_i \rightarrow B_{\sigma(i)} that has a vertex in common with another path in P, and call the other such path with the smallest index p_j. If X is the first common vertex of these paths, then our involution will modify P by replacing p_i and p_j by paths p_i' and p_j' where p_i' equals p_i from A_i until X and then follows p_j until its end point B_{\sigma(j)}. The new path p_j' is constructed in the same way:

gessel3.png

This is obviously an involution, its fixed points are the vertex disjoint path systems, and it is swapping two path systems in our sum that have the same weight but opposite signs because changing from A_i\rightarrow B_{\sigma(i)} to A_i\rightarrow B_{\sigma(j)} amounts to multiplying \sigma by a transposition. Thus, the path systems which are not vertex disjoint cancel.

Equipped with our wonderfully versatile lemma, we can now answer our motivating question about rhombic tilings.

First, we transform our tiled hexagon into a directed graph with distinguished node sets as follows:

gessel4.png

Now the crucial step is to see that the vertex disjoint path systems in this graph are in one-to-one correspondence to the rhombic tiling. The following figure illustrates this idea:

gessel5.png

Note how the paths follow the way triangles are joined into rhombi and thus avoid the dark rhombi.

Now, the path matrix can be computed by noting that the number of paths from A_i to B_j is counted by ordinary lattice paths and hence by binomial coefficients (because lattice paths follow the same recurrence as the binomial coefficients). If the lattice positions are A_i = (i,i) and B_j = (n-j, n+j), the path matrix has entries

m_{ij} = \binom{n-j+i+n+j-i}{n+i-j}=\binom{2n}{n+i-j}, its determinant, and hence the number of tilings, is

h(n) = \mathrm{det}\left(\binom{2n}{n+i-j}\right)_{i,j=0}^{n-1}=\prod_{i=0}^{n-1}\frac{\binom{2n+i}{n}}{\binom{n+i}{n}}.

Euler’s pentagonal number theorem

A trick that is used at least twice becomes a method. This is particularly true of the involution method when it is used for showing partition identities. A nice overview of this is given by Igor Pak in his Partition Bijections, a Survey.

Recall that a partition of a number n is a sequence of positive integers \lambda_i such n = \sum_i \lambda_i. Partitions can be visualised very fruitfully by Young diagrams, as in the following example for 12=5+3+3+1:

euler1.png

In the following, we will make good use of the set D_n of partitions into distinct parts and the fact that it splits into the union D_n = D_n^+ \cup D_n^- of partitions of even and odd length.

With the help of an involution on D_n that is interchanging D_n^+ and D_n^-, we will prove Euler’s pentagonal theorem:

\displaystyle\prod_{i=1}^\infty(1-t^i) = \sum_{m=-\infty}^{+\infty}(-1)^{m-1}t^{\frac{m(3m-1)}{2}}

Let us have a closer look at the coefficients of t^n on each side of the equation:

On the right hand side, we have non-zero coefficients for exponents \frac{m(3m-1)}{2} and \frac{m(3m+1)}{2} depending on whether m is positive or negative. These are the pentagonal numbers and hence the name of the theorem.

Let us expand  some terms on the left hand side to see how it relates to partitions:

(1-t^1)\cdot (1-t^2)\cdot (1-t^3)\cdot (1-t^4)\cdot (1-t^5)

= \underbrace{-t^{1+2+3+4+5}}_{15} \underbrace{+t^{2+3+4+5}}_{14} \underbrace{+t^{1+3+4+5}}_{13} \underbrace{- t^{3+4+5} + t^{1+2+4+5}}_{12} \underbrace{- t^{2+4+5} + t^{1+2+3+5}}_{11}

\underbrace{- t^{2+3+5} + t^{1+2+3+4} - t^{1+4+5}}_{10} \underbrace{- t^{2+3+4} + t^{4+5} - t^{1+3+5}}_{9} \underbrace{- t^{1+2+5} + t^{3+5} - t^{1+3+4}}_{8} \underbrace{+ t^{3+4} - t^{1+2+4} + t^{2+5}}_{7} \underbrace{+ t^{1+5} - t^{1+2+3} + t^{2+4}}_{6}

\underbrace{+ t^{2+3} - t^5 + t^{1+4}}_{5} \underbrace{-t^4 + t^{1+3}}_{4} \underbrace{- t^3 + t^{1+2}}_3 - t^2 - t^1 + 1

Note that the coefficients are either +1 or -1 depending on whether the partition in the exponent has an even or an odd number of terms. Also, because each exponent occurs only once in the product, the partitions are partitions into distinct parts.

Now we introduce the following set of partitions related to the pentagonal numbers:

euler3.png

Let F be the set of all partitions corresponding to one of the diagrams above (i.e., for any value of m) and F_n := F \cap D_n.
Using these, proving Euler’s pentagonal theorem becomes equivalent to showing that

|D_n^+ - D_n^-| = \pm |F_n|.

This can be shown to be a consequence of Franklin’s involution, an involution on D_n that has F_n as fixed points and is sign reversing, i.e., interchanging D_n^+ and D_n^-.

The involution is most easily understood using the following figure:

euler2.png

The squares in red are simply those in the last row of the Young diagram. The squares in blue are the right-most squares as long as they form a diagonal pattern. Now, if we call the number of squares in the last row s and the number of squares in the other group g, Franklin’s involution works as follows:

If s=g or s=g+1, the involution lets the Young diagram fixed. This corresponds exactly to the diagrams in F_n.

If g>s, the squares along the diagonal are fewer than those in the last row and can hence be moved below the last row as a new row. In the remaining case, the involution works just the other way round, i.e., the last row now becomes a new diagonal ending at the upper right corner of the Young diagram.

Obviously, this is an involution, it leaves the diagrams in F_n fixed and turns partitions with an odd number of parts into partitions with an even number of parts and vice versa. Thus, the difference in the number of partitions with an even or an odd number of parts is either +1 or -1 depending on whether the diagram in F_n has an even or an odd number of parts.

More to come…

My next post will be on famous elliptic curves and there will also be an involution in it. The following quotation should serve as a cliffhanger:

The clue to the Darboux porism is the fact that the space of congruence classes of quadrilaterals with fixed side lengths is an elliptic curve, and the foldings FC and FD act on it as involutions. — https://arxiv.org/pdf/1501.07157.pdf

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s