For me, one example for this is given by generating functions (and that is why they can be found on this blog).

Today, I want to talk about another such example: involutions. We will look at how they are used to prove in one sentence that primes of the form $4k+1$ can be written as a sum of two squares, in the proof of the wonderful Lindström-Gessel-Viennot lemma, and in the proof of Euler's pentagonal number theorem.

Let us start by defining what an involution is:

An involution is simply a map $f$ on a set $S$ such that $f \circ f = \mathrm{id}$, i.e., applying $f$ twice amounts to the identity.

Well-designed involutions can tell us a lot about the set $S$, for example, when the set of fixed points is understood.

As a first example, let us look at the set of divisors of a number $n$. The map $d \mapsto n/d$ is obviously an involution and it has a fixed point if and only if $n$ is a square (the fixed point being $d = \sqrt{n}$). This shows that the number of divisors of $n$ is odd if and only if $n$ is a square.
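This little argument can be checked directly in Python (a quick sketch; all names are mine):

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def pair(n, d):
    # The involution on the divisors of n: d -> n/d.
    return n // d

for n in range(1, 200):
    divs = divisors(n)
    # pair is an involution on the divisors of n ...
    assert all(pair(n, pair(n, d)) == d for d in divs)
    fixed = [d for d in divs if pair(n, d) == d]
    is_square = int(n ** 0.5) ** 2 == n
    # ... with a fixed point iff n is a square, so the
    # divisor count is odd iff n is a square.
    assert (len(fixed) == 1) == is_square
    assert (len(divs) % 2 == 1) == is_square
```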

Don Zagier has used a clever involution to give a proof in one sentence of the classical result that every prime of the form $4k+1$ is a sum of two squares.

To appreciate the nature of Zagier's proof, let's first look at a different way to prove this result (following Jürgen Neukirch's Algebraic Number Theory). First, if $p \equiv 3 \pmod 4$, then $p$ cannot be a sum of two squares because squares are either $0$ or $1$ modulo $4$, so a sum of two squares is never $3$ modulo $4$.

Then, if $p \equiv 1 \pmod 4$, one of the more lucid ways of seeing that it has to be a sum of two squares is by noting that $p$ cannot remain prime in the ring $\mathbb{Z}[i]$ of Gaussian integers.

As a first step, we show that there is an integer $n$ such that $p$ divides $n^2 + 1$. Such an $n$ can be found via Wilson's theorem, which says that a number $p > 1$ is prime if and only if $(p-1)! \equiv -1 \pmod{p}$. Applying this to a prime of the form $p = 4k+1$, we get:

$$-1 \equiv (p-1)! \equiv \prod_{j=1}^{2k} j \,(p - j) \equiv (-1)^{2k} \left((2k)!\right)^2 \equiv \left((2k)!\right)^2 \pmod{p}$$

So for $n = (2k)!$, we get that $n^2 \equiv -1 \pmod{p}$, i.e., $p$ divides $n^2 + 1$, and this factors as $n^2 + 1 = (n+i)(n-i)$ in $\mathbb{Z}[i]$.
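This computation is easy to check directly (a small sketch; the function name is mine):

```python
from math import factorial

def wilson_root(p):
    # For a prime p = 4k + 1, n = ((p-1)/2)! satisfies n^2 = -1 (mod p),
    # as derived from Wilson's theorem above.
    return factorial((p - 1) // 2) % p

for p in [5, 13, 17, 29, 37, 41]:
    n = wilson_root(p)
    assert (n * n) % p == p - 1  # n^2 = -1 (mod p), so p divides n^2 + 1
```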

As $p$ divides neither of the factors $n \pm i$, it is not a prime in $\mathbb{Z}[i]$ and has a decomposition $p = \alpha\beta$ for elements $\alpha, \beta \in \mathbb{Z}[i]$ which are not units.

Now taking the norm in $\mathbb{Z}[i]$, i.e., the map $N(a + bi) = a^2 + b^2$, we get:

$$p^2 = N(p) = N(\alpha)\,N(\beta)$$

The last equation being in $\mathbb{Z}$, and neither factor being $1$ (as $\alpha$ and $\beta$ are not units), $N(\alpha)$ is equal to $p$ and hence, writing $\alpha = a + bi$, $p$ is of the form $a^2 + b^2$.

Now, for comparison, here is Zagier’s proof in one sentence; short enough to quote it in full:

The involution on the finite set $S = \{(x, y, z) \in \mathbb{N}^3 : x^2 + 4yz = p\}$ defined by

$$(x, y, z) \mapsto \begin{cases} (x + 2z, \; z, \; y - x - z) & \text{if } x < y - z \\ (2y - x, \; y, \; x - y + z) & \text{if } y - z < x < 2y \\ (x - 2y, \; x - y + z, \; y) & \text{if } x > 2y \end{cases}$$

has exactly one fixed point, so $|S|$ is odd and the involution defined by $(x, y, z) \mapsto (x, z, y)$ also has a fixed point.
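Zagier's involution is easy to verify by brute force for small primes (a sketch; all names are mine):

```python
def zagier(t):
    # Zagier's involution on S = {(x, y, z) in N^3 : x^2 + 4yz = p}.
    # The boundary cases x = y - z and x = 2y cannot occur for prime p.
    x, y, z = t
    if x < y - z:
        return (x + 2 * z, z, y - x - z)
    if x < 2 * y:  # here y - z < x < 2y
        return (2 * y - x, y, x - y + z)
    return (x - 2 * y, x - y + z, y)

def check(p):
    S = [(x, y, z)
         for x in range(1, p)
         for y in range(1, p)
         for z in range(1, p)
         if x * x + 4 * y * z == p]
    assert all(zagier(zagier(t)) == t for t in S)    # it is an involution ...
    assert sum(1 for t in S if zagier(t) == t) == 1  # ... with exactly one fixed point
    # Hence |S| is odd and (x, y, z) -> (x, z, y) also has a fixed point,
    # i.e. some (x, y, y) in S, giving p = x^2 + (2y)^2.
    x, y, _ = next(t for t in S if t[1] == t[2])
    assert x * x + (2 * y) ** 2 == p

for p in [5, 13, 17, 29, 37]:
    check(p)
```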

D. Zagier: **A One-Sentence Proof That Every Prime $p \equiv 1 \pmod 4$ Is a Sum of Two Squares**, *The American Mathematical Monthly*, 97(2), p. 144.

I found this proof surprising and beautiful but also baffling. (“From the Book”, really; Aigner and Ziegler thought so too.) Where does it come from? How would one come up with this involution?

Following a modern reflex, I have tried to find an answer on the net and find an answer I did:

https://mathoverflow.net/questions/31113/zagiers-one-sentence-proof-of-a-theorem-of-fermat

One of the answers on the MathOverflow page shows how to understand the involution in a way I would expect Euclid himself to think. Let us look at the equation $x^2 + 4yz = p$ like an old Greek would. $x^2$ is a square, obviously. To this, we add four rectangles with side lengths $y$ and $z$:

This is drawn in a way that $x > 2y$. In this case, we can extend the $z$-rectangles as follows and obtain the same figure in a different way, recovering the third case of Zagier's involution:

Relabelling gives us the first line of Zagier’s involution and also shows that these two lines are actually inverses of each other:

This leaves the second line which comes out of the following picture:

Now, you can draw this for $x = y$ and think about the fixed point (both graphically and by setting $x = y$ in $x^2 + 4yz = p$).

Finally, let me mention that the proof by involution, beautiful as it is, is not constructive. Fortunately, the same issue of the American Mathematical Monthly also contains a wonderfully constructive way of proving the same theorem via continued fractions and hence the Euclidean algorithm:

Stan Wagon: **The Euclidean Algorithm Strikes Again**, *The American Mathematical Monthly*, 97(2), pp. 125–129.

(The same MathOverflow page linked above will also tell you about more efficient ways to find the sum of two squares.)

Our next example comes from combinatorics and Martin Aigner has great expositions of it, both in his book A Course in Enumeration and in his article Lattice Paths and Determinants.

We start with a combinatorial application that fits nicely with the Lindström-Gessel-Viennot lemma, which can itself be proved beautifully using an involution. Its starting point is a triangulated hexagon of side length $n$.

The triangulation of such a hexagon can be turned into a rhombic tiling by joining two adjacent triangles at a time. Here is an example of such a tiling:

Our motivating question is now: how many different tilings are there for a given $n$? As we will see below, this question can be turned into a question about paths in a graph and about determinants, which can then be answered using the lemma.

As a first step towards this lemma, recall the definition of the determinant of an $n \times n$ matrix $M = (m_{ij})$:

$$\det M = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} m_{i\sigma(i)}$$

In this, $S_n$ is the symmetric group on $n$ symbols and the sign $\mathrm{sgn}(\sigma)$ of a permutation $\sigma$ is either $+1$ or $-1$ depending on whether it is a product of an even or an odd number of transpositions.

If we understand the matrix entries as edge weights in a complete bipartite graph, then the determinant gives us a signed sum of the weights of all perfect matchings in this graph, where the weight of a matching is given by the product of the weights of its edges.

Now, the Lindström-Gessel-Viennot lemma generalises this to weighted directed acyclic graphs. For a directed path $P$ in such a graph, we can define its weight $w(P)$ as the product of the weights of its edges. Then, if we select two sets of vertices $A = \{a_1, \dots, a_n\}$ and $B = \{b_1, \dots, b_n\}$ of the same size, we can define the path matrix $M = (m_{ij})$ with respect to $A$ and $B$ as the matrix with entries

$$m_{ij} = \sum_{P : a_i \to b_j} w(P)$$

Here, $P : a_i \to b_j$ denotes a directed path connecting the nodes $a_i$ and $b_j$. We will call a set of directed paths connecting each node $a_i$ to a different node $b_j$ a path system. Hence, a path system $\mathcal{P}$ consists of paths $P_i : a_i \to b_{\sigma(i)}$ for a permutation $\sigma$. This allows us to define a signum for each path system by $\mathrm{sgn}(\mathcal{P}) = \mathrm{sgn}(\sigma)$.

Let $\mathcal{S}$ be the set of all path systems with respect to $A$ and $B$ and let $\mathcal{S}_0 \subseteq \mathcal{S}$ be the set of vertex disjoint path systems, i.e., those in which no pair of paths shares a common vertex.

**Lemma:** Let $G$ be a weighted directed acyclic graph and $A$ and $B$ subsets of its vertex set, both of size $n$. Then, the determinant of the path matrix $M$ can be computed as

$$\det M = \sum_{\mathcal{P} \in \mathcal{S}_0} \mathrm{sgn}(\mathcal{P}) \, w(\mathcal{P}), \qquad w(\mathcal{P}) = \prod_{i=1}^{n} w(P_i)$$

We can prove the lemma by first observing that the determinant is a sum over all path systems and then defining an involution that eliminates from the sum those path systems which are not vertex disjoint.

For the first step, note that a term in the sum for the determinant is of the form $\mathrm{sgn}(\sigma) \prod_i m_{i\sigma(i)}$. The definition of the path matrix then gives us expanded terms of the form

$$\mathrm{sgn}(\sigma) \prod_{i=1}^{n} w(P_i) \qquad \text{for paths } P_i : a_i \to b_{\sigma(i)}$$

Adding all these terms gives us the following sum over all path systems $\mathcal{P} \in \mathcal{S}$:

$$\det M = \sum_{\mathcal{P} \in \mathcal{S}} \mathrm{sgn}(\mathcal{P}) \, w(\mathcal{P})$$

Now, for the second step, we will bring in the magic of an involution on $\mathcal{S}$. Let $\mathcal{P}$ be a path system. If it is a vertex disjoint path system, our involution will keep it unchanged, so that these are exactly the fixed points of the involution. If $\mathcal{P}$ is not vertex disjoint, choose the first path $P_i$ (i.e., of smallest index $i$) that has a vertex in common with another path in $\mathcal{P}$, and call the other such path with the smallest index $P_j$. If $v$ is the first common vertex of these paths, then our involution will modify $\mathcal{P}$ by replacing $P_i$ and $P_j$ by paths $P_i'$ and $P_j'$, where $P_i'$ equals $P_i$ from $a_i$ until $v$ and then follows $P_j$ until its end point $b_{\sigma(j)}$. The new path $P_j'$ is constructed in the same way:

This is obviously an involution, its fixed points are the vertex disjoint path systems, and it swaps two path systems in our sum that have the same weight but opposite signs, because changing from $\sigma$ to $\sigma \circ (i\,j)$ amounts to multiplying by a transposition. Thus, the path systems which are not vertex disjoint cancel.

Equipped with our wonderfully versatile lemma, we can now answer our motivating question about rhombic tilings.

First, we transform our tiled hexagon into a directed graph with distinguished node sets $A$ and $B$ as follows:

Now the crucial step is to see that the vertex disjoint path systems in this graph are in one-to-one correspondence with the rhombic tilings. The following figure illustrates this idea:

Note how the paths follow the way triangles are joined into rhombi and thus avoid the dark rhombi.

Now, the path matrix can be computed by noting that the number of paths from $a_i$ to $b_j$ is counted by ordinary lattice paths and hence by binomial coefficients (because lattice paths follow the same recurrence as the binomial coefficients). If the lattice positions of $a_i$ and $b_j$ are chosen appropriately, the path matrix has entries

$$m_{ij} = \binom{2n}{n + i - j}$$

Its determinant, and hence the number of tilings, is

$$\det M = \prod_{i=1}^{n} \prod_{j=1}^{n} \prod_{k=1}^{n} \frac{i + j + k - 1}{i + j + k - 2}$$
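We can sanity-check this count for small $n$: the determinant of the binomial path matrix should match MacMahon's classical product formula for boxed plane partitions (a sketch; the matrix convention $\binom{2n}{n+i-j}$ is my assumption about the coordinates):

```python
from fractions import Fraction
from itertools import permutations
from math import comb

def det(M):
    # Determinant via the permutation-sum definition (fine for small matrices).
    n = len(M)
    total = 0
    for p in permutations(range(n)):
        sgn = (-1) ** sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
        term = sgn
        for i in range(n):
            term *= M[i][p[i]]
        total += term
    return total

def tilings_lgv(n):
    # LGV path matrix for the hexagon of side n (assumed coordinate convention).
    M = [[comb(2 * n, n + i - j) for j in range(n)] for i in range(n)]
    return det(M)

def tilings_macmahon(n):
    # MacMahon's box formula for the same count.
    prod = Fraction(1)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            for k in range(1, n + 1):
                prod *= Fraction(i + j + k - 1, i + j + k - 2)
    return int(prod)

assert [tilings_lgv(n) for n in (1, 2, 3)] == [2, 20, 980]
assert [tilings_macmahon(n) for n in (1, 2, 3)] == [2, 20, 980]
```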

A trick that is used at least twice becomes a method. This is particularly true of the involution method when it is used for showing partition identities. A nice overview of this is given by Igor Pak in his Partition Bijections, a Survey.

Recall that a partition of a number $n$ is a non-increasing sequence of positive integers $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_k$ such that $\lambda_1 + \dots + \lambda_k = n$. Partitions can be visualised very fruitfully by Young diagrams, as in the following example for $12 = 5 + 3 + 3 + 1$:

In the following, we will make good use of the set $\mathcal{D}$ of partitions into distinct parts and the fact that it splits into the disjoint union $\mathcal{D} = \mathcal{D}_{\mathrm{even}} \cup \mathcal{D}_{\mathrm{odd}}$ of the partitions of even and of odd length.

With the help of an involution on $\mathcal{D}$ that is interchanging $\mathcal{D}_{\mathrm{even}}$ and $\mathcal{D}_{\mathrm{odd}}$, we will prove Euler's pentagonal number theorem:

$$\prod_{k=1}^{\infty} (1 - q^k) = \sum_{j=-\infty}^{\infty} (-1)^j q^{j(3j-1)/2}$$

Let us have a closer look at the coefficients of $q^n$ on each side of the equation:

On the right hand side, we have non-zero coefficients $(-1)^j$ exactly for the exponents $n = j(3j-1)/2$ and $n = j(3j+1)/2$, depending on whether $j$ is positive or negative. These are the (generalised) pentagonal numbers, hence the name of the theorem.

Let us expand some terms on the left hand side to see how it relates to partitions: picking $-q^k$ from finitely many factors and $1$ from the rest, we get

$$\prod_{k \ge 1} (1 - q^k) = \sum_{\lambda_1 > \lambda_2 > \dots > \lambda_\ell} (-1)^{\ell} \, q^{\lambda_1 + \lambda_2 + \dots + \lambda_\ell}$$

Note that the coefficients are either +1 or -1 depending on whether the partition in the exponent has an even or an odd number of terms. Also, because each exponent occurs only once in the product, the partitions are partitions into distinct parts.

Now we introduce the following set of partitions related to the pentagonal numbers:

Let $\mathcal{P}$ be the set of all partitions corresponding to one of the diagrams above (i.e., for any value of $j$) and let $\mathcal{D}' = \mathcal{D} \setminus \mathcal{P}$.

Using these, proving Euler's pentagonal number theorem becomes equivalent to showing that

$$|\mathcal{D}_{\mathrm{even}}(n)| - |\mathcal{D}_{\mathrm{odd}}(n)| = \begin{cases} (-1)^j & \text{if } n = j(3j \pm 1)/2, \\ 0 & \text{otherwise,} \end{cases}$$

where $\mathcal{D}_{\mathrm{even}}(n)$ and $\mathcal{D}_{\mathrm{odd}}(n)$ denote the partitions of $n$ into distinct parts of even and of odd length.

This can be shown to be a consequence of Franklin's involution, an involution on $\mathcal{D}$ that has exactly $\mathcal{P}$ as its set of fixed points and is sign reversing elsewhere, i.e., interchanging $\mathcal{D}_{\mathrm{even}}$ and $\mathcal{D}_{\mathrm{odd}}$.

The involution is most easily understood using the following figure:

The squares in red are simply those in the last row of the Young diagram. The squares in blue are the right-most squares as long as they form a diagonal pattern. Now, if we call the number of squares in the last row $s$ and the number of squares in the other group $\sigma$, Franklin's involution works as follows:

If the two groups overlap and $s = \sigma$ or $s = \sigma + 1$, the involution leaves the Young diagram fixed. This corresponds exactly to the diagrams in $\mathcal{P}$.

If $\sigma < s$, the squares along the diagonal are fewer than those in the last row and can hence be moved below the last row as a new row. In the remaining case, $s \le \sigma$, the involution works just the other way round, i.e., the last row now becomes a new diagonal ending at the upper right corner of the Young diagram.

Obviously, this is an involution; it leaves the diagrams in $\mathcal{P}$ fixed and turns partitions with an odd number of parts into partitions with an even number of parts and vice versa. Thus, the difference in the number of partitions with an even and an odd number of parts is either $+1$ or $-1$ depending on whether the diagram in $\mathcal{P}$ has an even or an odd number of parts.
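Rather than implementing Franklin's involution itself, here is a quick brute-force check of the cancellation it guarantees, comparing the signed count of distinct partitions with the pentagonal-number coefficients (a sketch; all names are mine):

```python
def distinct_partitions(n, max_part=None):
    # All partitions of n into distinct parts, largest part first.
    if max_part is None:
        max_part = n
    if n == 0:
        return [[]]
    result = []
    for k in range(min(n, max_part), 0, -1):
        for rest in distinct_partitions(n - k, k - 1):
            result.append([k] + rest)
    return result

def coefficient(n):
    # |even-length| - |odd-length| among distinct partitions of n,
    # i.e. the coefficient of q^n in prod (1 - q^k).
    return sum((-1) ** len(p) for p in distinct_partitions(n))

def pentagonal(n):
    # (-1)^j if n = j(3j +- 1)/2 for some j >= 1, 1 if n = 0, else 0.
    if n == 0:
        return 1
    j = 1
    while j * (3 * j - 1) // 2 <= n:
        if n in (j * (3 * j - 1) // 2, j * (3 * j + 1) // 2):
            return (-1) ** j
        j += 1
    return 0

assert all(coefficient(n) == pentagonal(n) for n in range(30))
```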

My next post will be on famous elliptic curves and there will also be an involution in it. The following quotation should serve as a cliffhanger:

The clue to the Darboux porism is the fact that the space of congruence classes of quadrilaterals with fixed side lengths is an elliptic curve, and the foldings FC and FD act on it as involutions. — https://arxiv.org/pdf/1501.07157.pdf


Generating functions are one of the most magical objects in mathematics. Let us look at a first example in only one variable: the Fibonacci sequence. (This example and much more can be found in the book generatingfunctionology by Herbert Wilf.)

The Fibonacci sequence is defined recursively as:

$$f_0 = 0, \quad f_1 = 1, \quad f_n = f_{n-1} + f_{n-2} \quad \text{for } n \ge 2.$$

Can we find a closed form solution for $f_n$?

One way to do so is by setting up the generating function $F(x) = \sum_{n \ge 0} f_n x^n$. This is a formal sum, the mathematical way of hanging the Fibonacci numbers on a long clothes line and labelling the $n$-th peg by $x^n$:

In order to find $f_n$, we need to find the coefficient of $x^n$ in $F(x)$. Nothing won? Well, let's see how far we can get by manipulating $F(x)$.

Our original recursion tells us that $\sum_{n \ge 2} f_n x^n = x \sum_{n \ge 2} f_{n-1} x^{n-1} + x^2 \sum_{n \ge 2} f_{n-2} x^{n-2}$. (We get this by multiplying each side of the recursion by $x^n$ and then summing over $n \ge 2$.)

We can express this in terms of $F(x)$ as follows (write out the sums to see this):

$$F(x) - x = x F(x) + x^2 F(x)$$

We can solve this for $F(x)$ to get $F(x) = \dfrac{x}{1 - x - x^2}$.

With the help of partial fraction decomposition, using the factorisation $1 - x - x^2 = (1 - \varphi x)(1 - \psi x)$ of the denominator, and using the abbreviations $\varphi = \frac{1 + \sqrt 5}{2}$ and $\psi = \frac{1 - \sqrt 5}{2}$, we find that

$$F(x) = \frac{1}{\sqrt 5} \left( \frac{1}{1 - \varphi x} - \frac{1}{1 - \psi x} \right).$$

Developing the right hand side into geometric series, we get:

$$F(x) = \sum_{n \ge 0} \frac{\varphi^n - \psi^n}{\sqrt 5} \, x^n$$

From this we can read off the $n$-th Fibonacci number as the coefficient of $x^n$ in the series:

$$f_n = \frac{\varphi^n - \psi^n}{\sqrt 5}$$
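A quick numerical check of this closed form against the recursion (assuming the convention $f_0 = 0$, $f_1 = 1$; the rounding guards against floating-point error):

```python
from math import sqrt

phi = (1 + sqrt(5)) / 2
psi = (1 - sqrt(5)) / 2

def fib_closed(n):
    # Binet's formula, read off from the generating function.
    return round((phi ** n - psi ** n) / sqrt(5))

f = [0, 1]
for n in range(2, 40):
    f.append(f[-1] + f[-2])

assert [fib_closed(n) for n in range(40)] == f
```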

Generating functions will not always give us closed form solutions like this but they can often give us very good asymptotics for combinatorial problems. Let’s go one step further and look at how a generating function in several variables can give us such information.

The combinatorial problem we will consider is that of computing asymptotics for the so-called Delannoy numbers. The Delannoy number $D(a, b)$ gives the number of paths in the grid $\mathbb{Z}^2$ starting at position $(0, 0)$, ending at position $(a, b)$, and taking steps of the form $(1, 0)$, $(0, 1)$, or $(1, 1)$.

If we set up the generating function

$$F(x, y) = \sum_{a, b \ge 0} D(a, b) \, x^a y^b$$

for this, we can use a similar trick as for the Fibonacci numbers to find a rational function form for it. For all pairs $(a, b)$ with $a, b \ge 1$, we have the recurrence

$$D(a, b) = D(a-1, b) + D(a, b-1) + D(a-1, b-1),$$

where each term comes from one of the three possible steps in a path. Summing this over all $(a, b)$ with $a, b \ge 1$ and using $D(a, 0) = D(0, b) = 1$, we get

$$F(x, y) - \frac{1}{1-x} - \frac{1}{1-y} + 1 = x \left( F(x, y) - \frac{1}{1-x} \right) + y \left( F(x, y) - \frac{1}{1-y} \right) + xy \, F(x, y).$$

Now collecting the terms in $F(x, y)$ and cleaning up the terms of the form $\frac{1}{1-x}$ and $\frac{1}{1-y}$ gives us all we need to find the expression we are looking for:

$$F(x, y) = \frac{1}{1 - x - y - xy}.$$

Hence, using the fact that $\frac{1}{1-u} = \sum_{k \ge 0} u^k$ with $u = x + y + xy$ and expanding each power by the multinomial theorem, this yields

$$D(a, b) = \sum_{k=0}^{\min(a, b)} \binom{a + b - k}{k, \; a - k, \; b - k} = \sum_{k=0}^{\min(a, b)} \frac{(a + b - k)!}{k! \, (a-k)! \, (b-k)!}.$$
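We can verify that the coefficients of $1/(1 - x - y - xy)$ really are the Delannoy numbers by comparing a truncated series expansion with the recurrence (a small sketch; all names are mine):

```python
def delannoy_recurrence(N):
    # D(a,b) = D(a-1,b) + D(a,b-1) + D(a-1,b-1), with D(a,0) = D(0,b) = 1.
    D = [[1] * (N + 1) for _ in range(N + 1)]
    for a in range(1, N + 1):
        for b in range(1, N + 1):
            D[a][b] = D[a - 1][b] + D[a][b - 1] + D[a - 1][b - 1]
    return D

def delannoy_series(N):
    # Coefficients of 1/(1 - (x + y + xy)) = sum_k (x + y + xy)^k,
    # truncated to degree N in each variable.
    D = [[0] * (N + 1) for _ in range(N + 1)]
    term = [[0] * (N + 1) for _ in range(N + 1)]  # current power (x + y + xy)^k
    term[0][0] = 1
    for _ in range(2 * N + 1):  # powers 0 .. 2N suffice for degree (N, N)
        for a in range(N + 1):
            for b in range(N + 1):
                D[a][b] += term[a][b]
        # multiply term by (x + y + xy), dropping coefficients beyond degree N
        new = [[0] * (N + 1) for _ in range(N + 1)]
        for a in range(N + 1):
            for b in range(N + 1):
                if term[a][b]:
                    if a + 1 <= N:
                        new[a + 1][b] += term[a][b]
                    if b + 1 <= N:
                        new[a][b + 1] += term[a][b]
                    if a + 1 <= N and b + 1 <= N:
                        new[a + 1][b + 1] += term[a][b]
        term = new
    return D

assert delannoy_series(6) == delannoy_recurrence(6)
```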

As mentioned above, we usually cannot expect to find a closed-form solution. However, there is a very useful tool from complex analysis for finding asymptotics: Cauchy's integral formula. If our generating function is given by a function $F(\mathbf{z})$ as above, we can recover the coefficients $a_{\mathbf{r}}$ by contour integration:

$$a_{\mathbf{r}} = \frac{1}{(2\pi i)^d} \int_T \frac{F(\mathbf{z})}{\mathbf{z}^{\mathbf{r}}} \, \frac{d\mathbf{z}}{\mathbf{z}}$$

(Here, $\mathbf{z} = (z_1, \dots, z_d)$ and $\mathbf{r} = (r_1, \dots, r_d)$ are vectors of dimension $d$ corresponding to the number of variables in our generating function, $\mathbf{z}^{\mathbf{r}}$ is shorthand for $z_1^{r_1} \cdots z_d^{r_d}$, and $T$ is a product of sufficiently small circles around the origin of each coordinate axis.)

In their paper Twenty combinatorial examples of asymptotics derived from multivariate generating functions, Pemantle and Wilson give a general recipe on how to evaluate this integral in the multivariate case if $F$ is rational of the form $F = G/H$ for polynomials $G$ and $H$, and to find asymptotics for $a_{\mathbf{r}}$ where the direction $\mathbf{r}/|\mathbf{r}|$ is suitably controlled. A central role is played by the geometry imposed by the zeros of $H$. This is how the recipe is summarised (very closely following the text by Pemantle and Wilson):

- To find asymptotics in the direction $\mathbf{r}/|\mathbf{r}|$, look at the geometry of the variety $\mathcal{V} = \{H = 0\}$ near a certain set of critical points.
- The set of critical points can be reduced to a subset of so-called contributing critical points which can be determined by a combination of algebraic and geometric criteria.
- The critical points come in three flavours: smooth, multiple, and bad.
- For each smooth or multiple critical point there is an expansion for $a_{\mathbf{r}}$ in terms of the derivatives of $G$ and $H$.
- No general such expansion is known for bad points.

This should be vague enough to make you look up the details in the paper or the book.

Let me close by stating the result for the Delannoy numbers:

(I particularly like the fact that a Gröbner basis computation is used in the derivation of this result.)

The Eurasian bittern is a threatened bird species found in reed beds and other wetland habitats. Its sound is very simple, very deep, and very characteristic. It is pretty similar to what you can produce by blowing over the opening of an empty bottle (use a large one) at regular intervals.

Recently, I have been asked whether I still have the Matlab code for the algorithm. A bit of digging on an external hard drive has shown that I indeed still have it.

Without further ado, I would like to share it (public domain):

If you find this useful for a publication, please consider referencing:

R Bardeli, D Wolff, F Kurth, M Koch, KH Tauchert, KH Frommolt (2010):

*Detecting bird sounds in a complex acoustic environment and application to bioacoustic monitoring*

*Pattern Recognition Letters*, 31(12), pp. 1524–1534

Caustics are produced when light is reflected or refracted by curved surfaces. They are visible as lines of high intensity as can be seen in the reflections of light by a wedding ring (see photo above) or a coffee cup. If you have ever incinerated something (say paper) by using a magnifying glass, you will have a perfect basis for understanding the Greek origin of the name caustic (καυστός: burnt).

In geometrical optics, caustics can be understood as the envelope of the light rays reflected by a surface. An envelope of a family of curves (the reflected light rays in our case) is a curve which is tangent in at least one point to each curve in the family. Here is an illustration of this for the caustic of a circle:

When the reflecting curve is algebraic (described as the zero set of a polynomial) this leads to an algebraic curve as the caustic.

Let’s have a quick look at how to find this curve in the example above. We set up our scene in the plane such that the light source is at the point $(0, 0)$ and the ring reflecting the light is a circle of radius one centred at $(2, 0)$. The points $(x_1, y_1)$ on this circle satisfy the equation $(x_1 - 2)^2 + y_1^2 = 1$. To find the equation of the line of reflected light through the point $(x_1, y_1)$, we select a second point $(x_2, y_2)$ on that line and set up the two-point form of the equation of a line:

$$(y - y_1)(x_2 - x_1) = (x - x_1)(y_2 - y_1)$$

This gives an equation for all points $(x, y)$ on that line. One way to find a point on the line representing the reflected light is to choose a point on the light ray through $(x_1, y_1)$ and reflect it. Let's choose the point $(2x_1, 2y_1)$. Then we subtract twice the projection of the line from $(x_1, y_1)$ to $(2x_1, 2y_1)$ onto the normal of the circle at $(x_1, y_1)$. The normal is given by $(x_1 - 2, y_1)$, which has unit length since the circle has radius one. Thus, the second point on the reflected ray can be found as

$$(x_2, y_2) = (2x_1, 2y_1) - 2 \left( x_1 (x_1 - 2) + y_1^2 \right) (x_1 - 2, \; y_1).$$
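To make the reflection step concrete, here is a small numeric sketch (the choice of $2p$ as the auxiliary point on the incoming ray is my assumption; any point on the ray works):

```python
def reflected_point(x1, y1):
    # (x1, y1) lies on the unit circle centred at (2, 0), so the unit normal
    # there is n = (x1 - 2, y1).  Take the point 2p = (2*x1, 2*y1) on the
    # light ray from the origin through p and subtract twice the projection
    # of the vector from p to 2p (which is just p itself) onto the normal.
    nx, ny = x1 - 2.0, y1
    dot = x1 * nx + y1 * ny
    return (2 * x1 - 2 * dot * nx, 2 * y1 - 2 * dot * ny)

# At the top of the circle, p = (2, 1), the normal is vertical, so the
# incoming direction (2, 1) is reflected to (2, -1).
qx, qy = reflected_point(2.0, 1.0)
assert abs((qx - 2.0) - 2.0) < 1e-12
assert abs((qy - 1.0) + 1.0) < 1e-12

# A ray hitting the circle head-on at (3, 0) is reflected straight back
# through the origin.
assert reflected_point(3.0, 0.0) == (0.0, 0.0)
```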

This is enough information to set up the equations for the caustic. First, the points have to be on the circle reflecting the light rays:

$$(x_1 - 2)^2 + y_1^2 = 1 \qquad (1)$$

Second, we get an equation describing the reflected ray with respect to a point by using the two-point form given above:

$$(y - y_1)(x_2 - x_1) - (x - x_1)(y_2 - y_1) = 0 \qquad (2)$$

Third, we need to express the fact that the caustic is tangent to each curve in the family in at least one point. I am omitting some details here and will just tell you that the resulting equation is given by:

$$F_{x_1} G_{y_1} - F_{y_1} G_{x_1} = 0, \qquad (3)$$

where $F$ and $G$ denote the defining polynomials of equations (2) and (1), respectively.

The indices indicate partial derivatives. This gives us our last polynomial equation:

Now that we have found a system of algebraic equations, we can find the equation of our caustic by eliminating the variables $x_1$ and $y_1$ from the system. I have a lot of respect for the old masters who have done this by hand, but for this blog post I will resort to the help of a computer algebra system. Here, I have asked SageMath to do the job for me:

This gives us the sixth degree equation

and finally, we can plot the result together with our original picture:

If this example has whetted your appetite for computational algebraic geometry and you are not already initiated, have a look at the wonderful book Ideals, Varieties, and Algorithms.

There is an interesting, if controversial, theory of dark matter predicting it to form ring caustics in the plane of the Milky Way. In the paper Testing the Dark Matter Caustic Theory Against Observations in the Milky Way, the authors test this theory against actual observations.


Let me close with a second figure inspired by the Geometric Trilogy books. It shows two different ways of (literally) seeing that the sum of the first $n$ odd numbers equals $n^2$:

JM Landsberg has recently written a wonderful introduction to geometric complexity theory, which is the name of the corresponding research field. This has inspired me to borrow some of it and write about the permanent and the determinant of a matrix.

In linear algebra, the determinant is a value associated to each square matrix. It has many particularly nice properties and interpretations. For example, it can help to decide whether a system of linear equations represented by the matrix has a unique solution and it computes the volume of the parallelepiped spanned by its rows. For an $n \times n$ matrix $A$ with entries $a_{ij}$ it can be defined by the following formula:

$$\det A = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} a_{i\sigma(i)}$$

The sum is taken over all permutations $\sigma$ of the set $\{1, \dots, n\}$. $S_n$ denotes the symmetric group which comprises all these permutations, and the sign $\mathrm{sgn}(\sigma)$ of a permutation is $+1$ or $-1$ depending on whether the permutation can be written as a product of an even or odd number of transpositions.

At first glance, the permanent looks like a simplification of the determinant because it is given by essentially the same formula, just without the sign function:

$$\mathrm{per}(A) = \sum_{\sigma \in S_n} \prod_{i=1}^{n} a_{i\sigma(i)}$$

The permanent is very useful because it can be employed to find the answer to counting problems like finding the number of perfect matchings in a bipartite graph. However, where the determinant can be computed in polynomial time, e.g. by Gaussian elimination, computing the permanent is #P-hard.

Now, in the case of two by two matrices, the determinant can be used to compute the permanent:

$$\mathrm{per} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad + bc = \det \begin{pmatrix} a & -b \\ c & d \end{pmatrix}$$
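A tiny sketch checking this two-by-two identity numerically (names are mine):

```python
from itertools import permutations

def per(M):
    # Permanent via the permutation-sum definition.
    n = len(M)
    total = 0
    for p in permutations(range(n)):
        prod = 1
        for i in range(n):
            prod *= M[i][p[i]]
        total += prod
    return total

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[3, 5], [7, 11]]
# Flipping the sign of one off-diagonal entry turns the determinant
# into the permanent: per [[a,b],[c,d]] = ad + bc = det [[a,-b],[c,d]].
assert per(A) == det2([[3, -5], [7, 11]]) == 3 * 11 + 5 * 7
```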

For larger matrices, however, computing the permanent by a determinant of the same size whose entries are given by linear forms of the entries of the original matrix is not possible. Then how about using the determinant of a matrix of a larger size? We would like to find a way to do this using determinants which are not too much larger. But the hardness result on computing the permanent suggests that we cannot hope to win this battle with determinants of a size polynomial in the size of the matrix. A much more humble question is to find a representation of exponential size.

Bruno Grenet shows how to do this with a $(2^n - 1) \times (2^n - 1)$ determinant in his paper An Upper Bound for the Permanent Versus Determinant Problem. The solution is sufficiently simple and I will therefore conclude by explaining how to do it using the example of three by three matrices. Another question is why this is correct, and you can get the answer by going through the four pages of the paper.

Let’s start with a three by three matrix:

In order to find a determinant representation for this permanent, we construct a graph from the subsets of the set $\{1, 2, 3\}$ (we would use the set $\{1, \dots, n\}$ for the general case).

This graph is constructed in a way such that the vertices (the subsets) are grouped by size such that all sets of the same cardinality form one layer of the graph. I will refer to the layer containing sets of cardinality $k$ as the $k$-th layer. Now, edges are introduced between two vertices when inserting a single number into one of the respective sets yields the other. These edges are oriented from the smaller to the larger set, and an edge is labelled by $a_{ki}$ if it ends in the $k$-th layer and the larger set is obtained by inserting the number $i$ into the smaller set. For example, there is an edge labelled $a_{23}$ from the node $\{1\}$ to the node $\{1, 3\}$ because the latter set is in layer 2 and was obtained by inserting 3. For each vertex representing a subset which is neither empty nor the whole set, we add an edge from this vertex to itself labelled by the number one. Finally, we identify the vertices representing the empty set and the whole set into one joint vertex (vertices 1 and 8 in the figure).

We can now order the vertices (for example as indicated by the red numbers in the figure) and represent the graph by its adjacency matrix:

This matrix has entry zero at position $(i, j)$ if there is no edge from the $i$-th to the $j$-th vertex in our ordering. If there is an edge, then its label is given at position $(i, j)$.

Now we can compute the determinant of this matrix and find that it is exactly the permanent of our matrix. You can use WolframAlpha to show that its determinant is indeed equal to the permanent. Why this is true is somewhat harder to see. It has to do with counting the number of cycle covers of the above graph. I invite you to have a look at the paper for more information.
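Here is a sketch of the construction for three by three matrices (the vertex ordering and helper names are mine; the determinant is expanded naively, which is fine at size 7):

```python
from itertools import permutations

def grenet_matrix(A):
    # Vertex 0 is the joint vertex (empty set identified with {1,2,3});
    # the remaining vertices are the proper non-empty subsets of {1,2,3}.
    subsets = [frozenset(), frozenset({1}), frozenset({2}), frozenset({3}),
               frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 3})]
    index = {s: i for i, s in enumerate(subsets)}
    full = frozenset({1, 2, 3})
    M = [[0] * 7 for _ in range(7)]
    for s in subsets:
        if s:  # self-loop labelled 1 on every proper non-empty subset
            M[index[s]][index[s]] = 1
        for i in (1, 2, 3):
            if i not in s:
                t = s | {i}
                k = len(t)  # the layer in which the edge ends
                target = 0 if t == full else index[t]
                M[index[s]][target] = A[k - 1][i - 1]  # label a_{k,i}
    return M

def det(M):
    n = len(M)
    total = 0
    for p in permutations(range(n)):
        sgn = (-1) ** sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
        prod = sgn
        for i in range(n):
            prod *= M[i][p[i]]
        total += prod
    return total

def per(A):
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        prod = 1
        for i in range(n):
            prod *= A[i][p[i]]
        total += prod
    return total

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
assert per(A) == 450
assert det(grenet_matrix(A)) == 450  # the 7x7 determinant equals the permanent
```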

We start out with Euler's well-known polyhedron formula: If $V$, $E$, and $F$ are the numbers of vertices, edges, and faces of a polyhedron, then $V - E + F$ always equals two. The sum of angles between edges meeting at a given vertex cannot exceed $2\pi$; since the interior angles of pentagons and hexagons are at least $108°$, the number of edges meeting at a vertex of our polyhedron is therefore always equal to three. This gives us the following relationship between the number of vertices and the number of edges:

$$3V = 2E$$

Now, let’s look for a relationship between the number of faces and the number of edges. We'll call the number of pentagons $p$ and the number of hexagons $h$. This gives us $F = p + h$ for the total number of faces. Each pentagon is bounded by five edges and each hexagon is bounded by six edges. If we add these numbers for all faces, we count every edge twice (exactly two faces meet in an edge). Hence, we get the following relationship between the number of faces and the number of edges:

$$5p + 6h = 2E$$

This is all the information we need. We can put it together as follows:

$6V - 6E + 6F = 12$ (six times Euler's polyhedron formula)

$4E - 6E + 6(p + h) = 12$ (using that $3V = 2E$ and $F = p + h$)

$6(p + h) - (5p + 6h) = p = 12$ (using that $2E = 5p + 6h$)

Now, whenever you see a polyhedron made from pentagons and hexagons alone, you can say immediately that it has 12 pentagons. Why? Because it is built into the fabric of reality and mathematics allows us a glimpse at it.
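The little derivation above can be checked in code, with the classical football (the truncated icosahedron: 60 vertices, 90 edges, 12 pentagons and 20 hexagons) as a test case:

```python
def pentagon_count(V, E, F):
    # From 6(V - E + F) = 12 together with 3V = 2E, F = p + h and
    # 5p + 6h = 2E, the derivation above gives p = 6F - 2E.
    assert V - E + F == 2    # Euler's polyhedron formula
    assert 3 * V == 2 * E    # three edges meet at every vertex
    return 6 * F - 2 * E

# Truncated icosahedron: V = 60, E = 90, F = 32.
assert pentagon_count(60, 90, 32) == 12
```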

You can find slightly different presentations of this in other places on the web (like here and here) but also in wonderful books like *Shaping Space* (I’ve advertised this before).

There were several occasions in the history of mathematics when mathematicians had to answer the question: how hard is your maths?

« Oh, it looks pretty hard but just how hard is it? »

Here are three examples of stress tests that have led to foundational crises:

- Pouring the thermite of Russell’s paradox on it: the set of all sets which are not members of themselves, does it contain itself?
- Grinding it with Gödel’s first incompleteness theorem: a logical theory sufficiently expressive to describe the arithmetic of the natural numbers is either inconsistent or incomplete.
- Throwing the continuum on it: the question whether there is a set whose cardinality is strictly between that of the natural numbers and that of the real numbers is independent of Zermelo–Fraenkel set theory.

Intrigued? Want some more? Go get a quick overview at Wikipedia’s page on the foundations of mathematics.
