Mar 27, 2020

Flags Ranked by Matrix Rank

One day I was watching a lecture, and the lecturer noted that the flags of some nations, when viewed as matrices, have quite low rank, in the sense of the standard concept from linear algebra. I found this remark pretty amusing, so I set out to compute the ranks of some flags. The flag of Bolivia has rank one, since all of its columns are the same and hence any two columns are linearly dependent. The flag of Sweden has rank two; there are two types of columns: the all yellow ones and the ones with a blue top, yellow middle, and blue bottom, and these two types of columns are linearly independent. The rank of some flags cannot be computed by hand, so I collected some images of flags and wrote some scripts to compute them.

Flag Images

Luckily for me, there was some nice data available that worked perfectly for this task. Wikimedia hosts svg files with the flags of all sorts of territories and states under a consistent format. What I did at first was directly scrape the flags of sovereign states from some Wikipedia lists. After running some computations on these, I came across this repository of region flags by Google, which has the Wikipedia flags including some US state and Canadian province flags that I did not have. Thus, I started using these svg files instead. In total, we have 326 flags (with some duplicates due to, for example, Bouvet Island using the flag of Norway). I converted each svg to a png file to allow for the analysis of each flag as a matrix of pixels. Technically, the flag pngs are not quite matrices of pixels, as each pixel has a red, green, and blue value. We simply map these images to grayscale to analyze them—as far as I can tell, there are no overlaps in which two colors map to the same grayscale value.

Ranks and Complexity of Flags

There is no doubt that the flags of low rank are simple and easy to unambiguously define. Any \(m \times n\) real matrix of rank \(k\) can be written as a sum of outer products \(\sum_{j=1}^k x_j y_j^T\), where \(x_j \in \mathbb{R}^m\) and \(y_j \in \mathbb{R}^n\), so \(k(m+n)\) real numbers specify the whole matrix. In fact, all that one has to do to describe the flags of Bolivia and Sweden is to specify a few colors and lines. However, the inverse is certainly not true—not all flags of high rank are complex. This is to be expected by basic linear algebra: any \(n \times n\) diagonal matrix with nonzero diagonal is full rank yet can be fully specified by an \(n\)-tuple of elements. In the same vein, the flag of the Republic of Congo is full rank due to its diagonal structure, yet it is composed of just three lines.

Another measure of complexity of a flag is given by the size in bytes of the flag’s svg file. The Republic of Congo’s svg is only 416 bytes, placing it as only the 183rd largest file in our set. After doing my analysis, I found this blog post which also ranks country flags by their svg file’s size in bytes, and funnily enough does so by using the size of Palau’s svg file as the smallest unit of measurement.

To compute ranks of the grayscale images, I simply use the rank function with default parameters in Julia’s default LinearAlgebra library. This function works by defining the rank as the number of singular values of \(A\) that have magnitude greater than \(k \cdot \epsilon \cdot \sigma_1\), where \(k\) is the smaller of number of rows and number of columns of \(A\), \(\epsilon\) is machine epsilon, and \(\sigma_1\) is the largest singular value of \(A\). Due to numerical error, the results are not perfect (e.g. we find that it computes the rank of the \(2 \times 200\) matrix with all elements equal to \(\pi\) as rank two), but we find that they are accurate in all low-rank examples besides Switzerland, for which the computed rank is 5 but the actual rank is 3.

Since flags have different aspect ratios and require different resolutions, the matrices are of different sizes, which makes a direct comparison of rank to be unfair. Thus, we instead consider the ratio between the rank and the maximum possible rank given the size of the matrix, \(\frac{\mathrm{rank}(A)}{\min(m,n)}\), which we call the percent rank of the \(m \times n\) matrix \(A\).

Altogether, \(15\) flags have full rank and \(43\) flags have rank one. Every one of the flags with full rank has some diagonal structure extending along an entire diagonal. While some of these full rank flags are simple like the Republic of Congo’s, some of them are—in my eyes—deserving of full rank, like Bhutan’s. The distribution of percent ranks covers much of \([0,1]\), with some bias towards rank one and full rank values.

Looking at the flags of lowest rank, we see some interesting patterns in choices by flag designers. Each rank one flag either consists of only horizontal or only vertical lines. None of these rank one flags have more than four different colors, and none of them have more than five lines. A lot of territories fly the flag of France. There are two different types of rank two flags: a bicolored flag with a cross and a multicolored flag with one vertical line with a few horizontal lines. The rank three flags either have an outlined cross or Switzerland’s cross that does not extend to the edges of the flag.

Rank one flags (all scaled to 2:3 aspect ratio)

We find that rank and percent rank have a nice, somewhat linear positive relationship. Also, filesize and percent rank have a somewhat positive relationship, but there is clearly a lot of deviation in which flags with low filesize have high percent rank. However, the flags with the largest filesizes do tend to have large ranks; the complexity of an svg file describing a low rank flag is just never that high. It is easy to imagine a pathological case in which a flag is solely composed of a massive amount of thin horizontal lines, in which case we have a rank one flag with a large filesize. For some reason, no state has used such a flag.

An example pathological flag: a rank one flag with a 656KB svg file.

Rank vs. percent rank (Pearson correlation .649)

Filesize vs. percent rank (Pearson correlation .317)

Looking at the flags with the largest filesize, we see an assortment of flags with complex symbols and drawings.

Some of these large filesize flags still do not have very high rank. The flag of Pennsylvania has filesize 775,498 bytes and 62% rank, so it has the second largest filesize but merely 107th largest percent rank. While it has a detailed seal, the seal does not take up the whole flag, so it has many linearly dependent rows that are just entirely blue.

The case of Pennsylvania gives some nice intuition on these two flag complexity metrics, and suggests another way to measure complexity. We can measure each flag with a combination of these two metrics by taking the sum of the position (a better term would be rank or positional rank, but that would be confusing in the context of the linear rank) as measured by filesize with the position as measured by percent rank. Another way to combine the ranks would be through something like a rank product, but the simple sum maintains integer values. Here are the most complex flags as measured by this metric.

These flags do not just have detailed symbols and drawings, but in fact they have large drawings that span significant portions of the flag, or at least diagonal lines that do. Pennsylvania falls to position 27 in this ranking, but other flags like North Dakota’s (position 4) have a seal in the middle with a lot of empty columns; the difference is that North Dakota’s seal takes up a larger portion of the vertical space of the flag, so there are many linearly independent rows. There are still some flags that somewhat hack this ranking. The flag of Florida (position 1) has a small seal, but it has diagonal lines that give it full rank.

Without US States and Canadian Provinces

For anyone who does not care for US State or Canadian provinces, we also compute the filesize and position sum rankings after omitting flags for these subregions. Note that no US State or Canadian province flags have low-rank, while many have large filesizes. Some regions change relative order in position sum after omitting the subregions, in part due to omission of subregions making the filesize rankings more compact. For instance, Saint Pierre and Miquelon is now position 1, even though it was behind Tristan da Cunha when subregions were included.

Top 12 non-subregion flags by position sum

Thoughts

The ranking of flag complexity by the sum of the rankings by percent rank and filesize seems pretty fair to me. I especially feel that the flag of Saint Pierre and Miquelon is deserving of the top spot, as it has neat designs that take up the whole flag. I am somewhat disappointed by the lack of variety in the designs of the flags of rank at most 3, but this is understandable given that flag designers probably did not care much for the rank of their flags. This type of computation would probably be pretty interesting for other types of flags and graphics. For example, the checkered flags used in auto racing have rank two and have a design distinct from the two types of rank two region flags.