Splunking the Aspect Ratio Distribution of National Flags
When I tried to align the Union Jack and the flag of Germany on a presentation slide I noticed that I couldn’t – their aspect ratios are different. A quick search led me to this list of aspect ratios of national flags on Wikipedia. Apparently, national flags are far from standardized. A broad range of aspect ratios is being used. I decided to start a little fun project finding out which aspect ratios are most common.
Wikipedia Table to CSV
Wikipedia’s list comes as an HTML table. I would need the data as CSV for any kind of further processing. As it turns out, there is a website for exactly that: Convert Wiki Tables to CSV. You feed it the URL of the Wikipedia page you are interested in and it converts any tables on said page to CSV.
Cleaning up the CSV
Ultimately I wanted to process the CSV data in Splunk because that is the analytics tool I know best. Before I imported the data into Splunk, however, I wanted to clean it up a little. For that, Excel seemed easiest. When you paste CSV into Excel it sadly is not smart enough to offer splitting it into multiple columns. But that is achieved easily enough with the Text to Columns function.
After I had split the comma-separated table fields into columns again, I noticed that the Ratio column had both the fraction and the decimal value, separated by a space, e.g., “2:3 (1.5)”. So I repeated the splitting procedure to move the decimal value into a new column I would then discard along with some other unnecessary columns. I was left with the two columns Country and Ratio.
Once I had saved the resulting data to a new CSV file I turned to Splunk. The easiest way to import a CSV file is the Add Data wizard which guides you through the process. I had the wizard create a new index national_flags for the data I was importing so as to have a dedicated container that would later be easy to identify and ultimately delete.
Once imported, I whipped up a quick SPL search to calculate the percentage of the 10 most common aspect ratios. When you are not totally new to Splunk these kinds of things are almost laughably easy, yet you always have full control over every aspect of the process. Here is what I used:
index=national_flags | top 10 Ratio | stats sum(percent) as Percent by Ratio | sort -Percent
Explanation of the Search
- Selects all data from the specified index
- top 10 Ratio
- Selects the 10 most common values of the field Ratio and adds a new field percent to every row
- stats sum(percent) as Percent by Ratio
- Groups percent (renamed to Percent) by Ratio. This is not strictly necessary but makes the visualization easier.
- sort -Percent
- Sort by the field Percent (descending)
The only thing left to do was to click Visualization for Splunk to generate the following chart:
Popular Aspect Ratios
Out of the 257 national flags in Wikipedia’s list by far the two most popular aspect ratios are 2:3 (~42%) and 1:2 (~33%). The third most popular aspect ratio, 3:5, is already as low as ~8%.
For those interested, below is a table of the aspect ratio distribution. I created it by running the following Splunk search, exporting the results to CSV and converting it to HTML here.
index=national_flags | top 200 Ratio
Aspect Ratio Distribution Table