Tachy

Canine S-index


Or, which of these doggos is most similar to a sausage?


Sausage dogs are among the cutest creatures in existence, no doubt. Just look at this majestic unit:

Dachshund

Credit: Igor Bredikhin via Wikipedia

Corgis and Basset Hounds are two other sausage dogs in contention for ultimate sausage-ness, but which of these sausage dogs is objectively closest to the sausage ideal? Where do other dog breeds fall on the sausage-ness spectrum? These are important questions that need answering.

The most important metric to consider when measuring sausage-ness is, of course, the side profile of the dog: is it longer than it is tall? We can use the ratio of (back) length to height to compare dogs on this metric. I'm going to call this ratio the S-index in my code, but you can call it sausage-ness if you like.
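In symbols, with L for back length and H for height (both in the same units), the S-index is simply:

```latex
S = \frac{L}{H}, \qquad
\begin{cases}
S > 1 & \text{longer than tall (sausage-ward)} \\
S < 1 & \text{taller than long}
\end{cases}
```

For example, a hypothetical dog 60 cm long and 40 cm tall would score S = 60 / 40 = 1.5.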

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

Data source

It's surprisingly difficult to find morphometric data for dog breeds. The American Kennel Club provides height and weight information but not back length, which is crucial in determining S-index. The authors of this paper collected a large variety of measurements, including back length, but they haven't supplied any raw data. If this were a proper scientific study, I'd have to contact the authors with a request for data, but I've already invested more time than I should in this study of sausage dogs, so here we go.

The design-focused website dimensions.com does provide back lengths. However, it gives standing height rather than the withers (top-of-shoulder) height used by the American Kennel Club, so height in the rest of this notebook refers to standing height. I simply don't have the time to collect withers heights separately right now, although they would probably give a better indicator of sausage-ness.

Note that this website doesn't cite the source of its data, so take this otherwise scientifically rigorous study of sausage dogs with a large heap of salt. Only 51 dog breeds are available, which is an okay but rather modest sample size. Furthermore, only minimum and maximum values are provided, so I had to settle for the mid-range (the average of the minimum and maximum) as a measure of central tendency.
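The mid-range is just the midpoint of the two extremes. Here is a minimal sketch using the Dachshund's row from the table below (length 55 to 64 cm, height 33 to 37 cm). The helper name `mid_range` is my own for illustration; it is not from the notebook.

```python
def mid_range(lo: float, hi: float) -> float:
    """Mid-range: the midpoint between a minimum and a maximum."""
    return (lo + hi) / 2


# Dachshund ranges (cm), taken from the dimensions.com data below
length_mid = mid_range(55, 64)  # 59.5
height_mid = mid_range(33, 37)  # 35.0
s_index = length_mid / height_mid

print(f"Dachshund S-index: {s_index:.2f}")  # Dachshund S-index: 1.70
```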

I collected the necessary data from dimensions.com and stored it in a CSV file. All measurements are metric: lengths and heights in centimetres, weights in kilograms. No cleaning necessary, which is always nice!
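Even when no cleaning is needed, a couple of cheap sanity checks can confirm it. The sketch below runs on a few rows copied from the table below and assumes the same column names as `data.csv`. It checks that nothing is missing and that every minimum is no larger than its maximum.

```python
import io

import pandas as pd

# A few rows copied from data.csv (lengths and heights in cm, weights in kg)
csv_text = """breed,length_min,length_max,height_min,height_max,weight_min,weight_max
Pomeranian,24,28,20,24,1.4,3.2
Dachshund,55,64,33,37,7.3,14.5
Great Dane,90,109,108,126,50.0,79.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# No missing values anywhere in the frame
no_missing = not df.isna().any().any()

# Every *_min column should be <= its matching *_max column
ranges_ok = all(
    (df[f"{col}_min"] <= df[f"{col}_max"]).all()
    for col in ("length", "height", "weight")
)

print(no_missing, ranges_ok)  # True True
```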

df = pd.read_csv('data.csv')
df
breed length_min length_max height_min height_max weight_min weight_max
0 Pomeranian 24 28 20 24 1.4 3.2
1 Chihuahua 24 38 22 33 1.4 2.7
2 Yorkshire Terrier 30 39 27 33 1.8 5.4
3 Papillon 30 43 30 42 2.3 4.5
4 Maltese 34 44 30 38 1.4 3.6
5 Shih Tzu 38 44 33 38 4.1 7.3
6 Cavalier King Charles Spaniel 48 51 38 42 5.9 8.2
7 Pug 39 52 33 43 6.4 8.2
8 Jack Russell Terrier 46 55 36 43 4.1 6.8
9 French Bulldog 46 55 39 47 7.3 12.7
10 Bichon Frise 46 56 33 41 5.4 8.2
11 Beagle 51 64 46 56 9.0 11.0
12 Dachshund 55 64 33 37 7.3 14.5
13 Basenji 58 64 48 51 9.1 11.8
14 Havanese 48 65 32 43 3.2 5.9
15 Pembroke Welsh Corgi 56 66 36 43 10.0 14.0
16 Bulldog 51 69 38 48 18.0 25.0
17 Shiba Inu 58 71 44 55 6.8 10.9
18 Cocker Spaniel 61 74 46 56 12.0 16.0
19 Whippet 58 79 61 71 7.0 14.0
20 Standard Schnauzer 71 79 58 66 14.0 20.0
21 Standard Poodle 62 81 61 83 20.0 32.0
22 English Springer Spaniel 69 84 65 79 18.0 25.0
23 Australian Cattle Dog 71 84 53 64 14.0 16.0
24 Dalmatian 79 84 70 76 20.0 32.0
25 Border Collie 71 86 56 69 12.0 20.0
26 Collie 74 86 66 81 18.0 29.0
27 Siberian Husky 76 88 67 79 15.9 29.5
28 Basset Hound 66 89 41 53 20.0 29.0
29 Boxer 76 89 72 84 22.7 36.3
30 Samoyed 72 90 62 76 15.9 29.5
31 Australian Shepherd 71 91 64 81 16.0 32.0
32 Bull Terrier 80 98 60 74 20.4 36.3
33 Belgian Malinois 86 102 72 84 18.1 36.3
34 German Pointer 86 104 53 64 20.0 32.0
35 Alaskan Malamute 89 105 75 88 34.0 39.0
36 Bernese Mountain Dog 89 105 76 91 32.0 52.0
37 Newfoundland 93 107 83 95 45.4 68.0
38 Golden Retriever 94 107 72 80 25.0 34.0
39 Rottweiler 98 107 77 86 42.0 55.0
40 German Shepherd 91 108 67 79 23.0 41.0
41 Great Dane 90 109 108 126 50.0 79.0
42 Akita 93 110 79 93 31.8 59.0
43 Tibetan Mastiff 90 113 72 90 31.8 68.0
44 Bullmastiff 102 113 76 86 45.4 59.0
45 Doberman Pinscher 102 117 76 90 27.2 45.4
46 Saint Bernard 102 119 81 94 54.0 82.0
47 Irish Wolfhound 117 128 104 114 52.2 81.6
48 Great Pyrenees 103 133 81 103 36.3 54.4
49 Mastiff 113 135 89 107 54.4 104.3
50 Cane Corso 105 140 83 109 38.6 49.9
def calc_mid(a, b):
    '''
    Get mid-range from min and max.
    '''
    return (a + b) / 2

# calc_mid works element-wise on whole columns, so no row-wise apply is needed
df['length_mid'] = calc_mid(df['length_min'], df['length_max'])
df['height_mid'] = calc_mid(df['height_min'], df['height_max'])
# S-index: mid-range back length divided by mid-range height
df['s_index'] = df['length_mid'] / df['height_mid']
df.head()
breed length_min length_max height_min height_max weight_min weight_max length_mid height_mid s_index
0 Pomeranian 24 28 20 24 1.4 3.2 26.0 22.0 1.181818
1 Chihuahua 24 38 22 33 1.4 2.7 31.0 27.5 1.127273
2 Yorkshire Terrier 30 39 27 33 1.8 5.4 34.5 30.0 1.150000
3 Papillon 30 43 30 42 2.3 4.5 36.5 36.0 1.013889
4 Maltese 34 44 30 38 1.4 3.6 39.0 34.0 1.147059
# sns.set is an alias of set_theme, so a single call sets the theme and font
sns.set_theme(font='monospace')

Plots

Scatter plot

I initially planned to display the data in a scatter plot, but labelling each point would create too much visual clutter. Nonetheless, the unlabelled plot shows that most dog breeds in our sample are longer than they are tall: most points sit above the diagonal where length equals height.

g = sns.lmplot(data=df, x='height_mid', y='length_mid')

Scatter plot
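In the plot, points above the line y = x are breeds that are longer than they are tall. Rather than eyeballing it, we can count the same comparison directly. This is a sketch on a handful of breeds, using mid-range values computed from the table.

```python
import pandas as pd

# Mid-range length and height (cm) for a few breeds from the table
df = pd.DataFrame({
    "breed": ["Dachshund", "Beagle", "Whippet", "Standard Poodle", "Great Dane"],
    "length_mid": [59.5, 57.5, 68.5, 71.5, 99.5],
    "height_mid": [35.0, 51.0, 66.0, 72.0, 117.0],
})

# A breed sits above the y = x line when it is longer than it is tall
longer = df["length_mid"] > df["height_mid"]
print(f"{longer.sum()} of {len(df)} breeds are longer than they are tall")
# 3 of 5 breeds are longer than they are tall
```

To draw that reference line on the scatter plot itself, something like `g.ax.axline((0, 0), slope=1, ls='--')` should work with Matplotlib 3.3 or later.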

Bar plot

We lose information about absolute height and length by using the S-index ratio, but it makes it easier to compare sausage-ness directly.
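To see exactly what the ratio throws away, compare two breeds from the table that could hardly differ more in size: the Jack Russell Terrier and the Cane Corso. Their S-indices come out almost identical.

```python
# (min, max) ranges in cm from the table: (length range, height range)
breeds = {
    "Jack Russell Terrier": ((46, 55), (36, 43)),
    "Cane Corso": ((105, 140), (83, 109)),
}


def mid(pair):
    """Mid-range of a (min, max) pair."""
    lo, hi = pair
    return (lo + hi) / 2


s_indices = {}
for name, (length, height) in breeds.items():
    length_mid, height_mid = mid(length), mid(height)
    s_indices[name] = length_mid / height_mid
    print(f"{name}: length {length_mid} cm, height {height_mid} cm, "
          f"S-index {s_indices[name]:.3f}")
# Jack Russell Terrier: length 50.5 cm, height 39.5 cm, S-index 1.278
# Cane Corso: length 122.5 cm, height 96.0 cm, S-index 1.276
```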

fig, ax = plt.subplots(figsize=(20, 18))

sns.barplot(ax=ax, data=df.sort_values('s_index', ascending=False), x='s_index', y='breed', color='#42b7bd')

# Vertical line at x=1
#ax.axvline(1, ls='--')

ax.set_title('Sausage-ness of dog breeds', fontsize=24, pad=30)
ax.set_ylabel('Breed', labelpad=20, fontsize=18)
ax.set_xlabel('Length / Height', labelpad=20, fontsize=18)


# Annotation with box
bbox_props = dict(boxstyle="rarrow,pad=0.6", fc="cyan", ec="#42b7bd", lw=2)
t = ax.text(1.35, 49, 'Approaching theoretical sausage ideal', bbox=bbox_props)

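# Annotation with arrows pointing at the two tall bois.
# After sorting, the Standard Poodle (S-index 0.993056) and the Great Dane
# (S-index 0.850427) are the last two bars, at y positions 49 and 50.
ax.annotate('Tall bois', xy=(0.993056, 49), xycoords='data', xytext=(1.1, 49), textcoords='data',
            arrowprops=dict(arrowstyle='->', color='red'))
ax.annotate('', xy=(0.850427, 50), xycoords='data', xytext=(1.15, 49), textcoords='data',
            arrowprops=dict(arrowstyle='->', color='red', connectionstyle='angle'))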

plt.tight_layout()
plt.savefig('sausageness.png', dpi=300)

Bar plot

Perhaps unsurprisingly, the Dachshund is the most sausage-y of the sausage dogs, followed closely by the Basset Hound. I'd expected the Corgi to have a higher S-index than it does here. The Havanese isn't a breed I would normally classify as a sausage dog at first glance, but it goes to show how deceptive all that fur can be.
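The ranking in the bar plot comes straight from sorting on `s_index`. pandas' `DataFrame.nlargest` pulls out just the top of that list. Below is a sketch on a few breeds, with mid-range values computed from the table.

```python
import pandas as pd

# Mid-range length and height (cm) computed from the table
df = pd.DataFrame({
    "breed": ["Pembroke Welsh Corgi", "Dachshund", "Havanese",
              "Basset Hound", "German Pointer", "Beagle"],
    "length_mid": [61.0, 59.5, 56.5, 77.5, 95.0, 57.5],
    "height_mid": [39.5, 35.0, 37.5, 47.0, 58.5, 51.0],
})
df["s_index"] = df["length_mid"] / df["height_mid"]

# Five most sausage-like breeds in this subset, highest S-index first
top = df.nlargest(5, "s_index")[["breed", "s_index"]]
print(top.to_string(index=False))
```

On these rows the German Pointer lands between the Basset Hound and the Corgi, which is part of why the Corgi sits lower than I'd expected.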

It's also interesting to note that we have two tall bois in our sample. The Great Dane and the Standard Poodle are both taller than they are long.
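The bar-plot code hard-codes the tall bois' S-indices (0.993056 and 0.850427) and their bar positions (49 and 50). Below is a sketch of looking them up instead, assuming `df` already has the `s_index` column. Here the frame is rebuilt from a few rows so that the snippet runs on its own, which means the positions differ from the full plot.

```python
import pandas as pd

# Mid-range length and height (cm) for a few breeds from the table
df = pd.DataFrame({
    "breed": ["Dachshund", "Whippet", "Standard Poodle", "Great Dane", "Beagle"],
    "length_mid": [59.5, 68.5, 71.5, 99.5, 57.5],
    "height_mid": [35.0, 66.0, 72.0, 117.0, 51.0],
})
df["s_index"] = df["length_mid"] / df["height_mid"]

# Same ordering as the bar plot, so bar i corresponds to row i here
ordered = df.sort_values("s_index", ascending=False).reset_index(drop=True)

# Tall bois: breeds that are taller than they are long
tall = ordered[ordered["s_index"] < 1]
for pos, row in tall.iterrows():
    print(f"bar {pos}: {row['breed']} (S-index {row['s_index']:.6f})")
# bar 3: Standard Poodle (S-index 0.993056)
# bar 4: Great Dane (S-index 0.850427)
```

These values could then feed `ax.annotate(..., xy=(row['s_index'], pos))` in place of the literals.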

Great Dane

Credit: Lilly M via Wikipedia

Standard Poodle

Credit: Tim Wilson via Wikipedia

Now you know which sausage dogs are more sausage-y, where other dogs lie on the sausage-ness (cough S-index) spectrum, and which dogs are tall bois. You're welcome.

Things I Learned

This project is more than a little tongue-in-cheek, but I managed to get a surprising amount out of it.

Mid-range

My first instinct was to use the mean or median height and length as a measure of central tendency, but I stopped short as soon as I realised that I only had access to the minimum and maximum values. I Googled around and found this thread about the mid-range as an estimator of the mean. It's a pretty dismal statistic, since it depends only on the two most extreme values, but you work with what you have.
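How dismal the mid-range is depends on the underlying distribution. The small simulation below (sample size, seed and distributions chosen purely for illustration) draws many samples and compares how much the sample mean and the sample mid-range vary. For normal data the mid-range varies more than the mean. For uniform data it actually does better.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n, trials = 20, 10_000  # sample size and number of repeated samples


def spread(samples: np.ndarray) -> tuple[float, float]:
    """Variance across trials (rows) of the sample mean and of the mid-range."""
    means = samples.mean(axis=1)
    mids = (samples.min(axis=1) + samples.max(axis=1)) / 2
    return means.var(), mids.var()


normal_mean_var, normal_mid_var = spread(rng.normal(size=(trials, n)))
uniform_mean_var, uniform_mid_var = spread(rng.uniform(size=(trials, n)))

print(f"normal:  mean {normal_mean_var:.4f}, mid-range {normal_mid_var:.4f}")
print(f"uniform: mean {uniform_mean_var:.4f}, mid-range {uniform_mid_var:.4f}")
```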

Seaborn

I haven't used Matplotlib and Seaborn in a while. Though I love Seaborn for abstracting away a lot of finicky styling decisions, I now realise I need to go through the Seaborn docs properly to understand how it interfaces with Matplotlib. Specifically, I should read up on figure-level and axes-level Seaborn functions (in this notebook, the scatter plot uses the figure-level lmplot and the bar plot uses the axes-level barplot).
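Here is a minimal illustration of that difference, using a few breeds from the table. The figure-level `lmplot` builds its own figure and returns a `FacetGrid`. The axes-level `barplot` draws onto a Matplotlib `Axes` that we create ourselves (passed via `ax=`) and returns that same `Axes`.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Mid-range length and height (cm) for a few breeds from the table
df = pd.DataFrame({
    "breed": ["Dachshund", "Beagle", "Whippet", "Great Dane"],
    "length_mid": [59.5, 57.5, 68.5, 99.5],
    "height_mid": [35.0, 51.0, 66.0, 117.0],
})
df["s_index"] = df["length_mid"] / df["height_mid"]

# Figure-level: seaborn creates the figure itself and returns a FacetGrid
g = sns.lmplot(data=df, x="height_mid", y="length_mid")
print(type(g).__name__)  # FacetGrid

# Axes-level: we create the Axes, seaborn draws onto it and hands it back
fig, ax = plt.subplots()
returned = sns.barplot(ax=ax, data=df, x="s_index", y="breed")
print(returned is ax)  # True
```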

Day 30 of #100DaysToOffload
