Canine S-index
Or, which of these doggos is most similar to a sausage?
Sausage dogs are among the cutest creatures in existence, no doubt. Just look at this majestic unit:
Credit: Igor Bredikhin via Wikipedia
Corgis and Basset Hounds are two other sausage dogs in contention for ultimate sausage-ness, but which of these sausage dogs is objectively closest to the sausage ideal? Where do other dog breeds fall on the sausage-ness spectrum? These are important questions that need answering.
The most important metric to consider when measuring sausage-ness is, of course, the side profile of the dog: is it longer than it is tall? We can use the ratio of (back) length to height to compare dogs on this metric. I'm going to call this ratio the S-index in my code, but you can call it sausage-ness if you like.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
Data source
It's surprisingly difficult to find morphometric data for dog breeds. The American Kennel Club provides height and weight information but not back length, which is crucial in determining S-index. The authors of this paper collected a large variety of measurements, including back length, but they haven't supplied any raw data. If this were a proper scientific study, I'd have to contact the authors with a request for data, but I've already invested more time than I should in this study of sausage dogs, so here we go.
The design-focused website dimensions.com does provide back lengths. Height in the rest of this notebook refers to standing height rather than the withers height used by the American Kennel Club. This is simply because I don't currently have the time to collect withers heights, which would probably give us a better indicator of sausage-ness.
Note that this website doesn't cite the source of its data, so take this otherwise scientifically rigorous study of sausage dogs with a large heap of salt. Only 51 dog breeds are available, which is an okay but rather modest sample size. Furthermore, only minimum and maximum values are provided, so I had to settle for the mid-range as a measure of central tendency.
I collected the necessary data from dimensions.com and stored it in a CSV file. All measurements are in metric units. No cleaning necessary, which is always nice!
df = pd.read_csv('data.csv')
df
 | breed | length_min | length_max | height_min | height_max | weight_min | weight_max |
---|---|---|---|---|---|---|---|
0 | Pomeranian | 24 | 28 | 20 | 24 | 1.4 | 3.2 |
1 | Chihuahua | 24 | 38 | 22 | 33 | 1.4 | 2.7 |
2 | Yorkshire Terrier | 30 | 39 | 27 | 33 | 1.8 | 5.4 |
3 | Papillon | 30 | 43 | 30 | 42 | 2.3 | 4.5 |
4 | Maltese | 34 | 44 | 30 | 38 | 1.4 | 3.6 |
5 | Shih Tzu | 38 | 44 | 33 | 38 | 4.1 | 7.3 |
6 | Cavalier King Charles Spaniel | 48 | 51 | 38 | 42 | 5.9 | 8.2 |
7 | Pug | 39 | 52 | 33 | 43 | 6.4 | 8.2 |
8 | Jack Russell Terrier | 46 | 55 | 36 | 43 | 4.1 | 6.8 |
9 | French Bulldog | 46 | 55 | 39 | 47 | 7.3 | 12.7 |
10 | Bichon Frise | 46 | 56 | 33 | 41 | 5.4 | 8.2 |
11 | Beagle | 51 | 64 | 46 | 56 | 9.0 | 11.0 |
12 | Dachshund | 55 | 64 | 33 | 37 | 7.3 | 14.5 |
13 | Basenji | 58 | 64 | 48 | 51 | 9.1 | 11.8 |
14 | Havanese | 48 | 65 | 32 | 43 | 3.2 | 5.9 |
15 | Pembroke Welsh Corgi | 56 | 66 | 36 | 43 | 10.0 | 14.0 |
16 | Bulldog | 51 | 69 | 38 | 48 | 18.0 | 25.0 |
17 | Shiba Inu | 58 | 71 | 44 | 55 | 6.8 | 10.9 |
18 | Cocker Spaniel | 61 | 74 | 46 | 56 | 12.0 | 16.0 |
19 | Whippet | 58 | 79 | 61 | 71 | 7.0 | 14.0 |
20 | Standard Schnauzer | 71 | 79 | 58 | 66 | 14.0 | 20.0 |
21 | Standard Poodle | 62 | 81 | 61 | 83 | 20.0 | 32.0 |
22 | English Springer Spaniel | 69 | 84 | 65 | 79 | 18.0 | 25.0 |
23 | Australian Cattle Dog | 71 | 84 | 53 | 64 | 14.0 | 16.0 |
24 | Dalmatian | 79 | 84 | 70 | 76 | 20.0 | 32.0 |
25 | Border Collie | 71 | 86 | 56 | 69 | 12.0 | 20.0 |
26 | Collie | 74 | 86 | 66 | 81 | 18.0 | 29.0 |
27 | Siberian Husky | 76 | 88 | 67 | 79 | 15.9 | 29.5 |
28 | Basset Hound | 66 | 89 | 41 | 53 | 20.0 | 29.0 |
29 | Boxer | 76 | 89 | 72 | 84 | 22.7 | 36.3 |
30 | Samoyed | 72 | 90 | 62 | 76 | 15.9 | 29.5 |
31 | Australian Shepherd | 71 | 91 | 64 | 81 | 16.0 | 32.0 |
32 | Bull Terrier | 80 | 98 | 60 | 74 | 20.4 | 36.3 |
33 | Belgian Malinois | 86 | 102 | 72 | 84 | 18.1 | 36.3 |
34 | German Pointer | 86 | 104 | 53 | 64 | 20.0 | 32.0 |
35 | Alaskan Malamute | 89 | 105 | 75 | 88 | 34.0 | 39.0 |
36 | Bernese Mountain Dog | 89 | 105 | 76 | 91 | 32.0 | 52.0 |
37 | Newfoundland | 93 | 107 | 83 | 95 | 45.4 | 68.0 |
38 | Golden Retriever | 94 | 107 | 72 | 80 | 25.0 | 34.0 |
39 | Rottweiler | 98 | 107 | 77 | 86 | 42.0 | 55.0 |
40 | German Shepherd | 91 | 108 | 67 | 79 | 23.0 | 41.0 |
41 | Great Dane | 90 | 109 | 108 | 126 | 50.0 | 79.0 |
42 | Akita | 93 | 110 | 79 | 93 | 31.8 | 59.0 |
43 | Tibetan Mastiff | 90 | 113 | 72 | 90 | 31.8 | 68.0 |
44 | Bullmastiff | 102 | 113 | 76 | 86 | 45.4 | 59.0 |
45 | Doberman Pinscher | 102 | 117 | 76 | 90 | 27.2 | 45.4 |
46 | Saint Bernard | 102 | 119 | 81 | 94 | 54.0 | 82.0 |
47 | Irish Wolfhound | 117 | 128 | 104 | 114 | 52.2 | 81.6 |
48 | Great Pyrenees | 103 | 133 | 81 | 103 | 36.3 | 54.4 |
49 | Mastiff | 113 | 135 | 89 | 107 | 54.4 | 104.3 |
50 | Cane Corso | 105 | 140 | 83 | 109 | 38.6 | 49.9 |
def calc_mid(a, b):
    '''
    Get mid-range from min and max.
    '''
    return (a + b) / 2

# calc_mid works element-wise on whole columns, so no row-wise apply is needed
df['length_mid'] = calc_mid(df['length_min'], df['length_max'])
df['height_mid'] = calc_mid(df['height_min'], df['height_max'])
df['s_index'] = df['length_mid'] / df['height_mid']
df.head()
 | breed | length_min | length_max | height_min | height_max | weight_min | weight_max | length_mid | height_mid | s_index |
---|---|---|---|---|---|---|---|---|---|---|
0 | Pomeranian | 24 | 28 | 20 | 24 | 1.4 | 3.2 | 26.0 | 22.0 | 1.181818 |
1 | Chihuahua | 24 | 38 | 22 | 33 | 1.4 | 2.7 | 31.0 | 27.5 | 1.127273 |
2 | Yorkshire Terrier | 30 | 39 | 27 | 33 | 1.8 | 5.4 | 34.5 | 30.0 | 1.150000 |
3 | Papillon | 30 | 43 | 30 | 42 | 2.3 | 4.5 | 36.5 | 36.0 | 1.013889 |
4 | Maltese | 34 | 44 | 30 | 38 | 1.4 | 3.6 | 39.0 | 34.0 | 1.147059 |
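As a sanity check on the s_index column, here is the same calculation written out by hand for the Dachshund row of the full table above (row 12). The symbols L, H and S are just shorthand introduced here for length, height and S-index; they don't come from the data source.

```latex
\[
L_{\text{mid}} = \frac{L_{\min} + L_{\max}}{2} = \frac{55 + 64}{2} = 59.5,
\qquad
H_{\text{mid}} = \frac{H_{\min} + H_{\max}}{2} = \frac{33 + 37}{2} = 35
\]
\[
S = \frac{L_{\text{mid}}}{H_{\text{mid}}} = \frac{59.5}{35} = 1.7
\]
```

An S-index above 1 means the dog is longer than it is tall, and below 1 means the opposite.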
# sns.set is an alias of sns.set_theme, so one call covers both the defaults and the font
sns.set_theme(font='monospace')
Plots
Scatter plot
I initially planned to display the data in a scatter plot, but labelling each point would create too much visual clutter. Nonetheless, it shows that most dog breeds in our sample are longer than they are tall.
g = sns.lmplot(data=df, x='height_mid', y='length_mid')
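One way to make the "longer than tall" claim visible without labelling every point is to overlay the line length = height, so that any point above it is a dog that is longer than it is tall. This is only a sketch: it assumes a handful of mid-range values worked out from the table above instead of the full data.csv, and it picks the non-interactive Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available
import pandas as pd
import seaborn as sns

# A few mid-range values computed from the table above, not the full dataset
sample = pd.DataFrame({
    "breed": ["Dachshund", "Basset Hound", "Beagle", "Standard Poodle", "Great Dane"],
    "length_mid": [59.5, 77.5, 57.5, 71.5, 99.5],
    "height_mid": [35.0, 47.0, 51.0, 72.0, 117.0],
})

g = sns.lmplot(data=sample, x="height_mid", y="length_mid")
ax = g.ax  # with a single facet, the underlying Axes is available as g.ax

# Identity line: points above it are longer than they are tall
lo = min(sample["height_mid"].min(), sample["length_mid"].min())
hi = max(sample["height_mid"].max(), sample["length_mid"].max())
ax.plot([lo, hi], [lo, hi], ls="--", color="grey", label="length = height")
ax.legend()

# Count the breeds in this sample that sit above the identity line
above_line = int((sample["length_mid"] > sample["height_mid"]).sum())
print(above_line)
```

In this five-breed sample, the Standard Poodle and the Great Dane are the two points that fall below the line.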
Bar plot
We lose information about absolute height and length by using the S-index ratio, but it makes it easier to compare sausage-ness directly.
fig, ax = plt.subplots(figsize=(20, 18))
sns.barplot(ax=ax, data=df.sort_values('s_index', ascending=False), x='s_index', y='breed', color='#42b7bd')
# Vertical line at x=1
#ax.axvline(1, ls='--')
ax.set_title('Sausage-ness of dog breeds', fontsize=24, pad=30)
ax.set_ylabel('Breed', labelpad=20, fontsize=18)
ax.set_xlabel('Length / Height', labelpad=20, fontsize=18)
# Annotation with box
bbox_props = dict(boxstyle="rarrow,pad=0.6", fc="cyan", ec="#42b7bd", lw=2)
t = ax.text(1.35, 49, 'Approaching theoretical sausage ideal', bbox=bbox_props)
# Annotation with arrows
ax.annotate('Tall bois', xy=(0.993056, 49), xycoords='data', xytext=(1.1, 49), textcoords='data',
            arrowprops=dict(arrowstyle='->', color='red'))
ax.annotate('', xy=(0.850427, 50), xycoords='data', xytext=(1.15, 49), textcoords='data',
            arrowprops=dict(arrowstyle='->', color='red', connectionstyle='angle'))
plt.tight_layout()
plt.savefig('sausageness.png', dpi=300)
Perhaps unsurprisingly, the Dachshund is the most sausage-y of sausage dogs, followed closely by the Basset Hound. I'd expected the Corgi to have a higher S-index than it does here. The Havanese isn't a breed I would normally classify as a sausage dog at first glance, but it goes to show how deceiving all that fur can be.
It's also interesting to note that we have two tall bois in our sample. The Great Dane and the Standard Poodle are both taller than they are long.
Credit: Lilly M via Wikipedia
Credit: Tim Wilson via Wikipedia
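Both observations can be checked straight from the numbers. The sketch below assumes a six-breed subset copied from the table above rather than the full data.csv, so the ranking it prints only covers those six breeds.

```python
import pandas as pd

# Subset of the min/max measurements from the table above
subset = pd.DataFrame({
    "breed": ["Dachshund", "Basset Hound", "Pembroke Welsh Corgi",
              "Havanese", "Standard Poodle", "Great Dane"],
    "length_min": [55, 66, 56, 48, 62, 90],
    "length_max": [64, 89, 66, 65, 81, 109],
    "height_min": [33, 41, 36, 32, 61, 108],
    "height_max": [37, 53, 43, 43, 83, 126],
})

# Mid-range of length and height, then their ratio
length_mid = (subset["length_min"] + subset["length_max"]) / 2
height_mid = (subset["height_min"] + subset["height_max"]) / 2
subset["s_index"] = length_mid / height_mid

# Most sausage-y first
ranking = subset.sort_values("s_index", ascending=False)["breed"].tolist()

# Tall bois: taller than they are long, i.e. S-index below 1
tall_bois = subset.loc[subset["s_index"] < 1, "breed"].tolist()

print(ranking)
print(tall_bois)
```

On this subset the Dachshund comes out on top with the Basset Hound second, and the filter returns only the Standard Poodle and the Great Dane.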
Now you know which sausage dogs are more sausage-y, where other dogs lie on the sausage-ness (cough S-index) spectrum, and which dogs are tall bois. You're welcome.
Things I Learned
This project is more than a little tongue-in-cheek, but I managed to get a surprising amount out of it.
Mid-range
My first instinct was to use mean or median height and length as a measure of central tendency, but I stopped short as soon as I realised that I only had access to the minimum and maximum values. I Googled around and found this thread about the mid-range as an estimator of the mean. It's a pretty dismal statistic, but you work with what you have.
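To see why the mid-range is a shaky stand-in for the mean, here is a small sketch on made-up numbers (the heights list is hypothetical, not taken from the dataset). The mid-range only looks at the two extremes, so a single unusual individual drags it a long way.

```python
from statistics import mean, median

def mid_range(values):
    """Midpoint between the smallest and largest value."""
    return (min(values) + max(values)) / 2

# Hypothetical heights: four similar dogs and one unusually tall one
heights = [30, 31, 32, 33, 54]

avg = mean(heights)
med = median(heights)
mid = mid_range(heights)

print(avg, med, mid)
```

Here the mean is 36 and the median is 32, but the mid-range lands at 42 because it is determined entirely by the 30 and the 54.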
Seaborn
I haven't used Matplotlib and Seaborn in a while. Though I love Seaborn for abstracting away a lot of finicky styling decisions, I now realise I need to go through the Seaborn docs properly to understand how it interfaces with Matplotlib. Specifically, I should read up on figure-level and axes-level Seaborn functions (in this notebook, the scatter plot and bar plot use a figure-level and an axes-level function respectively).
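The distinction shows up in what the two calls return. A minimal sketch, assuming a tiny made-up DataFrame (the toy data is hypothetical) and the Agg backend so no window is needed: lmplot creates its own figure and hands back a FacetGrid, whereas barplot draws onto an Axes you pass in and returns that same Axes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, just to have something to plot
toy = pd.DataFrame({
    "breed": ["A", "B", "C"],
    "x": [1.0, 2.0, 3.0],
    "y": [1.2, 1.9, 3.1],
})

# Figure-level: manages its own figure, sized via height and aspect
grid = sns.lmplot(data=toy, x="x", y="y", height=4, aspect=1.5)
print(type(grid).__name__)

# Axes-level: draws on the Axes it is given and returns it
fig, ax = plt.subplots(figsize=(6, 3))
returned = sns.barplot(ax=ax, data=toy, x="y", y="breed", color="#42b7bd")
print(returned is ax)
```

This is why the bar plot above can be sized with plt.subplots(figsize=...) and styled through ax, while the scatter plot has to be adjusted through the object that lmplot returns.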
Day 30 of #100DaysToOffload