Data Analysis: Game of Thrones (Fun)

6 minute read

Game of Thrones

I happened to bump into a Game of Thrones data set and decided it would interesting to see what was going on there. I recently watched the seasons 1 - 7, under duress at first. But slowly realised the genius and brilliance of George RR Martin.

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sn # data visualization
import matplotlib.pyplot as ml # data visualisation as well
import warnings


sn.set(color_codes = True, style="white")
battles = pd.read_csv("battles.csv", sep=",", header=0)
deaths = pd.read_csv("character-deaths.csv", sep=",", header=0)
predictions = pd.read_csv("character-predictions.csv", sep=",", header=0)

We look at the basic shape of the data sets

battles.shape

(38, 25)

deaths.shape

(917, 13)

predictions.shape

(1946, 33)

Lets have a peek at what we are working with.

battles.head()

	name	year	battle_number	attacker_king	defender_king	attacker_1	attacker_2	attacker_3	attacker_4	defender_1	...	major_death	major_capture	attacker_size	defender_size	attacker_commander	defender_commander	summer	location	region	note
0	Battle of the Golden Tooth	298	1	Joffrey/Tommen Baratheon	Robb Stark	Lannister	NaN	NaN	NaN	Tully	...	1.0	0.0	15000.0	4000.0	Jaime Lannister	Clement Piper, Vance	1.0	Golden Tooth	The Westerlands	NaN
1	Battle at the Mummer's Ford	298	2	Joffrey/Tommen Baratheon	Robb Stark	Lannister	NaN	NaN	NaN	Baratheon	...	1.0	0.0	NaN	120.0	Gregor Clegane	Beric Dondarrion	1.0	Mummer's Ford	The Riverlands	NaN
2	Battle of Riverrun	298	3	Joffrey/Tommen Baratheon	Robb Stark	Lannister	NaN	NaN	NaN	Tully	...	0.0	1.0	15000.0	10000.0	Jaime Lannister, Andros Brax	Edmure Tully, Tytos Blackwood	1.0	Riverrun	The Riverlands	NaN
3	Battle of the Green Fork	298	4	Robb Stark	Joffrey/Tommen Baratheon	Stark	NaN	NaN	NaN	Lannister	...	1.0	1.0	18000.0	20000.0	Roose Bolton, Wylis Manderly, Medger Cerwyn, H...	Tywin Lannister, Gregor Clegane, Kevan Lannist...	1.0	Green Fork	The Riverlands	NaN
4	Battle of the Whispering Wood	298	5	Robb Stark	Joffrey/Tommen Baratheon	Stark	Tully	NaN	NaN	Lannister	...	1.0	1.0	1875.0	6000.0	Robb Stark, Brynden Tully	Jaime Lannister	1.0	Whispering Wood	The Riverlands	NaN

5 rows × 23 columns

deaths.head()

Index(['Name', 'Allegiances', 'Death Year', 'Book of Death', 'Death Chapter',
       'Book Intro Chapter', 'Gender', 'Nobility', 'GoT', 'CoK', 'SoS', 'FfC',
       'DwD'],
      dtype='object')

predictions.head()

Index(['S.No', 'actual', 'pred', 'alive', 'plod', 'name', 'title', 'male',
       'culture', 'dateOfBirth', 'DateoFdeath', 'mother', 'father', 'heir',
       'house', 'spouse', 'book1', 'book2', 'book3', 'book4', 'book5',
       'isAliveMother', 'isAliveFather', 'isAliveHeir', 'isAliveSpouse',
       'isMarried', 'isNoble', 'age', 'numDeadRelations', 'boolDeadRelations',
       'isPopular', 'popularity', 'isAlive'],
      dtype='object')

Basic Statistical Analysis of the data sets.

battles.describe()

	year	battle_number	major_death	major_capture	attacker_size	defender_size	summer
count	38.000000	38.000000	37.000000	37.000000	24.000000	19.000000	37.000000
mean	299.105263	19.500000	0.351351	0.297297	9942.541667	6428.157895	0.702703
std	0.689280	11.113055	0.483978	0.463373	20283.092065	6225.182106	0.463373
min	298.000000	1.000000	0.000000	0.000000	20.000000	100.000000	0.000000
25%	299.000000	10.250000	0.000000	0.000000	1375.000000	1070.000000	0.000000
50%	299.000000	19.500000	0.000000	0.000000	4000.000000	6000.000000	1.000000
75%	300.000000	28.750000	1.000000	1.000000	8250.000000	10000.000000	1.000000
max	300.000000	38.000000	1.000000	1.000000	100000.000000	20000.000000	1.000000

Columns (defender_3, defender_4) are basically empty and irrelevant so we can drop them.

battles = battles.drop(['defender_3', 'defender_4'],1)

deaths.describe()

	Death Year	Book of Death	Death Chapter	Book Intro Chapter	Gender	Nobility	GoT	CoK	SoS	FfC	DwD
count	305.000000	307.000000	299.000000	905.000000	917.000000	917.000000	917.000000	917.000000	917.000000	917.000000	917.000000
mean	299.157377	2.928339	40.070234	28.861878	0.828790	0.468920	0.272628	0.353326	0.424209	0.272628	0.284624
std	0.703483	1.326482	20.470270	20.165788	0.376898	0.499305	0.445554	0.478264	0.494492	0.445554	0.451481
min	297.000000	1.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
25%	299.000000	2.000000	25.500000	11.000000	1.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
50%	299.000000	3.000000	39.000000	27.000000	1.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
75%	300.000000	4.000000	57.000000	43.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
max	300.000000	5.000000	80.000000	80.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000

predictions.describe()

	S.No	actual	pred	alive	plod	male	dateOfBirth	DateoFdeath	book1	book2	...	isAliveHeir	isAliveSpouse	isMarried	isNoble	age	numDeadRelations	boolDeadRelations	isPopular	popularity	isAlive
count	1946.000000	1946.000000	1946.000000	1946.000000	1946.000000	1946.000000	433.000000	444.000000	1946.000000	1946.000000	...	23.000000	276.000000	1946.000000	1946.000000	433.000000	1946.000000	1946.000000	1946.000000	1946.000000	1946.000000
mean	973.500000	0.745632	0.687050	0.634470	0.365530	0.619219	1577.364896	2950.193694	0.198356	0.374615	...	0.652174	0.778986	0.141829	0.460946	-1293.563510	0.305755	0.074512	0.059096	0.089584	0.745632
std	561.906131	0.435617	0.463813	0.312637	0.312637	0.485704	19565.414460	28192.245529	0.398864	0.484148	...	0.486985	0.415684	0.348965	0.498601	19564.340993	1.383910	0.262669	0.235864	0.160568	0.435617
min	1.000000	0.000000	0.000000	0.000000	0.000000	0.000000	-28.000000	0.000000	0.000000	0.000000	...	0.000000	0.000000	0.000000	0.000000	-298001.000000	0.000000	0.000000	0.000000	0.000000	0.000000
25%	487.250000	0.000000	0.000000	0.391250	0.101000	0.000000	240.000000	282.000000	0.000000	0.000000	...	0.000000	1.000000	0.000000	0.000000	18.000000	0.000000	0.000000	0.000000	0.013378	0.000000
50%	973.500000	1.000000	1.000000	0.735500	0.264500	1.000000	268.000000	299.000000	0.000000	0.000000	...	1.000000	1.000000	0.000000	0.000000	27.000000	0.000000	0.000000	0.000000	0.033445	1.000000
75%	1459.750000	1.000000	1.000000	0.899000	0.608750	1.000000	285.000000	299.000000	0.000000	1.000000	...	1.000000	1.000000	0.000000	1.000000	50.000000	0.000000	0.000000	0.000000	0.086957	1.000000
max	1946.000000	1.000000	1.000000	1.000000	1.000000	1.000000	298299.000000	298299.000000	1.000000	1.000000	...	1.000000	1.000000	1.000000	1.000000	100.000000	15.000000	1.000000	1.000000	1.000000	1.000000

8 rows × 25 columns

We will now see if we have any relationships between variables

battles.corr()

	year	battle_number	major_death	major_capture	attacker_size	defender_size	summer
year	1.000000	0.906781	-0.341050	-0.166234	0.155841	-0.366048	-0.841912
battle_number	0.906781	1.000000	-0.270421	-0.105225	0.086418	-0.297730	-0.799090
major_death	-0.341050	-0.270421	1.000000	0.264464	0.267966	0.081815	0.337136
major_capture	-0.166234	-0.105225	0.264464	1.000000	0.331961	0.249510	0.142112
attacker_size	0.155841	0.086418	0.267966	0.331961	1.000000	-0.112118	-0.273054
defender_size	-0.366048	-0.297730	0.081815	0.249510	-0.112118	1.000000	0.347108
summer	-0.841912	-0.799090	0.337136	0.142112	-0.273054	0.347108	1.000000

ml.figure(figsize=(20,10)) 
sn.heatmap(battles.corr(),annot=True)

<matplotlib.axes._subplots.AxesSubplot at 0xba67550>

alt

There seems to be no correlation between any variables

deaths.corr()

	Death Year	Book of Death	Death Chapter	Book Intro Chapter	Gender	Nobility	GoT	CoK	SoS	FfC	DwD
Death Year	1.000000	0.832274	-0.068763	0.051674	-0.135058	0.042457	-0.439234	0.077509	0.313062	0.357204	0.550213
Book of Death	0.832274	1.000000	-0.207666	0.016975	-0.111461	0.017475	-0.434004	-0.131860	0.350095	0.340436	0.714574
Death Chapter	-0.068763	-0.207666	1.000000	0.388283	-0.086533	0.075943	0.126657	0.012939	-0.149095	-0.167285	-0.145384
Book Intro Chapter	0.051674	0.016975	0.388283	1.000000	0.058684	-0.068825	0.129241	0.002445	0.158419	-0.146165	-0.077509
Gender	-0.135058	-0.111461	-0.086533	0.058684	1.000000	-0.060213	0.070228	0.063424	-0.049199	-0.040289	-0.046924
Nobility	0.042457	0.017475	0.075943	-0.068825	-0.060213	1.000000	0.087201	0.055179	0.046825	0.146088	-0.001880
GoT	-0.439234	-0.434004	0.126657	0.129241	0.070228	0.087201	1.000000	0.121257	0.004696	-0.088852	-0.120242
CoK	0.077509	-0.131860	0.012939	0.002445	0.063424	0.055179	0.121257	1.000000	-0.002049	-0.083669	-0.107276
SoS	0.313062	0.350095	-0.149095	0.158419	-0.049199	0.046825	0.004696	-0.002049	1.000000	-0.074585	-0.013294
FfC	0.357204	0.340436	-0.167285	-0.146165	-0.040289	0.146088	-0.088852	-0.083669	-0.074585	1.000000	-0.109387
DwD	0.550213	0.714574	-0.145384	-0.077509	-0.046924	-0.001880	-0.120242	-0.107276	-0.013294	-0.109387	1.000000

ml.figure(figsize=(20,10)) 
sn.heatmap(deaths.corr(),annot=True)

<matplotlib.axes._subplots.AxesSubplot at 0xc42de10>

alt

There seems to be a decent correlation between the Death Year and Book (DWD) Dancing with Dragons and Book of Death.

Battles

When we look at the battles lets see who is winning.

ml.figure(figsize = (15,10))
sn.countplot(x='attacker_outcome',data = battles)

<matplotlib.axes._subplots.AxesSubplot at 0xbb6cd30>

alt

We can say chances are if you start a battle you will likely win. Which makes sense, because one would never start a battle if they did not feel more than prepared. Also you would never attack your opponent when they expect it.

ml.figure(figsize = (15,10))
attack = pd.DataFrame(battles.groupby("attacker_king").size().sort_values())
attack = attack.rename(columns = {0:'Battle'})
attack.plot(kind='bar')

<matplotlib.axes._subplots.AxesSubplot at 0xbda3dd8>

<Figure size 1080x720 with 0 Axes>

alt

If you have watched the series you will not be surprised by this. I am still surprised that Robb Stark had that many battles that he started

ml.figure(figsize = (15,10))
attack = pd.DataFrame(battles.groupby("defender_king").size().sort_values())
attack = attack.rename(columns = {0:'Defenece'})
attack.plot(kind='bar')

<matplotlib.axes._subplots.AxesSubplot at 0xc75b550>

<Figure size 1080x720 with 0 Axes>

alt

The Baratheon’s seemed to be able to give as good as they get. Again Surprised Robb had to defend himself so many times considering he was so close to the North.

We will now look at the various win loss ratios.

ml.figure(figsize = (15,10))
sn.countplot(x='attacker_king', hue = 'attacker_outcome', data = battles)

<matplotlib.axes._subplots.AxesSubplot at 0xccc4400>

alt

Everyone had decent win ratios bar Stannis Baratheon.

ml.figure(figsize = (15,10))
sn.countplot(x='attacker_outcome', hue= 'battle_type', data = battles)
ml.legend(bbox_to_anchor=(1, 1), loc=2)

<matplotlib.legend.Legend at 0x10dfd780>

alt

Seems like sieges and razings had a best success ratios, 100%, whereas pitched battles were nearly 50/50.

ml.figure(figsize = (15,10))
sn.countplot(x='year', data = battles)

<matplotlib.axes._subplots.AxesSubplot at 0x10bc2128>

alt

The year 299 had the battles. Why? I do not know.

Share on

Twitter Facebook Google+ LinkedIn

Farai Mathemera

Data Analysis: Game of Thrones (Fun)

Game of Thrones

Basic Statistical Analysis of the data sets.

Battles

Share on

You May Also Enjoy

Web Development Bootcamp: Zaio

Data Analysis: World Happiness Report (Fun)

Machine Learning Project: Digit Recogniser (Kaggle)

Machine Learning Project: Titanic (Kaggle)