GOT Season in Seconds

Rakshmitha
Oct 10, 2020

Being one of the seemingly few people who hadn’t started watching Game of Thrones as the series neared its finale, I became curious about what I could learn about the show quickly using NLP tools.

So let’s start! The full code and the dataset I used are in my Kaggle notebook: https://www.kaggle.com/rakshmithamadhevan/notebook34892fc586

import numpy as np
import pandas as pd

# Dataset from Kaggle
# source https://www.kaggle.com/gunnvant/game-of-thrones-srt
# JSON to CSV transform completed with Trifacta
# Each row is one episode; each column holds one subtitle line.
data = pd.read_csv('../input/got-scripts/Scripts.csv')
data.iloc[:1]

def episodeLine(num, line, df=data):
    # num = episode (row index), line = subtitle line (column index)
    return df.iloc[num, line]

episodeLine(0, 5, df=data)

Output
“I’ve never seen wildlings do a thing like this.”
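To make the data layout concrete: episodeLine treats each CSV row as an episode and each column as one subtitle line. The sketch below mimics that layout on a tiny made-up table using only the standard library; the column names and values in SAMPLE are hypothetical. With pandas, empty cells are read as NaN (a float), which is why only string cells should be kept; with the csv module they come back as empty strings instead.

```python
import csv
import io

# Hypothetical table in the layout episodeLine assumes: one row per episode,
# column 0 is not a subtitle line, columns 1..N hold subtitle lines,
# and shorter episodes leave trailing cells empty.
SAMPLE = """episode,line1,line2,line3
ep1,Easy boy.,What do you expect?,They're savages.
ep2,Winter is coming.,,
"""

# Skip the header row, as pandas.read_csv does by default
rows = list(csv.reader(io.StringIO(SAMPLE)))[1:]

def episode_line(rows, num, line):
    """Return subtitle `line` of episode `num`, or None if the cell is empty."""
    value = rows[num][line]
    return value or None

def episode_lines(rows, num):
    """All non-empty subtitle lines of one episode, skipping column 0."""
    return [cell for cell in rows[num][1:] if cell]

print(episode_line(rows, 0, 2))  # What do you expect?
print(episode_lines(rows, 1))    # ['Winter is coming.']
```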

def episodeTokens(num, df=data):
    '''Input - episode number (row of the csv); each row holds the subtitles for one episode

    Output - token list by word (tokens) and by line (lineList)
    '''
    lineList = []
    tokens = []

    # Column 0 is not a subtitle line, so start at column 1.
    # Empty cells are read as NaN (a float), so keep only strings.
    for x in range(1, 776):
        wordline = episodeLine(num, x, df=df)
        if isinstance(wordline, str):
            lineList.append(wordline)

    # Whitespace tokenization: punctuation stays attached to the words
    for y in lineList:
        tokens.extend(y.split())

    return tokens, lineList

tokenList, episodeList = episodeTokens(0, df=data)
episodeList[:5]

Output
[‘Easy, boy.’,
“What do you expect? They’re savages.”,
‘One lot steals a goat from another lot,’,
“before you know it they’re ripping each other to pieces.”,
“I’ve never seen wildlings do a thing like this.”]

tokenList[:35]

Output
[‘Easy,’,
‘boy.’,
‘What’,
‘do’,
‘you’,
‘expect?’,
“They’re”,
‘savages.’,
‘One’,
‘lot’,
‘steals’,
‘a’,
‘goat’,
‘from’,
‘another’,
‘lot,’,
‘before’,
‘you’,
‘know’,
‘it’,
“they’re”,
‘ripping’,
‘each’,
‘other’,
‘to’,
‘pieces.’,
“I’ve”,
‘never’,
‘seen’,
‘wildlings’,
‘do’,
‘a’,
‘thing’,
‘like’,
‘this.’]
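The whitespace split above keeps punctuation glued to words ('Easy,', 'boy.', 'expect?'), so 'boy.' and 'boy' would be counted as different tokens. If cleaner tokens are wanted, a regular expression is one option. This is an alternative sketch, not what the notebook does; the WORD pattern is an assumption that treats runs of letters and digits, optionally joined by a straight or curly apostrophe, as words.

```python
import re

# Assumed word pattern: letters/digits, optionally joined by ' or ’ (they're, I’ve)
WORD = re.compile(r"[A-Za-z0-9]+(?:['’][A-Za-z0-9]+)*")

def tokenize(lines):
    """Split subtitle lines into lowercase word tokens with punctuation removed."""
    tokens = []
    for line in lines:
        tokens.extend(m.group(0).lower() for m in WORD.finditer(line))
    return tokens

lines = ["Easy, boy.", "What do you expect? They're savages."]
print(tokenize(lines))
# ['easy', 'boy', 'what', 'do', 'you', 'expect', "they're", 'savages']
```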

from sklearn.feature_extraction.text import CountVectorizer

def TopList(num, tokens=tokenList):
    '''Input - num (integer), tokens (list of tokens)
    Output - counts of the most frequent terms, and the terms themselves '''
    # Each token is treated as a tiny document; English stop words are dropped
    vector = CountVectorizer(stop_words='english')
    word_matrix = vector.fit_transform(tokens)

    # Sum over all "documents" to get the total frequency of each term
    frequency = pd.DataFrame(word_matrix.toarray(), columns=vector.get_feature_names_out())
    total = frequency.sum(axis=0).sort_values(ascending=False)

    return total[:num], total.index[:num]

topWordcount, topWord = TopList(25)
TopList(5)

Output
(king 27
know 20
don 19
father 17
did 14
dtype: int64, Index([‘king’, ‘know’, ‘don’, ‘father’, ‘did’], dtype=’object’))
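Why does 'don' show up as a top word? scikit-learn's CountVectorizer uses the default token pattern (?u)\b\w\w+\b, which splits "Don't" at the apostrophe into "don" and "t" and then discards the one-character "t". The sketch below reproduces that counting with collections.Counter so the effect is visible; STOP_WORDS is a small hypothetical subset, not scikit-learn's English stop-word list.

```python
import re
from collections import Counter

# scikit-learn's default CountVectorizer token pattern: runs of 2+ word characters
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

# Hypothetical, tiny stop-word subset for illustration only
STOP_WORDS = {"the", "a", "you", "do", "what", "is", "it"}

def top_terms(tokens, num):
    """Count terms the way the default pattern splits them, minus stop words."""
    counts = Counter()
    for token in tokens:
        for term in TOKEN_PATTERN.findall(token.lower()):
            if term not in STOP_WORDS:
                counts[term] += 1
    return counts.most_common(num)

tokens = ["Don't", "know,", "King.", "king", "Father's", "don't", "the"]
print(top_terms(tokens, 3))  # [('don', 2), ('king', 2), ('know', 1)]
```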

# spaCy natural language processing package (small English model)
import spacy
nlp = spacy.load('en_core_web_sm')

# Convert every subtitle line to a string, in case any non-string entries slipped in
seasonText = [str(x) for x in episodeList]

# Join the subtitle lines into one string for the episode
episodeText = ','.join(seasonText)
episodeText[:1000]

Output
“Easy, boy.,What do you expect? They’re savages.,One lot steals a goat from another lot,,before you know it they’re ripping each other to pieces.,I’ve never seen wildlings do a thing like this.,I never seen a thing like this, not ever in my life.,How close did you get?,Close as any man would. — we should head back to the Wall.,Do the dead frighten you?,Our orders were to track the wildlings.,We tracked them. They won’t trouble us no more.,You don’t think he’ll ask us how they died?,Get back on your horse.,Whatever did it to them could do it to us.,They even killed the children.,It’s a good thing we’re not children.,You want to run away south, run away.,Of course, they will behead you as a deserter.,If I don’t catch you first.,Get back on your horse.,I won’t say it again.,Your dead men seem to have moved camp.,They were here.,See where they went.,What is it?,It’s…,Go on, Father’s watching.,And your mother.,Fine work, as always. Well done.,Thank you.,I love the detail that you’ve manage”

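# textacy is a Python library for higher-level natural language processing (NLP) tasks,
# built on top of the high-performance spaCy library.
# Note: this code uses an older textacy API; newer releases reorganized these modules
# (for example, textacy.preprocess became textacy.preprocessing).
import textacy
import textacy.keyterms
import textacy.extract

# Preprocess the text: replace punctuation with spaces
cleanText = textacy.preprocess.remove_punct(episodeText)
cleanText[:1000]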

Output
‘Easy boy What do you expect They re savages One lot steals a goat from another lot before you know it they re ripping each other to pieces I ve never seen wildlings do a thing like this I never seen a thing like this not ever in my life How close did you get Close as any man would we should head back to the Wall Do the dead frighten you Our orders were to track the wildlings We tracked them They won t trouble us no more You don t think he ll ask us how they died Get back on your horse Whatever did it to them could do it to us They even killed the children It s a good thing we re not children You want to run away south run away Of course they will behead you as a deserter If I don t catch you first Get back on your horse I won t say it again Your dead men seem to have moved camp They were here See where they went What is it It s Go on Father s watching And your mother Fine work as always Well done Thank you I love the detail that you ve manage’

# Collapse repeated whitespace into single spaces
normalizedText = textacy.preprocess.normalize_whitespace(cleanText)
normalizedText[:1000]

Output
‘Easy boy What do you expect They re savages One lot steals a goat from another lot before you know it they re ripping each other to pieces I ve never seen wildlings do a thing like this I never seen a thing like this not ever in my life How close did you get Close as any man would we should head back to the Wall Do the dead frighten you Our orders were to track the wildlings We tracked them They won t trouble us no more You don t think he ll ask us how they died Get back on your horse Whatever did it to them could do it to us They even killed the children It s a good thing we re not children You want to run away south run away Of course they will behead you as a deserter If I don t catch you first Get back on your horse I won t say it again Your dead men seem to have moved camp They were here See where they went What is it It s Go on Father s watching And your mother Fine work as always Well done Thank you I love the detail that you ve managed to get in these corners Quite beautiful Th’
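The two preprocessing steps are simple to describe: remove_punct replaces punctuation with spaces (which is why "They’re" becomes "They re"), and normalize_whitespace collapses runs of whitespace. A standard-library sketch of that behavior follows; it approximates textacy's functions under the assumption that "punctuation" means any Unicode character in a P* category, and it is not textacy's actual implementation.

```python
import re
import unicodedata

def remove_punct(text):
    """Replace every Unicode punctuation character (category P*) with a space."""
    return "".join(" " if unicodedata.category(ch).startswith("P") else ch for ch in text)

def normalize_whitespace(text):
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()

raw = "Easy, boy.,What do you expect? They’re savages."
print(normalize_whitespace(remove_punct(raw)))
# Easy boy What do you expect They re savages
```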

# Parse the cleaned text into a spaCy Doc and print named entities that are also top words
SpacyTextObject = nlp(normalizedText)
for entity in SpacyTextObject.ents:
    if str(entity).lower() in topWord:
        print(entity.text, entity.label_)

Output
Stark PERSON
Don PERSON
Stark PERSON
Tell PERSON
Tell ORG
Ned PERSON
Ned PERSON
Don PERSON
Don PERSON
Ned PERSON
Ned PERSON
Grace PERSON
Ned PERSON
Stark PERSON

# Same filter, applied to noun chunks instead of named entities
for chunk in SpacyTextObject.noun_chunks:
    if str(chunk).lower() in topWord:
        print(chunk.text, chunk.label_)

Output
Father NP
Don NP
King NP
Lord NP
Father NP
ll NP
Ned NP
Ned NP
brother NP
Father NP
Come NP
Ned NP
Ned NP
king NP
Ned NP
brother NP

# Extract key terms from the document using the SGRank algorithm
key = textacy.keyterms.sgrank(SpacyTextObject, ngrams=(1, 2, 3, 4, 5, 6), normalize='lemma',
                              window_width=1500, n_keyterms=10, idf=None)
key

Output
[(‘white walker’, 0.11249332592840376),
(‘Lord Stark’, 0.0736658815233693),
(‘Don t’, 0.07046536778427104),
(‘Jon Arryn’, 0.05019053921508314),
(‘Seven Kingdoms’, 0.04223668459048843),
(‘man’, 0.03166701369654801),
(‘king’, 0.027455271920707743),
(‘boy’, 0.0273683528405228),
(‘Wall’, 0.02201017893547364),
(‘good’, 0.020991427692128965)]

# Extract an ordered sequence of named entities (PERSON, ORG, LOC, etc.) from a spaCy-parsed doc,
# optionally filtering by entity type and frequency
word1 = textacy.extract.named_entities(SpacyTextObject, include_types=None, exclude_types=None,
                                       drop_determiners=True, min_freq=4)
for x in word1:
    print(x)

Output
Jon Arryn
Jon Arryn
Jon Arryn
Ned
Ned
first
Dothraki
Dothraki
Khal Drogo
Khal Drogo
Khal Drogo
Ned
first
Ned
first
Jon Arryn
Ned
Khal Drogo
Dothraki
Dothraki
Dothraki
first
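The repeated names in this output suggest that, with min_freq set, named_entities yields every occurrence of each entity whose total count reaches the threshold, in document order, rather than a de-duplicated list. (The stray "first" is most likely spaCy's ORDINAL entity type.) A minimal sketch of that filtering idea, using a made-up mention list, is:

```python
from collections import Counter

def filter_by_min_freq(items, min_freq):
    """Keep every occurrence of items whose total count is >= min_freq, in original order."""
    counts = Counter(items)
    return [item for item in items if counts[item] >= min_freq]

# Made-up list of entity mentions in document order
mentions = ["Jon Arryn", "Ned", "Robert", "Jon Arryn", "Ned", "Jon Arryn", "Ned"]
print(filter_by_min_freq(mentions, 3))
# ['Jon Arryn', 'Ned', 'Jon Arryn', 'Ned', 'Jon Arryn', 'Ned']
```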

# Top three-word phrases (trigrams) occurring at least 3 times
word2 = textacy.extract.ngrams(SpacyTextObject, 3, filter_stops=True, filter_punct=True, filter_nums=False,
                               include_pos=None, exclude_pos=None, min_freq=3)
for y in word2:
    print(y)

Output
winter is coming
saw the white
saw the white
Don t look
saw the white
King s Landing
Don t look
King s Landing
don t want
don t want
Winter is coming
Winter is coming
King s Landing
don t want
don t look
s all right
s all right
s all right
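Phrases such as "winter is coming" survive filter_stops=True even though "is" is a stop word, because textacy's filter_stops only drops n-grams that start or end with a stop word. The sketch below mirrors that rule together with the min_freq threshold; STOP_WORDS is a small hypothetical list (spaCy's own list is much longer), and unlike textacy it works on pre-split tokens with no punctuation or part-of-speech filtering.

```python
from collections import Counter

# Hypothetical stop-word subset for illustration only
STOP_WORDS = {"is", "the", "a", "to", "of", "and", "i"}

def ngrams(tokens, n, min_freq):
    """Return n-grams seen >= min_freq times that don't start or end with a stop word."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    kept = [g for g in grams
            if g[0].lower() not in STOP_WORDS and g[-1].lower() not in STOP_WORDS]
    counts = Counter(kept)
    # Like the output above, every qualifying occurrence is returned in order
    return [" ".join(g) for g in kept if counts[g] >= min_freq]

tokens = "winter is coming the king said winter is coming and winter is coming".split()
print(ngrams(tokens, 3, 2))
# ['winter is coming', 'winter is coming', 'winter is coming']
```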

text = '''Ned is a fictional character. Ned is the lord of Winterfell, an ancient fortress in the North of the fictional continent of Westeros. 
Though the character is established as the main character in the novel and the first season of the TV adaptation,
Martin's plot twist at the end involving Ned shocked both readers of the book and viewers of the TV series.
Ned is the leader of the Stark Family. Ned is a father of six children.'''
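def summary(sentence, matchWord):
    '''Collect the facts that follow "<matchWord> is ..." statements in the text'''
    result = ''
    sentobj = nlp(sentence)
    # semistructured_statements yields (entity, cue, fragment) triples for the cue verb 'be'
    sentences = textacy.extract.semistructured_statements(sentobj, matchWord, cue='be')
    for i, (subject, verb, fact) in enumerate(sentences):
        result += 'Fact ' + str(i + 1) + ': ' + str(fact) + ' '
    return result

summary(text, 'Ned')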

Output
‘Fact 1: a fictional character Fact 2: the lord of Winterfell, an ancient fortress in the North of the fictional continent of Westeros. \n Fact 3: the leader of the Stark Family Fact 4: a father of six children ‘
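semistructured_statements relies on spaCy's parse to find statements of the form "<entity> <cue> <fragment>". For readers without spaCy at hand, a much cruder regular-expression stand-in (a different technique, shown only to illustrate the idea) that handles just sentences beginning with the entity name is sketched below; simple_facts and its cues default are hypothetical.

```python
import re

def simple_facts(text, entity, cues=("is", "was")):
    """Rough stand-in for cue-based statement extraction: find sentences that
    begin '<entity> <cue> ...' and return the text that follows the cue."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    cue_group = "|".join(map(re.escape, cues))
    pattern = re.compile(rf"^{re.escape(entity)}\s+(?:{cue_group})\s+(.+?)[.!?]?$")
    facts = []
    for sentence in sentences:
        match = pattern.match(sentence)
        if match:
            facts.append(match.group(1))
    return facts

text = ("Ned is a fictional character. Ned is the lord of Winterfell. "
        "Martin's plot twist involving Ned shocked viewers. Ned is a father of six children.")
for i, fact in enumerate(simple_facts(text, "Ned"), start=1):
    print(f"Fact {i}: {fact}")
```

Unlike the parser-based version, this misses any sentence where the entity is not the first word.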

# Subject-verb-object triples from the dependency parse
doc = nlp(text)
verbTriples = textacy.extract.subject_verb_object_triples(doc)
for x in verbTriples:
    print(x)

Output
(Ned, is, character)
(Ned, is, lord)
(Ned, is, leader)
(Ned, is, father)

def popularSeriesphrases(num, end, phrase_leng, freq, df=data):
    '''num - beginning episode
    end - ending episode (exclusive, as in range)
    phrase_leng - length of phrase in words
    freq - minimum frequency
    output - popular phrases for the episode subset
    '''
    episodeTexts = []
    for i in range(num, end):
        _, line = episodeTokens(i, df=df)
        # Creates a string of subtitles for each episode
        episodeTexts.append(','.join(line))

    # Join episodes with the same separator so words at episode boundaries don't fuse
    textComp = ','.join(episodeTexts)
    total = textacy.preprocess.normalize_whitespace(textComp)
    spaceObj1 = nlp(total)

    spaCy_phrase = textacy.extract.ngrams(spaceObj1, phrase_leng, filter_stops=True, filter_punct=True,
                                          filter_nums=False, include_pos=None, exclude_pos=None,
                                          min_freq=freq)

    # Lowercase and de-duplicate while keeping first-seen order
    phrase = []
    for y in spaCy_phrase:
        x = str(y).lower()
        if x not in phrase:
            phrase.append(x)
    return phrase
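Inside popularSeriesphrases, the `if x not in phrase` check scans the whole list for every candidate, which is quadratic in the number of phrases. Because dicts preserve insertion order in Python 3.7+, dict.fromkeys gives the same lowercase, order-preserving de-duplication in linear time. A small sketch (unique_lowercase is a hypothetical helper):

```python
def unique_lowercase(phrases):
    """Lowercase phrases and drop duplicates, keeping first-seen order.
    dict keys preserve insertion order, so this is a single linear pass."""
    return list(dict.fromkeys(str(p).lower() for p in phrases))

print(unique_lowercase(["Winter is coming", "King s Landing", "winter is coming", "King s Landing"]))
# ['winter is coming', 'king s landing']
```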

# Season 1 three-word phrases
popularSeriesphrases(1, 10, 3, 3, df=data)

Output
[“king’s landing”,
‘thing you need’,
‘going to die’,
“night’s watch”,
‘want to know’,
‘tried to kill’,
“butcher’s boy”,
“won’t hurt”,
“king’s hand”,
‘live and die’,
“won’t let”,
‘lord of winterfell’,
‘lannister always pays’,
‘know what happened’,
‘stallion who mounts’,
‘sun and stars’]

# Season 1 four-word phrases
popularSeriesphrases(1, 10, 4, 3, df=data)

Output
[‘men of the night’,
‘hand of the king’,
‘hand of the king.,-’,
‘north of the wall’,
‘man of the night’,
‘king in the north!,the’,
‘king in the north’]

# Season 2 four-word phrases
popularSeriesphrases(11, 20, 4, 3, df=data)

Output
[‘hand of the king’,
‘commander of the city’,
‘sit on the iron’,
“won’t be able”,
‘day until the end’]

# Season 2 five-word phrases
popularSeriesphrases(11, 20, 5, 2, df=data)

Output
[‘swear it by the old’,
‘said no harm would come’,
“know what it’s like”,
‘think that about their fathers’,
“rains weep o’er his halls”,
“rains weep o’er his hall”,
‘serve the starks.,i serve lady’]
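One way to compare seasons is to intersect the phrase lists. The sketch below copies the Season 1 and Season 2 four-word outputs shown above and keeps the Season 2 phrases that also appear in Season 1 (and those that don't), preserving order; shared_phrases and new_phrases are hypothetical helpers.

```python
# Four-word phrase lists copied from the outputs above
season1_four = ["men of the night", "hand of the king", "hand of the king.,-", "north of the wall",
                "man of the night", "king in the north!,the", "king in the north"]
season2_four = ["hand of the king", "commander of the city", "sit on the iron",
                "won't be able", "day until the end"]

def shared_phrases(earlier, later):
    """Phrases from `later` that also occur in `earlier`, in `later`'s order."""
    seen = set(earlier)
    return [p for p in later if p in seen]

def new_phrases(earlier, later):
    """Phrases from `later` that never occur in `earlier`, in `later`'s order."""
    seen = set(earlier)
    return [p for p in later if p not in seen]

print(shared_phrases(season1_four, season2_four))  # ['hand of the king']
print(new_phrases(season1_four, season2_four))
```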

Guys, I hope this was useful. If you liked this notebook, upvote it, share it with your friends, and follow me for more interesting topics!

Written by Rakshmitha

ML Enthusiast | Full Stack Engineer | Content Writer
