Pokemon Stats

In this review I compare Pokemon stats to gain insight into how Pokemon can be classified by their capabilities in competitive battles.

Libraries

The analysis is done in R, so we first load some libraries:

library(tidyverse)
library(plotly)
library(factoextra)
library(heatmaply)
library(knitr)
library(caret)

Dataset

I will use a Pokemon dataset available on Kaggle.

pstats <- read.csv("../pokemon.csv")

This dataset contains the battle stats of each Pokemon:

HP: the hit points (health) of the Pokemon; it can be thought of as its stamina.
Attack: the power of physical attacks.
Defence: the resistance to physical attacks.
Sp_attack: the power of non-physical attacks (special / energy attacks).
Sp_defence: the resistance to non-physical attacks.
Speed: how quickly the Pokemon can perform an attack.
Total: the sum of the six stats above.
Name: the name of the Pokemon.

kable(head(pstats))

Name	Total	HP	Attack	Defence	Sp_attack	Sp_defence	Speed
Bulbasaur	318	45	49	49	65	65	45
Ivysaur	405	60	62	63	80	80	60
Venusaur	525	80	82	83	100	100	80
Mega Venusaur	625	80	100	123	122	120	80
Charmander	309	39	52	43	60	50	65
Charmeleon	405	58	64	58	80	65	80
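
As a quick sanity check of the definition of Total, the six rows shown above can be re-entered by hand and summed. This is only a minimal sketch: `sample_stats`, `stat_cols` and `computed_total` are names introduced here for illustration, and the data frame holds just the rows printed by `head()`, not the full dataset.

```r
# Six rows copied by hand from the table above (illustrative subset only)
sample_stats <- data.frame(
  Name       = c("Bulbasaur", "Ivysaur", "Venusaur", "Mega Venusaur",
                 "Charmander", "Charmeleon"),
  Total      = c(318, 405, 525, 625, 309, 405),
  HP         = c(45, 60, 80, 80, 39, 58),
  Attack     = c(49, 62, 82, 100, 52, 64),
  Defence    = c(49, 63, 83, 123, 43, 58),
  Sp_attack  = c(65, 80, 100, 122, 60, 80),
  Sp_defence = c(65, 80, 100, 120, 50, 65),
  Speed      = c(45, 60, 80, 80, 65, 80)
)

# The six battle stats that Total is supposed to add up
stat_cols <- c("HP", "Attack", "Defence", "Sp_attack", "Sp_defence", "Speed")

# Row-wise sum of the six stats, compared with the Total column
computed_total <- rowSums(sample_stats[, stat_cols])
all(computed_total == sample_stats$Total)
```

The same comparison could be run on `pstats` directly, assuming its columns are named as in the table.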

Exploratory analysis

The first step is an exploratory analysis of the variables in this dataset. It is always useful to start by checking whether the stats differ noticeably in their distributions:

pstats %>% 
  pivot_longer(.,c(HP,Attack,Defence,Sp_attack,Sp_defence,Speed),names_to = "stat") -> pivstats
pivstats %>%
  ggplot(aes(x=stat,y=value, fill=stat)) +
  geom_boxplot() +
  labs(title = "Distribution of stats") -> p
ggplotly(p)

As the boxplots show, the stats are distributed over comparable ranges, so we can use the raw data as it is, without rescaling.
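
One way to back up a visual impression like this with numbers is to compare the mean, standard deviation and range of each stat. The sketch below does so on the six rows printed earlier, so the figures it produces are purely illustrative; `sample_stats`, `stat_summary` and `sd_ratio` are names made up for this example, and on the real data the same `sapply()` call would be applied to the stat columns of `pstats`.

```r
# Six rows copied by hand from the head() output (illustrative subset only)
sample_stats <- data.frame(
  HP         = c(45, 60, 80, 80, 39, 58),
  Attack     = c(49, 62, 82, 100, 52, 64),
  Defence    = c(49, 63, 83, 123, 43, 58),
  Sp_attack  = c(65, 80, 100, 122, 60, 80),
  Sp_defence = c(65, 80, 100, 120, 50, 65),
  Speed      = c(45, 60, 80, 80, 65, 80)
)

# One column per stat, one row per summary measure
stat_summary <- sapply(sample_stats, function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
})
round(stat_summary, 1)

# Ratio between the largest and smallest standard deviation:
# a value close to 1 suggests the stats vary on a similar scale
sd_ratio <- max(stat_summary["sd", ]) / min(stat_summary["sd", ])
sd_ratio
```

If the ratio were large, scaling the variables before PCA and clustering would be worth considering.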

Next, I generate an interactive plot to see how the Pokemon rank on each stat:

pivstats %>%
  ggplot(aes(x=stat,y=reorder(value,value),fill=Total,text=Name)) +
  geom_bar(stat="identity", position="dodge") +
  labs(title="Pokes ordered by stat", y="value") -> p
ggplotly(p)

Another obvious plot compares Total against each of its component stats, to look for visible patterns in the data:

pivstats %>%
  ggplot(aes(color=stat,y=value,x=Total,text=Name)) +
  geom_point() +
  labs(title="Plotting stats vs Total") -> p
ggplotly(p)

Principal component analysis

The next step is a dimensionality reduction of the data: the idea is to check how the primary variables contribute to the dispersion of the Pokemon's battle capabilities.

I use principal component analysis (PCA) to investigate how the variables relate in this dataset. The first plot shows the first and second principal components, with the third displayed as a color scale.

PC1 appears to capture the overall battle strength of each Pokemon. One Pokemon stands out: "Mega Eternatus", which lies far from the rest because of its very high stats. Other strong Pokemon such as "Mega Rayquaza", "Mega Groudon" and "Mega Kyogre" are its nearest neighbors, and they follow the same trend along PC1.

pstats %>% select(HP,Attack,Defence,Sp_attack,Sp_defence,Speed) %>% prcomp() -> pca_poke
pcpoke <- pca_poke$x
pcpoke <- cbind(as.data.frame(pcpoke),nombre=pstats$Name)
pcpoke %>% ggplot(aes(x=PC1,y=PC2, color=PC3, text=nombre)) + geom_point() + labs(title = "Pokes in principal components") -> p
ggplotly(p)
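
A common way to check how much of the spread each component carries is the proportion of variance explained, computed from the `sdev` element returned by `prcomp()`. The sketch below runs on the six rows printed earlier, so its numbers say nothing about the full dataset; `sample_stats`, `pca_sample` and `explained` are illustrative names, and with `pca_poke` the same two lines would apply.

```r
# Six rows copied by hand from the head() output (illustrative subset only)
sample_stats <- data.frame(
  HP         = c(45, 60, 80, 80, 39, 58),
  Attack     = c(49, 62, 82, 100, 52, 64),
  Defence    = c(49, 63, 83, 123, 43, 58),
  Sp_attack  = c(65, 80, 100, 122, 60, 80),
  Sp_defence = c(65, 80, 100, 120, 50, 65),
  Speed      = c(45, 60, 80, 80, 65, 80)
)

# Same call as in the analysis: centred, unscaled PCA
pca_sample <- prcomp(sample_stats)

# Variance of each component divided by the total variance
explained <- pca_sample$sdev^2 / sum(pca_sample$sdev^2)

# Cumulative share of variance captured by the first components
round(cumsum(explained), 3)
```

A first value close to 1 would support reading PC1 as an overall strength axis.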

The second plot projects PC1, PC2, PC3 and PC4 together. The main outlier is again "Mega Eternatus", but another Pokemon is highlighted (in yellow): "Shuckle", the Bug-type with the highest Defence in the game (because of its shell).

plot_ly(pcpoke, x= ~PC1, y=  ~PC2, z= ~PC3, color = ~PC4, text= ~nombre)

## No trace type specified:
##   Based on info supplied, a 'scatter3d' trace seems appropriate.
##   Read more about this trace type -> https://plotly.com/r/reference/#scatter3d

## No scatter3d mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode

In that visualization, the Pokemon fill a cone-shaped region of the space spanned by PC1, PC2 and PC3. That region could be the main design space from which Nintendo and Game Freak pick the stats of new Pokemon in every generation.

The next step is to get a better view of how the original variables contribute to the first two principal components. The plot confirms the first impression about PC1: every variable points in a similar direction along it. Interestingly, PC2 seems to separate aggressive stats (Speed, Sp_attack), which take positive values, from defensive stats on the other side.

fviz_pca_var(pca_poke, col.var = "contrib", gradient.cols=c("#00AFBB","#E7B800","#FC4E07"), repel=TRUE) -> p
p
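
The arrows in a variable plot like this one are drawn from the PCA loadings, which `prcomp()` stores in its `rotation` element, so the same reading can be done numerically. The sketch below uses the six rows printed earlier and illustrative names (`sample_stats`, `pca_sample`, `loadings`, `same_sign_pc1`); the signs and magnitudes it shows are not those of the full dataset.

```r
# Six rows copied by hand from the head() output (illustrative subset only)
sample_stats <- data.frame(
  HP         = c(45, 60, 80, 80, 39, 58),
  Attack     = c(49, 62, 82, 100, 52, 64),
  Defence    = c(49, 63, 83, 123, 43, 58),
  Sp_attack  = c(65, 80, 100, 122, 60, 80),
  Sp_defence = c(65, 80, 100, 120, 50, 65),
  Speed      = c(45, 60, 80, 80, 65, 80)
)

pca_sample <- prcomp(sample_stats)

# Loadings of the six stats on the first two components
loadings <- pca_sample$rotation[, 1:2]
round(loadings, 2)

# TRUE when every stat pushes PC1 in the same direction,
# which is what an "overall strength" axis would look like
same_sign_pc1 <- all(loadings[, "PC1"] > 0) || all(loadings[, "PC1"] < 0)
same_sign_pc1
```

Note that the sign of a principal component is arbitrary, so only the relative signs of the loadings within a component are meaningful.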

Clustering analysis

The PCA shows that there is a region commonly used for new Pokemon across all generations. It also shows two outliers, while the rest of the Pokemon can still be clustered into different groups. In this section I show a classification that can be obtained from these stats alone.

I will use \(k=6\) for this analysis, looking for classes roughly similar to this scheme:

Common Pokemon
Strong Pokemon
Competitive Pokemon
Prohibited Pokemon
Shuckle?
Mega Eternatus?

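ptstats<-as.matrix(pstats)
rownames(ptstats)<-pstats$Name
# as.matrix() turns every column into text because of Name, so convert the
# numeric columns (Total and the six stats) back before computing distances
d<-dist(apply(ptstats[,2:8],c(1,2),as.numeric))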

h<-hclust(d)
fviz_dend(x=h,k=6)
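
To make the clustering steps concrete, here is a small worked example on the six rows printed earlier: the Euclidean distance between two Pokemon computed by hand and with `dist()`, followed by `hclust()` and `cutree()`. It is a sketch under simplifying assumptions: it uses only the six stats (no Total), \(k=2\) instead of \(k=6\) because there are only six rows, and names such as `stats_sample`, `by_hand` and `groups` that are introduced here for illustration.

```r
# Six rows copied by hand from the head() output, with names as row names
stats_sample <- rbind(
  Bulbasaur       = c(45, 49, 49, 65, 65, 45),
  Ivysaur         = c(60, 62, 63, 80, 80, 60),
  Venusaur        = c(80, 82, 83, 100, 100, 80),
  "Mega Venusaur" = c(80, 100, 123, 122, 120, 80),
  Charmander      = c(39, 52, 43, 60, 50, 65),
  Charmeleon      = c(58, 64, 58, 80, 65, 80)
)
colnames(stats_sample) <- c("HP", "Attack", "Defence",
                            "Sp_attack", "Sp_defence", "Speed")

# Euclidean distance by hand: square root of the summed squared differences
by_hand <- sqrt(sum((stats_sample["Bulbasaur", ] - stats_sample["Ivysaur", ])^2))

# dist() uses the Euclidean distance by default
d_sample <- dist(stats_sample)
d_mat <- as.matrix(d_sample)
all.equal(d_mat["Bulbasaur", "Ivysaur"], by_hand)

# hclust() uses complete linkage by default; cutree() cuts the tree into k groups
h_sample <- hclust(d_sample)
groups <- cutree(h_sample, k = 2)
table(groups)
```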

And projected in a heatmap:

heatmaply(apply(ptstats[,3:8],c(1,2),as.numeric))

## Warning in fix_not_all_unique(rownames(x)): Not all the values are unique -
## manually added prefix numbers

The next step is to use the class labels obtained by hierarchical clustering in the PCA projection.

cluspoke<-cutree(h,k=6)
cbind(pcpoke,cluspoke) %>%
  ggplot(aes(x=PC1,y=PC2, color=as.factor(cluspoke), text=nombre)) +
  geom_point() +
  labs(title = "Classes of Pokemon") -> p
ggplotly(p)

Finally, I plot Attack against Defence using the class labels to reveal hidden patterns (if they exist).

cbind(pstats,cluspoke) %>%
  ggplot(aes(x=Defence,y=Attack, color=as.factor(cluspoke), size=Total, text=Name)) +
  geom_point() +
  labs(title = "Comparing stats on classes of Pokemon") -> p
ggplotly(p)
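
To relate numeric cluster labels to a naming scheme such as "Common" or "Strong", it can help to summarise each cluster, for example by its mean Total and its size. The sketch below shows the idea on six rows with made-up cluster labels: the `cluster` column is hypothetical and is not the output of the `cutree()` call above.

```r
# Totals copied from the head() output; the cluster labels are hypothetical
toy <- data.frame(
  Name    = c("Bulbasaur", "Ivysaur", "Venusaur", "Mega Venusaur",
              "Charmander", "Charmeleon"),
  Total   = c(318, 405, 525, 625, 309, 405),
  cluster = c(1, 1, 2, 2, 1, 1)
)

# Mean Total per cluster: clusters with a higher mean hold stronger Pokemon
mean_total <- tapply(toy$Total, toy$cluster, mean)
mean_total

# Number of Pokemon in each cluster
cluster_size <- table(toy$cluster)
cluster_size
```

With the real labels, `tapply(pstats$Total, cluspoke, mean)` would give the corresponding summary.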

Classification using Machine Learning

poketype<- read.csv("../pokedex_(Update_05.20).csv", row.names = 1)

pstats %>% inner_join(poketype, by = c("Name" = "name")) %>% select(pokedex_number,Name,HP,Attack,Defence,Sp_attack,Sp_defence,Speed,type_1) -> pstats_wtype

index <- createDataPartition(pstats_wtype$type_1, p=0.65, list=FALSE)
pwtype.training <- pstats_wtype[index,]
pwtype.test <- pstats_wtype[-index,]
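
The split above is random, so the numbers reported below change from run to run unless a seed is fixed with `set.seed()` beforehand. The sketch that follows illustrates the idea of a stratified split in base R. It is not caret's implementation, and `stratified_index`, `types`, `train_idx` and `test_idx` are hypothetical names used only here.

```r
# Toy labels with one rare class, standing in for type_1
types <- factor(c(rep("Water", 10), rep("Normal", 8), rep("Flying", 2)))

# Draw roughly a fraction p of the indices of every class (at least one each)
stratified_index <- function(y, p) {
  idx_by_class <- split(seq_along(y), y)
  picked <- lapply(idx_by_class, function(i) {
    n_pick <- max(1, round(p * length(i)))
    i[sample.int(length(i), n_pick)]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(42)  # fixing the seed makes the split reproducible
train_idx <- stratified_index(types, p = 0.65)
test_idx <- setdiff(seq_along(types), train_idx)

# Every class should appear in the training part
table(types[train_idx])
```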

model_type_knn <- train(pwtype.training[,3:8], pwtype.training[,9], method="knn", preProcess = c("center","scale"))

predictions <- predict(object=model_type_knn,pwtype.test[,3:8])
table(predictions)

## predictions
##      Bug     Dark   Dragon Electric    Fairy Fighting     Fire   Flying 
##       29        3       11       22        3        6       29        0 
##    Ghost    Grass   Ground      Ice   Normal   Poison  Psychic     Rock 
##        3       27       17        2       51       17       15       13 
##    Steel    Water 
##       13       50

testLabels <- pwtype.test[,9]
confusionMatrix(predictions,as.factor(testLabels))

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Bug Dark Dragon Electric Fairy Fighting Fire Flying Ghost Grass
##   Bug        6    2      0        4     0        0    1      0     0     5
##   Dark       0    0      0        0     0        2    0      0     0     0
##   Dragon     0    1      2        0     0        0    0      0     1     1
##   Electric   1    1      1        2     0        0    2      0     2     3
##   Fairy      1    0      0        0     0        0    0      0     0     0
##   Fighting   0    0      0        0     2        0    0      0     0     0
##   Fire       2    2      2        3     0        2    6      0     1     1
##   Flying     0    0      0        0     0        0    0      0     0     0
##   Ghost      0    0      0        0     0        0    0      0     1     0
##   Grass      3    0      0        0     1        0    4      0     0     5
##   Ground     1    0      0        1     0        4    1      0     0     0
##   Ice        0    0      0        0     0        0    0      0     0     1
##   Normal     2    2      0        2     0        3    2      2     0     4
##   Poison     3    2      1        2     0        0    1      0     0     2
##   Psychic    0    0      0        0     1        0    0      0     1     2
##   Rock       4    0      1        0     0        0    0      0     0     0
##   Steel      1    0      1        0     0        0    0      0     2     2
##   Water      3    3      4        2     3        1    3      0     2     4
##           Reference
## Prediction Ground Ice Normal Poison Psychic Rock Steel Water
##   Bug           0   0      3      0       3    1     1     3
##   Dark          0   0      0      0       0    1     0     0
##   Dragon        0   0      0      0       2    0     1     3
##   Electric      0   1      2      1       2    1     0     3
##   Fairy         0   0      0      0       2    0     0     0
##   Fighting      2   0      1      1       0    0     0     0
##   Fire          0   2      1      0       1    2     0     4
##   Flying        0   0      0      0       0    0     0     0
##   Ghost         0   0      1      1       0    0     0     0
##   Grass         1   0      3      2       1    0     1     6
##   Ground        2   1      0      0       0    4     2     1
##   Ice           0   0      0      0       0    0     0     1
##   Normal        3   1     19      4       0    0     0     7
##   Poison        0   2      2      0       0    0     0     2
##   Psychic       0   0      2      0       4    0     1     4
##   Rock          1   0      1      0       0    3     0     3
##   Steel         1   0      0      1       0    2     2     1
##   Water         1   2      4      2       5    3     3     5
## 
## Overall Statistics
##                                           
##                Accuracy : 0.1833          
##                  95% CI : (0.1419, 0.2308)
##     No Information Rate : 0.1383          
##     P-Value [Acc > NIR] : 0.01576         
##                                           
##                   Kappa : 0.1093          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Bug Class: Dark Class: Dragon Class: Electric
## Sensitivity             0.22222    0.000000      0.166667        0.125000
## Specificity             0.91901    0.989933      0.969900        0.932203
## Pos Pred Value          0.20690    0.000000      0.181818        0.090909
## Neg Pred Value          0.92553    0.957792      0.966667        0.951557
## Prevalence              0.08682    0.041801      0.038585        0.051447
## Detection Rate          0.01929    0.000000      0.006431        0.006431
## Detection Prevalence    0.09325    0.009646      0.035370        0.070740
## Balanced Accuracy       0.57062    0.494966      0.568283        0.528602
##                      Class: Fairy Class: Fighting Class: Fire Class: Flying
## Sensitivity              0.000000         0.00000     0.30000      0.000000
## Specificity              0.990132         0.97993     0.92096      1.000000
## Pos Pred Value           0.000000         0.00000     0.20690           NaN
## Neg Pred Value           0.977273         0.96066     0.95035      0.993569
## Prevalence               0.022508         0.03859     0.06431      0.006431
## Detection Rate           0.000000         0.00000     0.01929      0.000000
## Detection Prevalence     0.009646         0.01929     0.09325      0.000000
## Balanced Accuracy        0.495066         0.48997     0.61048      0.500000
##                      Class: Ghost Class: Grass Class: Ground Class: Ice
## Sensitivity              0.100000      0.16667      0.181818   0.000000
## Specificity              0.993355      0.92171      0.950000   0.993377
## Pos Pred Value           0.333333      0.18519      0.117647   0.000000
## Neg Pred Value           0.970779      0.91197      0.969388   0.970874
## Prevalence               0.032154      0.09646      0.035370   0.028939
## Detection Rate           0.003215      0.01608      0.006431   0.000000
## Detection Prevalence     0.009646      0.08682      0.054662   0.006431
## Balanced Accuracy        0.546678      0.54419      0.565909   0.496689
##                      Class: Normal Class: Poison Class: Psychic Class: Rock
## Sensitivity                0.48718       0.00000        0.20000    0.176471
## Specificity                0.88235       0.94314        0.96220    0.965986
## Pos Pred Value             0.37255       0.00000        0.26667    0.230769
## Neg Pred Value             0.92308       0.95918        0.94595    0.953020
## Prevalence                 0.12540       0.03859        0.06431    0.054662
## Detection Rate             0.06109       0.00000        0.01286    0.009646
## Detection Prevalence       0.16399       0.05466        0.04823    0.041801
## Balanced Accuracy          0.68477       0.47157        0.58110    0.571228
##                      Class: Steel Class: Water
## Sensitivity              0.181818      0.11628
## Specificity              0.963333      0.83209
## Pos Pred Value           0.153846      0.10000
## Neg Pred Value           0.969799      0.85441
## Prevalence               0.035370      0.13826
## Detection Rate           0.006431      0.01608
## Detection Prevalence     0.041801      0.16077
## Balanced Accuracy        0.572576      0.47418
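
The headline figures in this output can be reproduced from a plain cross-tabulation. Accuracy is the sum of the diagonal divided by the number of test rows (the reported 0.1833 is consistent with 57 correct predictions out of 311), and the sensitivity of a class is its diagonal cell divided by its column total. The sketch below shows both on toy vectors; `truth`, `pred` and `cm` are illustrative and are not the real predictions.

```r
# Toy reference labels and predictions over three classes
class_levels <- c("Bug", "Normal", "Water")
truth <- factor(c("Water", "Water", "Normal", "Bug", "Normal", "Water"),
                levels = class_levels)
pred  <- factor(c("Water", "Normal", "Normal", "Water", "Normal", "Bug"),
                levels = class_levels)

# Rows are predictions, columns are the reference, as in the output above
cm <- table(Prediction = pred, Reference = truth)
cm

# Overall accuracy: correctly classified rows over all rows
accuracy <- sum(diag(cm)) / sum(cm)

# Per-class sensitivity: correct predictions over the true count of each class
sensitivity <- diag(cm) / colSums(cm)

accuracy
round(sensitivity, 3)
```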

predictions2 <- predict(object = model_type_knn, pstats[,3:8])

cbind(pcpoke,predictions2) %>% ggplot(aes(x=PC1,y=PC2,color=predictions2, text=nombre)) + geom_point() + labs(title="Prediction using K Nearest Neighbours") -> p
ggplotly(p)

#model_type_dnn <- train(pwtype.training[,3:8], pwtype.training[,9], method="dnn", preProcess = c("center","scale"))
model_type_dnn <- readRDS("model_type_dnn.rds")
predictions <- predict(object=model_type_dnn,pwtype.test[,3:8])
table(predictions)

## predictions
##      Bug     Dark   Dragon Electric    Fairy Fighting     Fire   Flying 
##        0        0        0        0        0        0        0        0 
##    Ghost    Grass   Ground      Ice   Normal   Poison  Psychic     Rock 
##        0        0        0        0        0        0        0        0 
##    Steel    Water 
##        0      311
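
The table above assigns all 311 test rows to Water, so on this split the network behaves like a constant classifier. The accuracy of such a classifier equals the share of the predicted class in the test set, which for Water is about 0.138 according to the prevalence reported in the k-NN confusion matrix for the same test rows. The sketch below computes this majority-class baseline on toy labels; `majority_baseline` and `truth` are hypothetical names.

```r
# Accuracy obtained by always predicting the most frequent class
majority_baseline <- function(y) {
  counts <- table(y)
  list(
    class    = names(counts)[which.max(counts)],
    accuracy = as.numeric(max(counts)) / length(y)
  )
}

# Toy labels: Water is the most frequent class (4 out of 10)
truth <- c(rep("Water", 4), rep("Normal", 3), rep("Bug", 2), "Flying")
baseline <- majority_baseline(truth)
baseline$class
baseline$accuracy
```

Any model worth keeping should beat this baseline on the test set.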

predictions2 <- predict(object = model_type_dnn, pstats[,3:8])

cbind(pcpoke,predictions2) %>% ggplot(aes(x=PC1,y=PC2,color=predictions2, text=nombre)) + geom_point() + labs(title = "Prediction using Deep Neural Network") -> p
ggplotly(p)

model_type_rf <- train(pwtype.training[,3:8], pwtype.training[,9], method="rf", preProcess = c("center","scale"))

## Warning: model fit failed for Resample06: mtry=2 Error in randomForest.default(x, y, mtry = min(param$mtry, ncol(x)), ...) : 
##   Can't have empty classes in y.

## Warning: model fit failed for Resample06: mtry=4 Error in randomForest.default(x, y, mtry = min(param$mtry, ncol(x)), ...) : 
##   Can't have empty classes in y.

## Warning: model fit failed for Resample06: mtry=6 Error in randomForest.default(x, y, mtry = min(param$mtry, ncol(x)), ...) : 
##   Can't have empty classes in y.

## Warning: model fit failed for Resample22: mtry=2 Error in randomForest.default(x, y, mtry = min(param$mtry, ncol(x)), ...) : 
##   Can't have empty classes in y.

## Warning: model fit failed for Resample22: mtry=4 Error in randomForest.default(x, y, mtry = min(param$mtry, ncol(x)), ...) : 
##   Can't have empty classes in y.

## Warning: model fit failed for Resample22: mtry=6 Error in randomForest.default(x, y, mtry = min(param$mtry, ncol(x)), ...) : 
##   Can't have empty classes in y.

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
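
A plausible reading of these warnings, offered here as an assumption rather than a verified diagnosis, is that some bootstrap resamples drawn during training contain no example of a rare type, which leaves that factor level empty. The sketch below shows how a bootstrap sample can lose a rare class and how to list empty levels; `empty_levels`, `y` and `boot_y` are illustrative names, and the toy labels are not the real data.

```r
# Levels of a factor that have no observations
empty_levels <- function(f) {
  levels(f)[as.vector(table(f)) == 0]
}

# Toy labels with a single example of a rare class
y <- factor(c(rep("Water", 6), rep("Normal", 5), "Flying"))
empty_levels(y)       # no empty levels in the original labels

# A bootstrap resample draws rows with replacement,
# so the single "Flying" row may not be drawn at all
set.seed(1)
boot_y <- y[sample.int(length(y), replace = TRUE)]
table(boot_y)
empty_levels(boot_y)  # lists any class the resample missed
```

If this is indeed the cause, options include merging or dropping very rare types, or choosing a resampling scheme that keeps every class represented.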

predictions <- predict(object=model_type_rf,pwtype.test[,3:8])
table(predictions)

## predictions
##      Bug     Dark   Dragon Electric    Fairy Fighting     Fire   Flying 
##       31        6        8       15        3        9       26        0 
##    Ghost    Grass   Ground      Ice   Normal   Poison  Psychic     Rock 
##        5       26       15        1       65       13       22       11 
##    Steel    Water 
##        8       47

predictions2 <- predict(object = model_type_rf, pstats[,3:8])

cbind(pcpoke,predictions2) %>% ggplot(aes(x=PC1,y=PC2,color=predictions2, text=nombre)) + geom_point() + labs(title = "Predictions using Random Forest") -> p
ggplotly(p)