Data Wrangling, Exploration, Visualization

Quincy Smith (qrs227)

Introduction

For this project, I decided to use data from the NBA's 2019-2020 and 2020-2021 seasons. Basketball has always been my favorite sport, and I played from the time I could walk until the end of high school. Basketball is something I am very passionate about, and one day I want to work with sports data, specifically basketball data, for a living.

The NBA data I found covers the two seasons mentioned above. For each season, it contains statistics for every player who played at least one regular season game. It has common variables such as games, points, assists, and rebounds, as well as some more advanced metrics such as field goal percentage, free throw percentage, and effective field goal percentage. All of these measures are used to judge how effective a player is in an NBA game, and they can be adjusted to account for different styles of play.

# load packages and read in both seasons of individual player stats
library(tidyverse)
library(gt)
library(ggplot2)
library(kableExtra)
s20_21 <- read_csv("IndvStats.csv")
s19_20 <- read_csv("LastYearStats.csv")

Tidying: Reshaping


# untidy: one column per player name, holding that player's age (NA in every other player's row)
temp <- s19_20 %>% pivot_wider(names_from = Player, values_from = Age)
temp %>% head(10)
## # A tibble: 10 x 556
##    Pos   Tm        G    GS    MP    FG   FGA `FG%`  `3P` `3PA`  `3P%`  `2P`
##    <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>
##  1 C     OKC      63    63  1680   283   478 0.592     1     3  0.333   282
##  2 PF    MIA      72    72  2417   440   790 0.557     2    14  0.143   438
##  3 C     SAS      53    53  1754   391   793 0.493    61   157  0.389   330
##  4 C     MIA       2     0    13     1     2 0.5       0     0 NA         1
##  5 SG    NOP      47     1   591    98   266 0.368    46   133  0.346    52
##  6 SG    MEM      38     0   718   117   251 0.466    57   141  0.404    60
##  7 C     BRK      70    64  1852   302   465 0.649     0     6  0       302
##  8 PG    NYK      10     0   117    19    44 0.432     5    16  0.313    14
##  9 PF    ORL      18     2   380    25    86 0.291     9    36  0.25     16
## 10 SG    BRK      10     1   107    10    38 0.263     6    29  0.207     4
## # … with 544 more variables: `2PA` <dbl>, `2P%` <dbl>, `eFG%` <dbl>, FT <dbl>,
## #   FTA <dbl>, `FT%` <dbl>, ORB <dbl>, DRB <dbl>, TRB <dbl>, AST <dbl>,
## #   STL <dbl>, BLK <dbl>, TOV <dbl>, PF <dbl>, PTS <dbl>, `Steven Adams` <dbl>,
## #   `Bam Adebayo` <dbl>, `LaMarcus Aldridge` <dbl>, `Kyle Alexander` <dbl>,
## #   `Nickeil Alexander-Walker` <dbl>, `Grayson Allen` <dbl>, `Jarrett
## #   Allen` <dbl>, `Kadeem Allen` <dbl>, `Al-Farouq Aminu` <dbl>, `Justin
## #   Anderson` <dbl>, `Kyle Anderson` <dbl>, `Ryan Anderson` <dbl>, `Giannis
## #   Antetokounmpo` <dbl>, `Kostas Antetokounmpo` <dbl>, `Thanasis
## #   Antetokounmpo` <dbl>, `Carmelo Anthony` <dbl>, `OG Anunoby` <dbl>, `Ryan
## #   Arcidiacono` <dbl>, `Trevor Ariza` <dbl>, `D.J. Augustin` <dbl>, `Deandre
## #   Ayton` <dbl>, `Dwayne Bacon` <dbl>, `Marvin Bagley III` <dbl>, `Lonzo
## #   Ball` <dbl>, `Mo Bamba` <dbl>, `J.J. Barea` <dbl>, `Harrison Barnes` <dbl>,
## #   `RJ Barrett` <dbl>, `Will Barton` <dbl>, `Keita Bates-Diop` <dbl>, `Nicolas
## #   Batum` <dbl>, `Aron Baynes` <dbl>, `Kent Bazemore` <dbl>, `Darius
## #   Bazley` <dbl>, `Bradley Beal` <dbl>, `Malik Beasley` <dbl>, `Marco
## #   Belinelli` <dbl>, `Jordan Bell` <dbl>, `DeAndre' Bembry` <dbl>, `Dragan
## #   Bender` <dbl>, `Davis Bertan` <dbl>, `Patrick Beverley` <dbl>, `Khem
## #   Birch` <dbl>, `Goga Bitadze` <dbl>, `Bismack Biyombo` <dbl>, `Nemanja
## #   Bjelica` <dbl>, `Eric Bledsoe` <dbl>, `Bogdan Bogdanovic` <dbl>, `Bojan
## #   Bogdanovic` <dbl>, `Bol Bol` <dbl>, `Jonah Bolden` <dbl>, `Marques
## #   Bolden` <dbl>, `Jordan Bone` <dbl>, `Isaac Bonga` <dbl>, `Devin
## #   Booker` <dbl>, `Chris Boucher` <dbl>, `Brian Bowen` <dbl>, `Ky
## #   Bowman` <dbl>, `Avery Bradley` <dbl>, `Tony Bradley` <dbl>, `Jarrell
## #   Brantley` <dbl>, `Ignas Brazdeikis` <dbl>, `Corey Brewer` <dbl>, `Mikal
## #   Bridges` <dbl>, `Miles Bridges` <dbl>, `Oshae Brissett` <dbl>, `Ryan
## #   Broekhoff` <dbl>, `Malcolm Brogdon` <dbl>, `Dillon Brooks` <dbl>, `Bruce
## #   Brown` <dbl>, `Charlie Brown` <dbl>, `Jaylen Brown` <dbl>, `Moses
## #   Brown` <dbl>, `Sterling Brown` <dbl>, `Troy Brown Jr.` <dbl>, `Jalen
## #   Brunson` <dbl>, `Thomas Bryant` <dbl>, `Reggie Bullock` <dbl>, `Trey
## #   Burke` <dbl>, `Alec Burks` <dbl>, `Deonte Burton` <dbl>, `Jimmy
## #   Butler` <dbl>, `Bruno Caboclo` <dbl>, `Devontae Cacok` <dbl>, `Kentavious
## #   Caldwell-Pope` <dbl>, …
# retidy: pivot every player-name column (everything except the original Pos:PTS
# stat columns) back into Player/Age pairs; values_drop_na = TRUE drops the NA
# cells that pivot_wider created
temp <- temp %>% pivot_longer(-(Pos:PTS), names_to = "Player", values_to = "Age", 
    values_drop_na = TRUE)
temp %>% head(10)

Since the data was already tidy, this section only demonstrates the pivot_wider and pivot_longer functions. In the first step, the data is pivoted wider using player name and player age. This creates one column per player, which holds that player's age in that player's own row and NA everywhere else. The data is then pivoted longer over the player-name columns, with the column names stored under Player and the values stored under Age. Because the wide table filled every cell outside a player's own row with "NA", those NA rows had to be dropped to recover the original data.
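The round trip is easier to see on a small example. Below is a minimal sketch using a made-up three-player tibble. The names, ages, and point totals are hypothetical and do not come from the dataset. It assumes the tidyr, dplyr, and tibble packages that tidyverse already loads. When pivoting back, it selects every column except the original stat column, and `values_drop_na = TRUE` replaces the separate NA filter.

```r
library(tibble)
library(tidyr)
library(dplyr)

# hypothetical toy data: one row per player, already tidy
toy <- tibble(
  Player = c("Player A", "Player B", "Player C"),
  Age    = c(21, 25, 30),
  PTS    = c(500, 1200, 800)
)

# untidy: one column per player name holding that player's age;
# every cell outside a player's own row becomes NA
wide <- toy %>% pivot_wider(names_from = Player, values_from = Age)

# retidy: pivot every player column (everything except PTS) back into
# Player/Age pairs and drop the NA cells created by pivot_wider
long <- wide %>%
  pivot_longer(-PTS, names_to = "Player", values_to = "Age",
               values_drop_na = TRUE) %>%
  select(Player, Age, PTS)

print(wide)
print(long)
```

If the round trip works, `long` holds the same rows as `toy`. Selecting the columns by exclusion rather than by position keeps the code correct no matter how many players there are.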

Joining/Merging

# keep only players who appear in both seasons; the suffixes label each season's
# copy of the shared columns (e.g., PTS.19_20 and PTS.20_21)
s19_21 <- inner_join(s19_20, s20_21, by = "Player", suffix = c(".19_20", ".20_21"))

s19_21 <- s19_21 %>% rename(TEPer.20_21 = "3P%.20_21", TEPer.19_20 = "3P%.19_20", 
    TWPer.20_21 = "2P%.20_21", TWPer.19_20 = "2P%.19_20", TEA.20_21 = "3PA.20_21", 
    TEA.19_20 = "3PA.19_20", TE.20_21 = "3P.20_21", TE.19_20 = "3P.19_20", 
    TWA.20_21 = "2PA.20_21", TWA.19_20 = "2PA.19_20", TW.20_21 = "2P.20_21", 
    TW.19_20 = "2P.19_20")

# count distinct player names (n_distinct needs the Player column itself,
# not the whole data frame)
n_distinct(s19_21$Player)
## [1] 430
n_distinct(s19_20$Player)
## [1] 529
n_distinct(s20_21$Player)
## [1] 540
s19_20 %>% summarize(count = n())
## # A tibble: 1 x 1
##   count
##   <int>
## 1   529
s20_21 %>% summarize(count = n())
## # A tibble: 1 x 1
##   count
##   <int>
## 1   540
s19_20 %>% anti_join(s20_21, by = "Player") %>% pull(Player) %>% n_distinct()
## [1] 99
s20_21 %>% anti_join(s19_20, by = "Player") %>% pull(Player) %>% n_distinct()
## [1] 110
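# treat 0 as missing in every numeric column; note that this also turns genuine
# zeros (e.g., 0 steals or 0% from three) into NA
s19_21 <- s19_21 %>% mutate(across(where(is.numeric), ~ na_if(.x, 0)))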

The two data frames were joined by Player name because it serves as the “ID” variable. An inner join was used so that every player in the joined data frame appeared in at least one game in both seasons. The joined data frame has 430 players. The join dropped 99 players from the 2019-2020 season and 110 players from the 2020-2021 season. This may overestimate league-wide statistics, since the players who were not carried over are likely rookies, retirees, or other players who would be considered lower skill. Most variables had to be renamed to distinguish the two seasons and to follow R's naming rules (for example, 3P% became TEPer). Finally, zeros were recoded as missing values.
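The join bookkeeping can be checked on a tiny hypothetical pair of seasons; players "A" through "E" are made up. The sketch also uses the `suffix` argument of `inner_join()`, which labels each season's columns directly instead of renaming `.x`/`.y` columns afterward. It counts distinct players by calling `n_distinct()` on the Player column itself.

```r
library(tibble)
library(dplyr)

# hypothetical toy seasons; only "B" and "C" appear in both
s1 <- tibble(Player = c("A", "B", "C"), PTS = c(100, 200, 300))
s2 <- tibble(Player = c("B", "C", "D", "E"), PTS = c(250, 350, 50, 75))

# inner_join keeps players present in both seasons; suffix labels the
# overlapping columns by season instead of the default .x / .y
both <- inner_join(s1, s2, by = "Player", suffix = c(".19_20", ".20_21"))

# anti_join keeps rows of the first table with no match in the second
only_first  <- anti_join(s1, s2, by = "Player")
only_second <- anti_join(s2, s1, by = "Player")

# count distinct player names (a column, not the whole data frame)
n_both <- n_distinct(both$Player)
print(both)
```

The same identity holds for the real tables: players kept plus players lost equals each season's total, i.e., 430 + 99 = 529 and 430 + 110 = 540.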

Wrangling

# per game function: divides a season total by games played, rounded to one decimal
per_game <- function(x, games) {
    round(x/games, digits = 1)
}
s19_21 %>% mutate(PPG.19_20 = per_game(PTS.19_20, G.19_20)) %>% 
    mutate(PPG.20_21 = per_game(PTS.20_21, G.20_21)) %>% select(Player, 
    PPG.19_20, PPG.20_21) %>% head(10)
## # A tibble: 10 x 3
##    Player                   PPG.19_20 PPG.20_21
##    <chr>                        <dbl>     <dbl>
##  1 Steven Adams                  10.9       7.6
##  2 Bam Adebayo                   15.9      18.7
##  3 LaMarcus Aldridge             18.9      13.5
##  4 Nickeil Alexander-Walker       5.7      11  
##  5 Grayson Allen                  8.7      10.6
##  6 Jarrett Allen                 11.1      12.8
##  7 Al-Farouq Aminu                4.3       4.4
##  8 Kyle Anderson                  5.8      12.4
##  9 Giannis Antetokounmpo         29.5      28.1
## 10 Kostas Antetokounmpo           1.4       0.8
# adding points per game to the dataframe
s19_21 <- s19_21 %>% mutate(PPG.19_20 = per_game(PTS.19_20, G.19_20)) %>% 
    mutate(PPG.20_21 = per_game(PTS.20_21, G.20_21))

# Summarizing points per game, rebounds per game, and steals
# per game for the LA teams
s19_21 %>% filter(str_detect(Tm.19_20, "LA.") | str_detect(Tm.20_21, 
    "LA.")) %>% group_by(Tm.20_21) %>% summarize(Tm_PPG = round((sum(PTS.20_21) + 
    sum(PTS.19_20))/144, 1), Tm_TRB = round((sum(TRB.20_21) + 
    sum(TRB.19_20))/144, 1), Tm_STL = round((sum(STL.20_21) + 
    sum(STL.19_20))/144, 1), count = n()) %>% filter(str_detect(Tm.20_21, 
    "LA."))
## # A tibble: 2 x 5
##   Tm.20_21 Tm_PPG Tm_TRB Tm_STL count
##   <chr>     <dbl>  <dbl>  <dbl> <int>
## 1 LAC        104.   41.3    6.3    14
## 2 LAL        135.   53.7   NA      17
# calculating the proportion of games played by players who
# are 25 years old and younger
s19_21 %>% filter(Age.20_21 < 26 & Age.19_20 <= 25) %>% group_by(Player, 
    Age.20_21) %>% summarize(prop_G = round(sum(G.19_20 + G.20_21)/145, 
    2)) %>% arrange(desc(prop_G)) %>% head(6)
## # A tibble: 6 x 3
## # Groups:   Player [6]
##   Player        Age.20_21 prop_G
##   <chr>             <dbl>  <dbl>
## 1 Mikal Bridges        24   1   
## 2 Nikola Jokic         25   1   
## 3 Ivica Zubac          23   0.99
## 4 Dillon Brooks        25   0.97
## 5 Bam Adebayo          23   0.94
## 6 Devin Booker         24   0.94
# Finding the average proportion of games played by age group
# for players 25 and younger
s19_21 %>% filter(Age.20_21 < 26 & Age.19_20 <= 25) %>% group_by(Player, 
    Age.20_21) %>% summarize(prop_G = sum(G.19_20 + G.20_21)/145) %>% 
    group_by(Age.20_21) %>% summarize(avg_prop_G = round(mean(prop_G), 
    3), sd_prop_G = round(sd(prop_G), 3), count = n()) %>% gt() %>% 
    tab_header(title = "Proportion of Games Played", subtitle = "For players 25 and younger")
Proportion of Games Played
For players 25 and younger

Age.20_21  avg_prop_G  sd_prop_G  count
       20       0.603      0.276     11
       21       0.556      0.227     28
       22       0.633      0.240     37
       23       0.504      0.262     48
       24       0.585      0.269     42
       25       0.624      0.231     42
east_teams <- c("BRK", "MIA", "CHO", "NYK", "ORL", "MIL", "TOR", 
    "CHI", "WAS", "PHI", "BOS", "ATL", "CLE", "IND", "DET")

# adding a conference variable for each season: EAST if the team is in
# east_teams, otherwise WEST
s19_21 <- s19_21 %>% mutate(Conf.19_20 = if_else(Tm.19_20 %in% east_teams, "EAST", "WEST"), 
    Conf.20_21 = if_else(Tm.20_21 %in% east_teams, "EAST", "WEST"))

# finding the median assists per game by position within each conference,
# for players who stayed in the same conference and position in both seasons
s19_21 %>% filter(Conf.19_20 == Conf.20_21, Pos.19_20 == Pos.20_21) %>% 
    group_by(Conf.20_21, Pos.20_21) %>% mutate(APG.19_20 = per_game(AST.19_20, 
    G.19_20), APG.20_21 = per_game(AST.20_21, G.20_21)) %>% summarize(pos_APG.19_20 = round(median(APG.19_20, 
    na.rm = TRUE), 1), pos_APG.20_21 = round(median(APG.20_21, na.rm = TRUE), 
    1), count = n()) %>% arrange(desc(pos_APG.19_20))
## # A tibble: 10 x 5
## # Groups:   Conf.20_21 [2]
##    Conf.20_21 Pos.20_21 pos_APG.19_20 pos_APG.20_21 count
##    <chr>      <chr>             <dbl>         <dbl> <int>
##  1 WEST       PG                  4.4           4.8    25
##  2 EAST       PG                  3.4           3      33
##  3 EAST       SG                  1.9           2.1    33
##  4 EAST       PF                  1.6           1.3    28
##  5 EAST       SF                  1.6           1.3    27
##  6 WEST       SG                  1.6           1.8    30
##  7 WEST       SF                  1.4           1.8    16
##  8 WEST       C                   1.2           1.1    30
##  9 WEST       PF                  1.2           1.2    33
## 10 EAST       C                   0.9           1      30
# Find the average PPG of players with above-median steals in both seasons
s19_21 %>% filter(STL.20_21 > median(STL.20_21, na.rm = T) & 
    STL.19_20 > median(STL.19_20, na.rm = T)) %>% summarize(avg_PPG.19_20 = round(mean(PPG.19_20), 
    1), avg_PPG.20_21 = round(mean(PPG.20_21), 1), count = n())
## # A tibble: 1 x 3
##   avg_PPG.19_20 avg_PPG.20_21 count
##           <dbl>         <dbl> <int>
## 1          14.5          14.5   154
# league-wide average PPG, for comparison
s19_21 %>% summarize(avg_PPG.19_20 = round(mean(PPG.19_20, na.rm = T), 
    1), avg_PPG.20_21 = round(mean(PPG.20_21, na.rm = T), 1))
## # A tibble: 1 x 2
##   avg_PPG.19_20 avg_PPG.20_21
##           <dbl>         <dbl>
## 1           9.8           9.8

During wrangling, a few variables were added to the data frame to make later steps easier: points per game and conference were added for each season. The “per game” function divides a season total by games played. This compensates for games that players may have missed due to injuries, coach's decisions, or personal reasons. As a result, each player's averages are comparable even though players appeared in different numbers of games.

The first interesting finding was how the two conferences compared at sharing the basketball. Median assists per game were broken down by position and conference. Western point guards and centers out-assisted their Eastern counterparts in both seasons. Eastern shooting guards and power forwards led in both seasons, and small forwards split. The Western advantages were larger, however. Western PGs averaged +1.0 assists/game over Eastern PGs in 2019-2020 and +1.8 in 2020-2021. Eastern PFs were only +0.4 and +0.1 assists/game over Western PFs. This may reflect differences between the two conferences. Western players, especially at point guard, appear more willing to give the ball to their teammates, which shows trust in their teammates' skill and ability to score.
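The conference comparison rests on West-minus-East differences, so it may help to compute them directly. This sketch re-enters by hand the ten medians printed in the assists table above, so it is only as precise as those rounded values. It then reshapes them with pivot_longer and pivot_wider, which puts a WEST and an EAST column side by side for each position and season.

```r
library(tibble)
library(tidyr)
library(dplyr)

# median assists per game by conference and position, copied from the
# printed table above (rounded summary values, not raw data)
apg <- tribble(
  ~Conf,  ~Pos, ~APG.19_20, ~APG.20_21,
  "WEST", "PG", 4.4, 4.8,
  "EAST", "PG", 3.4, 3.0,
  "EAST", "SG", 1.9, 2.1,
  "EAST", "PF", 1.6, 1.3,
  "EAST", "SF", 1.6, 1.3,
  "WEST", "SG", 1.6, 1.8,
  "WEST", "SF", 1.4, 1.8,
  "WEST", "C",  1.2, 1.1,
  "WEST", "PF", 1.2, 1.2,
  "EAST", "C",  0.9, 1.0
)

# one row per position and season with WEST and EAST columns, then the
# West-minus-East gap (positive = Western players at that position assist more)
gaps <- apg %>%
  pivot_longer(starts_with("APG"), names_to = "Season", values_to = "APG") %>%
  pivot_wider(names_from = Conf, values_from = APG) %>%
  mutate(West_minus_East = round(WEST - EAST, 1)) %>%
  arrange(Season, desc(West_minus_East))

print(gaps)
```

On these medians the West leads at PG and C in both seasons and at SF in 2020-2021. The PG gaps are the largest in either direction.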

The second interesting finding was the proportion of games played by younger players in the league. To look at these “younger” players, the data was filtered to players who were 25 or younger in both seasons. Surprisingly, the youngest age group, 20-year-olds, did not play the smallest proportion of games, but they did have the greatest variability in games played (a standard deviation of 0.276). With only 11 players, though, this is also the least stable estimate in the table. The proportion of games played may be attributed to teams' attempts to develop young players, while the variability may reflect differences in skill among those players. It was also intriguing that 23-year-olds played the smallest proportion of games of any age group (0.504 on average). Age 23 is often when players sign their second contract. Many of these players may therefore have had to become acclimated to a new basketball system, which could mean sitting out games by coach's decision.
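Both claims in this paragraph can be read directly off the age-group table. As a small sketch, the printed summary values are re-entered below and then queried for the lowest mean and the largest standard deviation. These are rounded values copied from the table, not raw data.

```r
library(tibble)
library(dplyr)

# summary values copied from the "Proportion of Games Played" table above
age_tbl <- tribble(
  ~Age, ~avg_prop_G, ~sd_prop_G, ~count,
  20, 0.603, 0.276, 11,
  21, 0.556, 0.227, 28,
  22, 0.633, 0.240, 37,
  23, 0.504, 0.262, 48,
  24, 0.585, 0.269, 42,
  25, 0.624, 0.231, 42
)

# age group with the lowest mean proportion of games played
lowest_mean <- age_tbl %>% slice_min(avg_prop_G, n = 1)
# age group with the largest spread in proportion of games played
largest_sd <- age_tbl %>% slice_max(sd_prop_G, n = 1)

print(lowest_mean)
print(largest_sd)
```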

Visualizing

plot1 <- ggplot(data = s19_21, aes(x = STL.20_21, y = TEPer.20_21)) + 
    geom_point(aes(color = Pos.20_21)) + geom_smooth(method = "lm", 
    color = "black") + scale_x_continuous(name = "Total Steals", 
    n.breaks = 10) + scale_y_continuous(name = "3 Point Percentage", 
    n.breaks = 5) + ggtitle("Total Steals vs. 3PT Percentage") + 
    scale_color_discrete("Position") + theme(panel.background = element_rect("gray"))
plot1

This plot was made to help visualize whether there is a real correlation between 3PT shooting and defensive ability. In the NBA, “3 and D” players are considered valuable since they have such a large impact on both ends of the floor. But in the graph, the trend line rises only slightly, as most of the points are clustered around a similar 3PT percentage. This suggests that the two variables are only weakly related. It may also mean that “3 and D” players are becoming a thing of the past, with most of the league able to shoot threes. Furthermore, the points are colored by position to help single out the outliers. Almost all of the outliers (those who shot either 100% or 0%) are NBA big men, so it is very likely they took very few 3PT shots. Because those percentages rest on so few attempts, they should be given little weight when judging the trend.
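One way to test whether the low-volume big men are distorting the trend is to compute the correlation with and without a minimum number of three-point attempts. The helper `steal_3pt_cor()` and the toy rows below are hypothetical. The column names STL, TEA, and TEPer stand in for the season-suffixed columns in s19_21. The cutoff of 50 attempts is an arbitrary illustration, not a value from the original analysis.

```r
library(tibble)
library(dplyr)

# hypothetical helper: Pearson correlation between steals and 3PT%,
# restricted to players with at least `min_att` three-point attempts
# (rows with missing TEA are dropped by the filter)
steal_3pt_cor <- function(df, min_att = 0) {
  df %>%
    filter(TEA >= min_att) %>%
    summarize(n = n(), r = cor(STL, TEPer, use = "complete.obs"))
}

# hypothetical toy data; the last two rows mimic big men with 1-2 attempts
toy <- tibble(
  STL   = c(40, 60, 80, 100, 20, 30),
  TEA   = c(150, 200, 250, 300, 1, 2),
  TEPer = c(0.33, 0.36, 0.35, 0.38, 1.00, 0.00)
)

all_players <- steal_3pt_cor(toy)
shooters    <- steal_3pt_cor(toy, min_att = 50)
print(all_players)
print(shooters)
```

On the real data this could be run as `s19_21 %>% select(STL = STL.20_21, TEA = TEA.20_21, TEPer = TEPer.20_21) %>% steal_3pt_cor(min_att = 50)`. Comparing `r` with and without the cutoff shows how much the extreme percentages move the relationship.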

# histogram: 2020-2021 total points (purple); density curve: 2019-2020 total points (red)
plot2 <- ggplot(data = s19_21) + geom_histogram(aes(x = PTS.20_21, 
    y = after_stat(density)), fill = "purple", color = "black") + geom_density(aes(x = PTS.19_20), 
    color = "red") + scale_x_continuous(name = "Total Points", 
    breaks = seq(0, 2500, 250)) + ggtitle("Point Distribution in the NBA Regular Season") + 
    scale_y_continuous(n.breaks = 7, name = "Density") + theme(panel.background = element_rect("gray"), 
    plot.background = element_rect("gray"))

plot2

This plot shows the distribution of total points among NBA players in both seasons. The purple histogram is 2020-2021 and the red density curve is 2019-2020. From 2019-2020 to 2020-2021, the point distribution appears to have stayed roughly the same, as the density curve follows the right skew of the histogram. That said, slightly more players scored fewer than 125 points in 2020-2021 than in 2019-2020. This was complemented by a few more players reaching high point totals in 2020-2021 than in 2019-2020. This may suggest that the gap between superstar scorers and average NBA players is growing, which would increase the value of players who specialize in scoring the basketball.
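The comparison in this paragraph (more players under 125 points, a few more high scorers) can be made numeric rather than read off the overlaid curves. The helper below is a hypothetical base R sketch. For two vectors of season point totals, it reports the share of players under a cutoff and the 90th percentile. The toy totals are made up, and the 90th percentile is just one arbitrary choice of upper-tail summary.

```r
# hypothetical helper: compare two seasons' point distributions by the share
# of players under a cutoff and by an upper percentile (the right tail)
compare_tails <- function(pts_a, pts_b, cutoff = 125, upper = 0.9) {
  data.frame(
    season     = c("a", "b"),
    share_low  = c(mean(pts_a < cutoff, na.rm = TRUE),
                   mean(pts_b < cutoff, na.rm = TRUE)),
    upper_pctl = c(unname(quantile(pts_a, upper, na.rm = TRUE)),
                   unname(quantile(pts_b, upper, na.rm = TRUE)))
  )
}

# made-up season point totals for six players
toy_19_20 <- c(50, 100, 200, 400, 900, 1500)
toy_20_21 <- c(30, 80, 110, 450, 1000, 2100)

tails <- compare_tails(toy_19_20, toy_20_21)
print(tails)
```

With the real data, the two inputs would be `s19_21$PTS.19_20` and `s19_21$PTS.20_21`.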

# bars: mean total rebounds per player on each 2020-2021 team;
# error bars: 95% confidence intervals from mean_cl_normal (requires the Hmisc package)
plot3 <- ggplot(data = s19_21, aes(x = Tm.20_21, y = TRB.20_21)) + 
    geom_bar(stat = "summary", fun = mean, aes(fill = Tm.20_21)) + 
    geom_errorbar(stat = "summary", fun.data = mean_cl_normal) + 
    scale_y_continuous(name = "Mean Rebounds per Player") + theme(axis.text.x = element_blank()) + 
    scale_x_discrete(name = "Team") + scale_fill_discrete("Team") + 
    ggtitle("Average Rebounds per Player")

plot3

Taking the results of the 2020-2021 season into account, the graph suggests that teams that rebound better often perform better. For example, the 2020-2021 champion Milwaukee Bucks have the highest average rebounds per player in the league, and their success can partially be attributed to this high rebound rate. The error bars show 95% confidence intervals for each team's mean. They indicate how uncertain these averages are, which helps explain the exceptions to this trend. The Phoenix Suns, for example, were nearly last in rebounding, yet they made it all the way to the NBA Finals. Rebounding may still have played a role, however, since they fell to the Bucks. Similarly, the Atlanta Hawks were in the middle of the pack in rebounding. Rebounding from individual players helped them overcome some adversity, but they still ultimately fell short.
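The plot's error bars come from `mean_cl_normal`, so they are 95% t-based confidence intervals. These are wider than standard-error bars by a factor of `qt(0.975, n - 1)`. The base R sketch below computes both half-widths for one hypothetical team roster to show the difference; the rebound totals are made up. ggplot2's `mean_se` would draw the narrower standard-error version.

```r
# half-widths of two common error bars for a mean:
# the standard error (what ggplot2::mean_se draws by default) and a 95%
# t-based confidence interval (what ggplot2::mean_cl_normal draws by default)
error_bar_halfwidths <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  se <- sd(x) / sqrt(n)
  c(se = se, ci95 = qt(0.975, df = n - 1) * se)
}

# hypothetical total rebounds for six players on one team
toy_trb <- c(120, 250, 310, 420, 500, 640)
w <- error_bar_halfwidths(toy_trb)
print(round(w, 1))
```

With only a handful of players per team, the confidence interval is more than twice as wide as the standard error, which is worth keeping in mind when comparing bars.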

Concluding Remarks
