Running a Basic Statistical Test in Python on a Real World Example

Last week in my data science bootcamp at Flatiron we took a headlong rush through basic statistics. I learned a ton about probability, null and alternative hypotheses, A/B testing, and linear regression. In my experience, example stats problems can feel contrived, which makes it hard to care about the concepts. So today I will make my own null hypothesis, do some very simple data collection, and come up with my very own p-value. Hopefully, this can cement some concepts for me and help anyone who is struggling to run or interpret a t-test. I wanted to do more advanced stuff in this blog, but writing this up taught me that forming a simple, testable question in the real world is murky and takes a lot of time and data organization. My number one recommendation from this process is not any fancy Python package but simply to allocate as much time as you can to data processing and to brainstorming all the possibilities of your inquiry. A whiteboard helps!

The problem: Is scoring really higher in the NFL this year?

People love football because it's exciting. Unlike with a TV show or movie, you really have no idea what's going to happen. There are no scriptwriters purposely manipulating you. The good guys do not always win! This makes it so much sweeter when your team pulls off a tremendous comeback than when Captain America finally figures out the right guy to punch until the good guys win.

On the other hand, football is a product. There is plenty of evidence that the NFL has consistently changed the rules to bring the game more in line with what the average viewer wants. There is a plethora of penalties that protect quarterbacks, since the sport can be unwatchable when they get hurt. Pass interference penalties have increased over the years, leading to softer coverage. The NFL knows we want shootouts and, like a writer or a director, it has shifted things in that direction.

Coming into this season, the NFL's product was under threat. Not only did the league seem wholly unprepared for COVID, especially compared with the NBA, an organization that is dramatically better at everything, but the pandemic may have also shifted people's viewing habits. This was a year when, more than any other, the league needed to draw viewers in. After a few weeks, I started to notice that, at least subjectively, there seemed to be more scoring and fewer penalties this year than in previous seasons. Inspired by some podcasters who noticed similar trends, I came up with a basic alternative hypothesis and set to work using my data science toolbox on a real-life example: Has mean scoring significantly increased this year?

Here are the steps for a hypothesis test:

1) Set up null and alternative hypotheses

2) Choose a significance level

3) Calculate the test statistic

4) Determine the critical value or p-value (find the rejection region)

5) Compare the test statistic with the critical value (or the p-value with the significance level) to reject or fail to reject the null hypothesis.

  1. My null hypothesis was that average scoring was no different from last year to this year. My alternative hypothesis was that it had increased. The null hypothesis is the status quo, while the alternative hypothesis is the claim, based on your observations, that something differs from it.
  2. I chose a standard significance level of 0.05. This is your chance of returning a false positive, that is, rejecting the null hypothesis when it is true. A lower alpha level reduces this risk, but what I was doing was not especially risky. No one is going to lose money or be affected medically if I'm wrong. So going with the commonly used level of 0.05 seemed like a fine choice.
  3. To calculate the test statistic I will need some data.

The good folks at Pro Football Reference have easily accessible tables for scoring and even tools for selecting the columns you want before you download them! I chose to focus on total points and total yards, from the first six weeks of 2020 and from last year. In the future, I would like to take a more nuanced look at offensive stats but for today's curiosity, these metrics will do. I further narrowed it down to average points per week per team for my testing variable.

My initial inspection showed that mean scoring per game per team is over 3 points higher this year: 25.65 points per game versus last year's 22.18. But is this significant?
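As a rough illustration of that narrowing-down step, here is a minimal sketch of turning season totals into points per game per team and then into a season mean. The field names (`team`, `points_for`, `games`) and the numbers are hypothetical placeholders, not the real Pro Football Reference columns or data.

```python
from statistics import mean


def points_per_game(rows):
    """Return {team: points scored per game} for one season's rows."""
    return {r["team"]: r["points_for"] / r["games"] for r in rows}


# Placeholder rows for illustration only; NOT real NFL data.
season_rows = [
    {"team": "A", "points_for": 150, "games": 6},
    {"team": "B", "points_for": 120, "games": 5},  # a team that has had its bye week
    {"team": "C", "points_for": 132, "games": 6},
]

ppg = points_per_game(season_rows)
season_mean = mean(ppg.values())

print(ppg)
print(f"mean points per game per team: {season_mean:.2f}")
```

Dividing by games played rather than by week number matters early in a season, because teams that have already had a bye have played fewer games.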

Running a t-test in Python takes just two lines of code!
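For anyone following along, here is a minimal sketch of what such a test could look like, assuming SciPy's `ttest_ind`. The exact function and options behind the result in this post are not shown, so treat the choices here (Welch's unequal-variance version, a one-sided alternative) as assumptions. The two lists are made-up placeholders standing in for per-team points per game in each season.

```python
from scipy import stats

# Placeholder per-team points-per-game values; NOT the real data from this post.
ppg_this_year = [27.5, 24.0, 30.2, 21.8, 26.1, 23.4]
ppg_last_year = [22.0, 20.5, 25.3, 19.9, 23.7, 21.2]

# Welch's two-sample t-test with a one-sided alternative
# (H1: this year's mean is greater than last year's).
# equal_var=False and alternative="greater" are assumptions for this sketch;
# the `alternative` argument needs SciPy 1.6 or newer.
t_stat, p_one_sided = stats.ttest_ind(
    ppg_this_year, ppg_last_year, equal_var=False, alternative="greater"
)

# The default two-sided test, shown for comparison.
_, p_two_sided = stats.ttest_ind(ppg_this_year, ppg_last_year, equal_var=False)

print(f"t = {t_stat:.3f}")
print(f"one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
```

When the t-statistic is positive, the one-sided p-value is half the two-sided one, so it is worth being clear about which version a reported p-value refers to.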

4. The t-test returns both the t-statistic and the p-value for me, so there is no need to mess around with a t-table like in high school stats class. My test returned a p-value of 0.0172, which is lower than my chosen significance level of 0.05.

5. Based on these results, I reject the null hypothesis in favor of the alternative: scoring is higher this year than last. The whole league is cooking compared to last year, not just Russ.
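Steps 4 and 5 can be written out explicitly. The sketch below uses the p-value and significance level reported in this post. The critical value is only illustrative: it assumes a one-sided test with 62 degrees of freedom (32 teams per season, pooled), which the post does not state.

```python
from scipy import stats

ALPHA = 0.05      # significance level chosen in step 2
P_VALUE = 0.0172  # p-value reported in step 4

# Decision rule using the p-value: reject the null when p < alpha.
reject_null = P_VALUE < ALPHA
print("reject the null hypothesis" if reject_null else "fail to reject the null hypothesis")

# Equivalent rule using a critical value: reject when t_stat > t_crit.
# ASSUMPTION: one-sided test, df = 32 + 32 - 2 = 62 (one observation per team per season).
df = 62
t_crit = stats.t.ppf(1 - ALPHA, df)
print(f"one-sided critical t at alpha = {ALPHA}, df = {df}: {t_crit:.3f}")
```

Both rules always agree: a t-statistic beyond the critical value corresponds exactly to a p-value below the significance level.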

There are tons of further investigations I wanted to get into, but this initial finding took me a lot longer than I thought it would. Always allocate more time to collect, clean, and organize data than you think you will need. Coming up with a well-defined null hypothesis I could actually test was much trickier than working through a class exercise, where the data is handed to you on a platter along with a line of inquiry. Holding calls were down 78% in week one, which sounds significant. I would like to look at some correlations for penalties to find further evidence that the NFL may be artificially increasing scoring, or to see if there is another reason for this significant increase. Ultimately, it would be fascinating to see which teams are benefitting from this increased offense and which teams are getting hurt by it.