Presenting your complex model in a way people can understand.
Do you ever get feedback on a project that makes you cringe because it seems so damn obvious in hindsight? That's what happened to me this past Friday. My teacher said we had a good project, but the conclusion and business recommendation were not quite there. He suggested putting our fancy new model into business terms. Today I will do that on this blog and hopefully give you a road map for presenting the conclusions of a complicated model to non-data scientists.
First, let me introduce our project a little bit. My partner and I obtained a dataset from Kaggle about hotel reservations. Each row was a different guest, with information on their reservation. Our target variable was whether or not they ended up canceling their reservation. We ended up using an XGBoost model and a Random Forest Classifier. During the presentation, I kind of waffled around and just presented their scores. This did little to explain why the models were so powerful and how they could be used in a business context.
We got into why we chose precision and wanted to avoid false positives, but what if we just showed what that looked like in a graph?
For me at least, this is much more intuitive at a glance than the oft-heralded confusion matrix, especially when we are trying to understand the nuances of two different models. I also labeled the axes in business terms rather than brain-numbing false-positive lingo. Impressing data science friends might be fun, but making a graph that you can glance at and understand is way more important for a business pitch, in my opinion. We do not need to know what recall, precision, or an F1 score means to understand the successes and failures of these two models.
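Here is a minimal sketch of what that kind of graph could look like. The counts come from our models' results on the 29,848 test reservations; the styling, labels, and output filename are my own assumptions, not the exact figure from our presentation.

```python
# Side-by-side bar chart comparing the two models in plain business terms.
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs anywhere
import matplotlib.pyplot as plt
import numpy as np

models = ["XGBoost", "Random Forest"]
correctly_flagged = [9048, 3803]  # cancellations caught in time to rebook
wrongly_flagged = [1505, 3]       # guests flagged who actually showed up

x = np.arange(len(models))
width = 0.35

fig, ax = plt.subplots()
ax.bar(x - width / 2, correctly_flagged, width, label="Cancellations caught")
ax.bar(x + width / 2, wrongly_flagged, width, label="Guests wrongly flagged")
ax.set_xticks(x)
ax.set_xticklabels(models)
ax.set_ylabel("Reservations")
ax.set_title("What each model gets right (and wrong)")
ax.legend()
fig.savefig("model_comparison.png")
```

The point is that "cancellations caught" and "guests wrongly flagged" mean something to a hotel manager in a way that "true positives" and "false positives" do not.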
I think we can do even better though.
# Rates of correct/incorrect cancellation flags out of 29,848 test reservations
correct_cancel_rate_boost = 9048 / 29848
incorrect_cancel_rate_boost = 1505 / 29848
correct_cancel_rate_rfc = 3803 / 29848
incorrect_cancel_rate_rfc = 3 / 29848

# Average reservations and revenue per day, based on the 2017 data
rez_in_year = len(df[df.arrival_date_year == 2017])
rez_per_day = rez_in_year / 365
revenue_per_day = df.adr.mean() * rez_per_day

# Daily overbooking revenue and booking errors for each model
rez_increase_boost = rez_per_day * correct_cancel_rate_boost
rev_increase_boost = rez_increase_boost * df.adr.mean()
booking_errors_boost = rez_per_day * incorrect_cancel_rate_boost
rez_increase_day_rfc = rez_per_day * correct_cancel_rate_rfc
rev_increase_rfc = rez_increase_day_rfc * df.adr.mean()
booking_errors_rfc = rez_per_day * incorrect_cancel_rate_rfc
With some simple arithmetic, we can calculate how much revenue each model could generate per day through overbookings, and how many potential booking errors that would introduce.
Then we can turn it into a table to show our findings at a glance and incorporate more analyses. If we knew how much the average rebooking cost the hotel, we could subtract that from the revenue increases to get the overall revenue gain from implementing each model.
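As a sketch of that table: the cancel rates below come from our model results, but the average daily rate and reservations per day are placeholder values standing in for `df.adr.mean()` and the 2017 reservation count, since those depend on the dataset.

```python
# Build a small summary table of per-day business impact for each model.
import pandas as pd

adr = 100.0          # placeholder for df.adr.mean()
rez_per_day = 150.0  # placeholder for rez_in_year / 365

rates = {
    "XGBoost": {"correct": 9048 / 29848, "incorrect": 1505 / 29848},
    "Random Forest": {"correct": 3803 / 29848, "incorrect": 3 / 29848},
}

rows = []
for model, r in rates.items():
    overbookings = rez_per_day * r["correct"]  # safe overbookings per day
    rows.append({
        "Model": model,
        "Safe overbookings/day": round(overbookings, 1),
        "Revenue gain/day ($)": round(overbookings * adr, 2),
        "Booking errors/day": round(rez_per_day * r["incorrect"], 2),
    })

summary = pd.DataFrame(rows)
print(summary.to_string(index=False))
```

With real rebooking-cost figures, one more column (errors per day times cost per error) would turn this into a net revenue comparison.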
These are very simple steps for showing the business efficacy of your model, but ones I wish I had been shown clearly before I even learned how to build a random forest classifier. Hopefully, they can jog your brain on how to present your results! Do not do all the hard work of data cleaning and model training only to come up short in the application, like I did. All this took was some conceptualizing of our business problem, thinking through what it would mean to actually use one of our models, and a little arithmetic.
I had one last takeaway from this project: going back and reworking an old project seems like it would suck, but as soon as you get over your mental aversion to it, it's a lot of fun. Working on a project is completely different when you are not under the stress of a deadline. Your brain opens up to all sorts of ideas without worrying whether they will be productive or "take too long to implement." I love the feeling of turning in a project and taking my mind off of it, possibly forever. But this little retuning showed me just how much learning you can get out of editing your older work. Furthermore, you have already done all the hard work! All that's left is the fun analysis part. It feels like you can follow your intuition and jump around, rather than be stressed by marching step by step toward your goal. Reorienting yourself to the data is difficult, but I definitely recommend it if you were not happy with your end result or just thought of something new.