Three Lessons from Scale Modeling that Apply to Data Science
When I tell people that building little 6-inch tanks changed my life, their eyes usually glaze over. Then, when I show them a picture, they let out a half-hearted, “Oh, that’s really good!” after a quick glance. Last week I spent some 50-odd hours on my first exploratory data analysis (EDA) at Flatiron’s Data Science Bootcamp. When I showed my family a few of the charts I had poured my soul into, the response felt all too familiar. It was then that I realized my hobby and my hopeful career path have a lot more in common than one might think. Both involve countless hours trying to make something perfect, whether it’s getting intimate with thousands of data points until your eyes bleed or making thousands of in-scale, irregular paint chips… until your eyes bleed. Maybe the fact that I can draw comparisons between two things I love says more about me than about data science, but here are three lessons that will help any scale modeler, artist, data scientist, or burgeoning coder.
1. Patience is not passive
This one is no secret. I’m sure anyone reading this has been told, or has felt, that patience is important. It’s important when you’re combing through your essay for grammar mistakes. It’s important when your code is broken and you have to spend *another* half-hour on Stack Overflow. It’s important when you’re trying to make each individual speck of mud perfect in between each individual tank tread. Hopefully, I can offer a more nuanced take. I think for many, patience evokes a sort of passivity: the idea that if you just put in the hours, it will turn out OK. Patience is not passive; it’s hard f%@!ing work. Patience is doing the same thing for hours and trying as hard as you can. Patience is proving that the one thing you did well, you can do again and again and again without losing your focus or precision. That you can take your time at each step of the process. That you troubleshoot with the same clear head at the beginning of the project and when it’s 11 PM the day before it’s due. Patience gets harder as you go, not easier, at least not at first. Patience is boring, but it’s not boring because it’s easy. It’s boring because for the next four, five, or eight hours, days, or weeks you have to give just as much attention as you did when you first started.
2. Storytelling does not have to be in your face
Telling a story is something every analyst, artist, or writer struggles with. One thing I learned from dozens of YouTube videos on “visual storytelling” in modeling is that storytelling is not about handing people a story they perceive directly, but about weaving a story through your entire work. Storytelling does not have to be in your face. A lot of inspiration for my models comes from muddy equipment I see at construction sites on the side of the road. They’re not quite tanks, but you can see where the treads have been polished to a bright shine, contrasted with the dry, crusty mud, and the sides of the vehicle where rain has streaked away the accumulated dust. You can see where, after months or years of use on construction sites, certain areas of paint are chipped but not others. No one looks at a hatch with its corners worn down and thinks of an “X happened, then Y” narrative, like “Bob clearly set his toolbox here one too many times.” It’s more subtle: they sense the story behind all those layers. When a model tells a good story, you can feel the battles it’s been in, the distance it’s traveled, and the way its crew used it, day after day.
What story does your data tell? It’s a question you hear all the time in data science. Honestly, if your data tells a simple story, one where you could answer this question in a sentence or two, you have not considered all the possibilities. Just like modeling, storytelling in data analysis is a subtle art. A story is ONE data point about ONE person, not what EVERY person in EVERY one of your data points tells you. A story is what one drop of mud or speck of discolored paint says about your model. Storytelling is what every speck or drop or fleck on your model portrays as a whole. As humans, we crave one simple story, but data analysis is not the place for it. Data analysis should be about every story in the data set. Data science is about telling the truth. The boring, the crazy, the cool: the truth is always some sum of multiple parts, and telling only one of those stories would ignore the nuance of reality. Probabilities offer infinite outcomes. Don’t be afraid to do those possibilities justice and tell a more subtle story.
3. Fix it first
Things go wrong when you build a model with nearly a thousand tiny parts. You break an antenna here or misalign your suspension there, or you realize the decals you put on are from 1944 and you’re building a late-1943 vehicle. Sometimes when these little problems happen, you’re just too frustrated, so you move on and finish building the model, hoping the mistake will work itself out or just go the heck away. Ultimately, these little problems just hang over you until the thought of working on your beloved model fills you with dread.
When I started my project last week, I knew what Git was, and I thought I kinda mighta sort of knew how it worked, but I certainly did not feel comfortable with the treacherous program. Instead of addressing this knowledge gap, I just started working on other things, fun Python data things, but Git hung over me: a specter of pushes and merges and overgrown, twisting, writhing branches. Then it was Thursday, the day before my group’s project was due, and our entire Jupyter notebook was corrupted. I started sweating. I may have even sweated from my eyes a little bit. But you know what? My partner and I fixed it. We buckled down and overcame. Still, that little inkling of a problem caused me so much more stress than it should have. I could have fixed it right when I realized I was not comfortable with our Git workflow, by speaking up, practicing, and learning. And you know what? Then the problem that I eventually dealt with would not have haunted me throughout the entire project. FIX IT FIRST: you will have to anyway, so you might as well save yourself the stress.
Hmmm, my first attempt at a data science blog post, or any sort of blog post really, got a lot more philosophical than I was expecting. Feeling a story? Maybe my brain needs a break from all the rationality and math. Or maybe this is me struggling to find my artistic identity in a field that can drown it out. Anyway, next time I promise to do something more technical. Unfortunately, I can’t promise that any poetic descriptions of bulldozers will be present. For now: be patient, do not be afraid to lean into subtlety, and always fix it first!