There’s a common adage that data scientists spend 90% of their time cleaning data and 10% modeling. With image classifiers, it’s more like 99% cleaning to 1% modeling, because a neural network needs its input images to be a standardized size. How many pictures do you come across on a Google image search that are all the same size? There is a bevy of different approaches for standardizing images, and it is important to remember that no method is strictly better or worse than another. Each one has its own drawbacks and applications, and oftentimes your ultimate limiter will be compute power. I will go through some here, provide examples, and talk about the benefits, drawbacks, and use cases of each one.
In fastai, a high-level API built on PyTorch, you pick how you want to transform your images by specifying item_tfms within your DataBlock. You can certainly take these ideas and use them in whatever framework you prefer, but I, unfortunately, do not have that code here, and some methods may require defining your own classes and methods from scratch.
Block = DataBlock(blocks=(ImageBlock, CategoryBlock),
                  item_tfms=Resize(128))  # pick your transformation here
Once you make your DataBlock and put it into a data loader, you can use this code to change item_tfms and easily experiment with different methods. As in pandas, you need to assign the result back to the variable name:
Block = Block.new(item_tfms=Resize(128, ResizeMethod.Squish))
It's been a while since I shoehorned model tanks into my blog, so I will use some images of Tiger tanks that I got by scraping an image search.
Crop
This method resizes the image and crops it into a square. Whether the width or the height gets cropped depends on the orientation of the original image. Here x denotes the size you want your images to be. This method is simple and retains a relatively large amount of information. It does lose parts of the image, though, and in a systematic way, which may leave your classifier deficient at identifying certain elements of your data.
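In fastai this is just the default behavior of Resize (e.g. item_tfms=Resize(128)). Outside fastai, the same idea, crop the largest centered square and scale it down, can be sketched with Pillow. The function name and centering choice here are my own, not fastai's internals:

```python
from PIL import Image

def center_crop_resize(img, size):
    """Crop the largest centered square from img, then resize to size x size."""
    w, h = img.size
    side = min(w, h)                      # square side = the shorter dimension
    left = (w - side) // 2                # center the crop horizontally
    top = (h - side) // 2                 # and vertically
    square = img.crop((left, top, left + side, top + side))
    return square.resize((size, size))

img = Image.new("RGB", (640, 480))        # stand-in for a scraped photo
out = center_crop_resize(img, 128)
print(out.size)  # (128, 128)
```

Note that for a landscape image like this one, the left and right edges are thrown away entirely, which is exactly the systematic loss described above.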
Squish
item_tfms=Resize(x, ResizeMethod.Squish)
This method theoretically retains every element of your data set, but warps the images instead of cropping them. That gets around the problems of cropping but distorts your data, and training your model on something that looks unrealistic seems like it would cause accuracy problems. I don’t like this method, but I am biased because I think it just makes stuff look really ugly. I am sure it has an application, one you might need! For something like medical imaging, though, which involves a high level of precision, it is definitely not optimal.
Pad
item_tfms=Resize(128, ResizeMethod.Pad, pad_mode='zeros')
This method maintains all of the data and does not fundamentally change it. Unfortunately, it also builds in a lot of wasted compute: those black bars teach your model nothing. Training neural nets on large data sets is straight-up expensive, and ideally all of those resources should go toward making a better model.
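To make the black bars concrete, here is a rough Pillow sketch of zero-padding: scale the longer side down to the target, then paste the result onto a black canvas. The helper name and centering are my own assumptions, not fastai code:

```python
from PIL import Image

def resize_with_padding(img, size):
    """Scale the longer side to `size`, then pad the rest with black (zeros)."""
    w, h = img.size
    scale = size / max(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    scaled = img.resize((new_w, new_h))
    canvas = Image.new("RGB", (size, size))   # all-black canvas = zero padding
    canvas.paste(scaled, ((size - new_w) // 2, (size - new_h) // 2))
    return canvas

img = Image.new("RGB", (640, 480), "white")   # stand-in for a scraped photo
out = resize_with_padding(img, 128)
print(out.size)  # (128, 128)
```

For this 640x480 input the image shrinks to 128x96, so a quarter of the 128x128 output is black pixels the model still has to process.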
Random Resized Crop
This one seemed really stupid to me when I first heard of it, but it's actually the one the fastai course recommends. Random resized crop takes a random part of the image and crops a square from it. min_scale represents how much of the image you want at minimum, so here it would be 30%. On the face of it this feels like a bad idea: what’s the point of a model that learns to classify the corners of your pictures? But think about it more like your own eyes and your ability to recognize patterns in parts of things. A neural network has no built-in vision, so it can use data in a very flexible way to understand images. This method leads to less systematic error and a model that is more flexible at identifying images unlike the ones it was trained on. That said, if your subject does not make up most of the frame, this method may not be great. It is okay to capture part of your subject and learn that pattern, but if the crop misses the subject entirely too often, this method will not work; increasing the minimum scale can help here.
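In fastai this is item_tfms=RandomResizedCrop(128, min_scale=0.3). A simplified Pillow sketch of the idea follows; note this version only takes square patches, while the real fastai transform also jitters the aspect ratio, and the helper name is my own:

```python
import random
from PIL import Image

def random_resized_crop(img, size, min_scale=0.3):
    """Take a random square patch covering at least min_scale of the image
    area, then resize it to size x size. (Simplified: square patches only.)"""
    w, h = img.size
    # smallest side whose square patch still covers min_scale of the area
    min_side = int((min_scale * w * h) ** 0.5)
    side = random.randint(min_side, min(w, h))
    left = random.randint(0, w - side)        # random position in the frame
    top = random.randint(0, h - side)
    patch = img.crop((left, top, left + side, top + side))
    return patch.resize((size, size))

img = Image.new("RGB", (640, 480))            # stand-in for a scraped photo
out = random_resized_crop(img, 128)
print(out.size)  # (128, 128)
```

Every epoch each image gets a different patch, so the model effectively sees many views of the same photo.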
Data Augmentation
Another application of these methods is to create additional images for your classifier by squishing, skewing, or taking different random crops. Showing your subject in slightly different ways leads to a neural net more prepared for all the randomness that may get thrown at it. I had never heard of data augmentation until recently, but if you're working with a small or imprecise data set it can lead to a more accurate model.
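In fastai you get this with batch_tfms=aug_transforms() in your DataBlock. To show the idea without fastai, here is a hand-rolled Pillow sketch that produces a few augmented variants of one image; the specific transforms and numbers are illustrative choices of mine, not fastai defaults:

```python
from PIL import Image

def augment(img):
    """Generate simple augmented variants: mirror, slight squish, small rotation."""
    w, h = img.size
    return [
        img.transpose(Image.FLIP_LEFT_RIGHT),          # horizontal flip
        img.resize((int(w * 0.8), h)).resize((w, h)),  # squish then restore
        img.rotate(10),                                # small rotation, same size
    ]

img = Image.new("RGB", (256, 256))                     # stand-in for a photo
copies = augment(img)
print(len(copies))  # 3
```

One original image becomes four training examples, which is exactly why augmentation helps with small data sets.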
S**t in S**t out
All these methods have their own unique applications and one big thing in common: they cannot overcome a bad data set. These methods are tools that let you use different combinations of images, but making your images all the same size does not make them valuable. Ultimately they are a vital part of training an image classifier, especially with data scraped from the web, but they are not capable of rescuing a poor dataset.