Data Cleaning for Image Classification

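from fastai.vision.all import *

# Describe the data: images in, categories out, labels taken from the parent folder name
block = DataBlock(blocks=(ImageBlock, CategoryBlock),
                  get_items=get_image_files,
                  get_y=parent_label,
                  item_tfms=Resize(128))  # pick your transformation here
# Copy the block with a different item transform
block = block.new(item_tfms=Resize(128, ResizeMethod.Squish))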

Resizing

Here you can see how cropping may systematically discard information: in every photo, the muzzle of the gun is cut off.
item_tfms=Resize(x)
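To put a number on how much a crop throws away, here is a minimal sketch in plain Python. It is not fastai's implementation: it assumes the crop is the largest centered square, and the helper names `center_square_crop` and `fraction_kept` are made up for illustration.

```python
def center_square_crop(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def fraction_kept(width, height):
    """Fraction of the original pixels that survive the centered square crop."""
    left, top, right, bottom = center_square_crop(width, height)
    return (right - left) * (bottom - top) / (width * height)


# A 640x360 landscape photo loses 140 pixels on each side.
print(center_square_crop(640, 360))  # (140, 0, 500, 360)
print(fraction_kept(640, 360))       # 0.5625
```

For a wide photo, almost half of the pixels never reach the model, and anything sitting near the edges (like that muzzle) is the first to go.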

Squishing

Eww. The whole image fits, but the proportions are distorted.
item_tfms=Resize(x, ResizeMethod.Squish)

Padding

Black bars are added to standardize the image size without cropping or distorting the picture.
item_tfms=Resize(128, ResizeMethod.Pad, pad_mode='zeros')
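The size of those bars follows directly from the image's aspect ratio. The sketch below is a simplified illustration, not fastai's code: it assumes the image is scaled so its longer side equals the target size and the leftover space is split evenly. The function name `pad_to_square` is hypothetical.

```python
def pad_to_square(width, height, size):
    """Scale the image to fit inside a size x size square.

    Returns (new_width, new_height, pad_x, pad_y), where pad_x and pad_y
    are the bar thicknesses added on each side.
    """
    longest = max(width, height)
    new_width = round(width * size / longest)
    new_height = round(height * size / longest)
    pad_x = (size - new_width) // 2
    pad_y = (size - new_height) // 2
    return new_width, new_height, pad_x, pad_y


# A 640x360 photo padded to 128x128 gets 28-pixel bars on top and bottom.
print(pad_to_square(640, 360, 128))  # (128, 72, 0, 28)
```

Nothing is cut off and nothing is stretched, but in this example 56 of the 128 rows carry no information at all.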

Random Resized Crop

One image cropped four different ways as an example
item_tfms=RandomResizedCrop(128, min_scale=0.3)
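Roughly speaking, `min_scale=0.3` means each crop covers at least about 30% of the original image. The sketch below shows one way such a crop box could be chosen. It is an illustration under assumptions, not fastai's implementation: the aspect-ratio range of 3/4 to 4/3, the ten attempts, and the centered-square fallback are choices made here, and `random_crop_box` is a hypothetical name.

```python
import math
import random


def random_crop_box(width, height, min_scale=0.3, ratio=(3 / 4, 4 / 3), rng=random):
    """Pick a random crop box covering between min_scale and 100% of the image."""
    area = width * height
    for _ in range(10):
        # Choose how much of the image to keep, then a random aspect ratio.
        target_area = rng.uniform(min_scale, 1.0) * area
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        crop_w = round(math.sqrt(target_area * aspect))
        crop_h = round(math.sqrt(target_area / aspect))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            left = rng.randint(0, width - crop_w)
            top = rng.randint(0, height - crop_h)
            return left, top, left + crop_w, top + crop_h
    # Assumed fallback: the largest centered square.
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


# Four different crops of the same 640x360 image.
rng = random.Random(0)
for _ in range(4):
    print(random_crop_box(640, 360, min_scale=0.3, rng=rng))
```

Each box would then be resized to 128x128, so the model sees a different region of the same photo every time.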

Data Augmentation

Another application of these methods is to create additional images for your classifier by squishing, skewing, or taking different randomized crops. Showing your subject in slightly different ways leads to a neural net that is better prepared for the randomness that may get thrown at it. I had never heard of data augmentation until recently, but if you're working with a small, imprecise dataset it can lead to a more accurate model.
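The idea can be shown on a toy "image" stored as a nested list. This is only a sketch of the concept in plain Python: the random crop comes from the methods above, the horizontal flip is an extra example added here, and the `augment` helper is hypothetical. In fastai itself you would normally reach for the ready-made augmentations such as `aug_transforms` rather than writing your own.

```python
import random


def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def crop(img, left, top, size):
    """Cut a size x size square out of the image."""
    return [row[left:left + size] for row in img[top:top + size]]


def augment(img, n, crop_size, rng):
    """Return n randomly cropped (and sometimes flipped) variants of img."""
    height, width = len(img), len(img[0])
    variants = []
    for _ in range(n):
        left = rng.randint(0, width - crop_size)
        top = rng.randint(0, height - crop_size)
        variant = crop(img, left, top, crop_size)
        if rng.random() < 0.5:
            variant = hflip(variant)
        variants.append(variant)
    return variants


# A 4x4 toy image whose pixel values are just their index.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
for variant in augment(image, n=3, crop_size=3, rng=random.Random(0)):
    print(variant)
```

One source image becomes several slightly different training examples, which is the whole trick.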

S**t in S**t out

All of these methods have their own applications and one big thing in common: they cannot overcome a bad dataset. They are tools that let you feed differently sized images to a model, but making your images all the same size does not make them valuable. They are a vital part of training an image classifier, especially with data scraped from the web, yet no amount of resizing will rescue poor data.
