Translate to multiple languages

Tuesday, July 03, 2018

Deep learning in five and a half minutes | blog - Videantis

For decades, algorithms engineers have been trying to make computers “see” as well as we do, as Videantis reports.

Photo: Videantis

That’s no small feat: though today’s smartphone cameras provide about the same high-resolution image sensing ability as the human eye—seven megapixels or so—the computer that processes that data is nowhere near a match for the human brain. Consider that roughly half the neurons in the human cortex are devoted to visual processing, and it’s no surprise it’s a pretty hard task for a computer too.

Algorithms engineers have been trying to make computer vision perform as well as our brain for decades, developing increasingly sophisticated algorithms to help machine vision inch its way forward. This process was primarily one of trial and error. In order to make a computer understand a picture, algorithms engineers tried to figure out what kinds of features to look for in the images. Should it look for colors, edges, points, gradients, histograms, or even complex combinations of those? These detected features were then fed into classical machine learning algorithms such as SVM, Adaboost, and random forests to train them. The results were pretty good — but not really good enough.

Then, in 2012, three developments came together to turbocharge computer vision progress.

First, Princeton University released ImageNet in 2010. ImageNet is a vast collection of millions of images, each one labeled by hand to indicate the objects that are pictured and their corresponding categories...

Second, a new kind of algorithm called a convolutional neural network (CNN) had been developed. Compared to traditional computer vision techniques, these “deep” CNNs provide much higher recognition accuracy. But there’s no such thing as a free lunch: CNNs are very much brute force and require lots of compute power and memory.

Third, high-end graphics processors, or GPUs, had evolved far enough to provide ample compute power for the CNN approach. This meant that the millions of images needed to train a CNN could be processed much, much faster than a general-purpose computer could manage at the time—days instead of weeks.

Source: Videantis

If you enjoyed this post, make sure you subscribe to my Email Updates!