
Friday, July 03, 2015

Visual analytics: Can machine learning ‘see’?

Photo: Gadi Lenz
Here's the piece that Gadi Lenz, Chief Scientist at AGT International, contributed to IDG Connect: "The human brain remains the best video analyser – but computers are starting to catch up."

Photo: IDG Connect

Here is an interesting observation: Ask a child to describe what she sees around her and she will immediately tell you something like “I see a tall man talking to a woman in the driveway in front of a yellow house”.

The same task is beyond current computer technology. Specifically, feeding a “raw” video clip to a machine and getting back (reasonably quickly) a short textual description of what happens in the clip is currently pretty much impossible. Images and video are rich sources of information: they contain many different objects (with different shapes and colours) that have some relationship to each other, sit in some environment and, in the case of video, may be moving. There is a reason that a picture is worth a thousand words.

Analysing images and video to produce automatic insights and the decisions that follow from them is still incredibly difficult (even offline; doing it in real time is much harder). A further complication is the fact that most of the visual content we view is actually a 2D projection of the real (3D) world. Remarkably, humans are really good at these types of tasks, so one approach could be “Hey, let’s just copy the human visual system (HVS)” – if only it were that simple.

So, what can we do in the area of video analytics or video content analysis? Actually, quite a bit, though not quite as much as you may have seen in some popular movies. Here are some examples:
  • Driven by security and surveillance use cases, many “suspicious” behaviours and events can be recognised automatically (i.e., with no human in the loop), such as an object that has been left behind, someone crossing a virtual line or loitering, among many others; people can also be counted. Similarly, in the vehicular traffic area, events such as a stopped vehicle or someone driving on the hard shoulder can be identified.
  • Some very specific objects can be recognised – faces, vehicles, license plates and probably a few more – although some only under limited conditions: controlled lighting, controlled pose, minimal occlusions, etc.
  • Specific objects can be tracked within a single camera’s field of view (tracking across multiple cameras, even when the fields of view of successive cameras overlap, is very difficult).
If your interest is in some specific items on this limited list – no problem, you can buy them from numerous vendors. However, if you are looking for a different behaviour or a different object, you will need some computer vision people to develop a new analytic service. A generic object recogniser, or a generic “tell me if anything unusual happens in this area” service, does not exist yet.
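
Of the behaviours listed above, the virtual-line crossing is perhaps the easiest to illustrate. The sketch below is only an illustration, not how any particular vendor does it: it assumes an upstream tracker already supplies one (x, y) centroid per frame for an object, and the names `side_of_line`, `segments_intersect` and `crossing_frames` are hypothetical. It reports the frames at which the object's step from one position to the next crosses a line segment drawn in the image.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def side_of_line(p: Point, a: Point, b: Point) -> float:
    """Signed area test: > 0 if p is left of the directed line a->b,
    < 0 if it is right of it, 0 if p lies exactly on the line."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def segments_intersect(p1: Point, p2: Point, a: Point, b: Point) -> bool:
    """True if segment p1-p2 strictly crosses segment a-b.
    Merely touching an endpoint or running along the line does not count."""
    d1 = side_of_line(p1, a, b)
    d2 = side_of_line(p2, a, b)
    d3 = side_of_line(a, p1, p2)
    d4 = side_of_line(b, p1, p2)
    return d1 * d2 < 0 and d3 * d4 < 0


def crossing_frames(track: List[Point], a: Point, b: Point) -> List[int]:
    """Return every index i where the step track[i-1] -> track[i]
    crosses the virtual line a-b.
    Assumption: `track` holds one centroid per frame from some tracker."""
    return [
        i
        for i in range(1, len(track))
        if segments_intersect(track[i - 1], track[i], a, b)
    ]


if __name__ == "__main__":
    # An object moving left to right past a vertical virtual line at x = 1.5.
    track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    print(crossing_frames(track, (1.5, -1.0), (1.5, 1.0)))
```

The geometry is the easy part; the hard part, as the list above suggests, is obtaining a reliable track in the first place under changing lighting, pose and occlusion.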

But don’t despair – machine learning approaches are starting to appear in some commercial products. For example, the machine is trained on video that represents normal vehicular traffic flow; once the learning phase is over, it can indicate that something abnormal has happened, such as a traffic slowdown due to some sort of incident further down the road. By “machine”, by the way, we mean the computer that ingests the video stream and runs the anomaly detection algorithm, which could, in principle, run in the camera itself or very near to it.
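
The train-on-normal, flag-the-abnormal idea can be shown with a deliberately simple stand-in. The sketch below is not the algorithm used in any commercial product. It assumes the video has already been reduced to one number per time interval (here an average vehicle speed in km/h, which is an assumption for illustration), learns the mean and spread of that number from normal traffic, and then flags readings that deviate by more than a chosen number of standard deviations. The class name `TrafficBaseline` and the threshold of 3 are hypothetical choices.

```python
from statistics import mean, stdev
from typing import Sequence


class TrafficBaseline:
    """Learns what 'normal' looks like from one feature and flags outliers."""

    def __init__(self, threshold: float = 3.0) -> None:
        # Number of standard deviations beyond which a reading is abnormal.
        self.threshold = threshold
        self.mu = 0.0
        self.sigma = 1.0

    def fit(self, normal_speeds: Sequence[float]) -> "TrafficBaseline":
        """Learning phase: needs at least two readings of normal traffic."""
        self.mu = mean(normal_speeds)
        # Guard against a zero spread if all training readings are equal.
        self.sigma = stdev(normal_speeds) or 1e-9
        return self

    def is_anomalous(self, speed: float) -> bool:
        """Detection phase: z-score of the new reading against the baseline."""
        return abs(speed - self.mu) / self.sigma > self.threshold


if __name__ == "__main__":
    # Hypothetical average speeds (km/h) observed during normal flow.
    normal = [58.0, 60.0, 62.0, 59.0, 61.0, 60.0, 57.0, 63.0]
    baseline = TrafficBaseline().fit(normal)
    for reading in (60.0, 55.0, 25.0):
        print(reading, baseline.is_anomalous(reading))
```

A real system would learn from far richer features than a single speed value, but the two phases are the same: fit on normal footage, then score what arrives afterwards.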

Source: IDG Connect
