Speaking to a machine and getting a sensible answer, once pure science fiction, is now treated as a highly accurate deep learning task, as Analytics India Magazine reports.
Almost everyone has come across smartphones with voice assistants such as Siri, Alexa or Google Assistant. These assistants are dominating, and in a way reshaping, everyday human interactions.
Neural networks built with memory capabilities have made speech recognition remarkably accurate, with some reports claiming up to 99 percent accuracy, and architectures like the LSTM have taken over the field of Natural Language Processing.
Fascinatingly, such networks can understand a person's speech and transcribe it into text by remembering the preceding words of a sentence while predicting the next one.
To understand how these state-of-the-art applications work, let us break down the whole process, from sound recognition to machine translation.
Wave Breakdown
The audio signal is separated into different segments before being fed into the network. This can be done with techniques such as Fourier analysis or Mel-frequency analysis, among others. A sound wave can be pictured in three-dimensional space, with time, frequency and amplitude as its axes. A Fourier transform can be performed on a sound wave to represent and visualise it in either the time or the frequency domain...
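The move from the time domain to the frequency domain can be sketched with NumPy's FFT routines. This is a minimal illustration, not the article's exact pipeline; the 16 kHz sample rate and 440 Hz test tone are arbitrary choices.

```python
import numpy as np

# Build a 1-second test signal: a 440 Hz tone sampled at 16 kHz
# (both values are illustrative, not from the article).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t)

# Fourier transform: time domain -> frequency domain.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)

# The strongest frequency bin should sit at the 440 Hz tone.
peak_freq = freqs[np.argmax(np.abs(spectrum))]
print(peak_freq)  # 440.0
```

A real speech pipeline would apply this transform to short overlapping windows of audio (a spectrogram) rather than to the whole signal at once.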
What are LSTM Neural Networks?
The LSTM is a network of cells, where each cell takes as input the previous hidden state h(t-1) and the current input x(t). The main function of a cell is to decide what to keep in memory and what to omit. The past state, the current memory and the present input work together to predict the next output. LSTM networks are popular nowadays because of their accurate performance in language-processing tasks...
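The gating behaviour described above can be sketched as a single LSTM cell step in plain NumPy. This is an illustrative implementation of the standard LSTM equations, with toy dimensions; the parameter names and shapes are assumptions, not from the article.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM cell step: gates decide what to forget, store and emit.

    W, U, b hold the stacked parameters for the forget, input,
    candidate and output gates (shapes are illustrative).
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b      # pre-activations for all four gates
    f = sigmoid(z[0 * n:1 * n])       # forget gate: what to drop from memory
    i = sigmoid(z[1 * n:2 * n])       # input gate: what to store
    g = np.tanh(z[2 * n:3 * n])       # candidate memory content
    o = sigmoid(z[3 * n:4 * n])       # output gate: what to emit
    c_t = f * c_prev + i * g          # updated cell state (the "memory")
    h_t = o * np.tanh(c_t)            # new hidden state / output
    return h_t, c_t

# Toy dimensions: 3-dim input, 2-dim hidden state.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))
U = rng.normal(size=(8, 2))
b = np.zeros(8)
h, c = np.zeros(2), np.zeros(2)
h, c = lstm_step(rng.normal(size=3), h, c, W, U, b)
print(h.shape, c.shape)  # (2,) (2,)
```

Running the step over a whole sequence, feeding each output state back in, is what lets the network "remember" earlier words while processing later ones.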
Conclusion
With an understanding of how to process sound on a machine, one can also work on building one's own sound classification systems. But when it comes to deep learning, data is key: the larger the dataset, the better the accuracy.
Source: Analytics India Magazine