Talking to a machine and getting an answer back once seemed like science fiction; today it is a well-established, highly accurate deep learning task, as Analytics India Magazine reports.
Every one of us has come across smartphones with mobile assistants such as Siri, Alexa or Google Assistant, and these assistants have become a routine part of everyday human interaction.
Neural networks built with memory capabilities have pushed speech recognition to reported accuracies of around 99 percent, and architectures like LSTMs have taken over the field of Natural Language Processing.
Fascinatingly, such a network can understand a person's speech and transcribe it into text by retaining context from the preceding words of a sentence.
To understand how these state-of-the-art applications work, let us break down the whole process, from sound recognition to machine translation.
Wave Breakdown 
The audio signal is separated into short segments before being fed into the network. This can be performed with various techniques, such as Fourier analysis or Mel-frequency analysis, among others. [Figure: a sound wave represented in three-dimensional space.] A Fourier transform can be applied to a sound wave to represent and visualise it in the time or frequency domain...
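This time-to-frequency conversion can be sketched in a few lines of NumPy. The 440 Hz tone and 16 kHz sample rate below are illustrative choices, not values from the article:

```python
import numpy as np

# Synthesize 1 second of a 440 Hz tone sampled at 16 kHz
# (both values are arbitrary choices for the sketch).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
wave = np.sin(2 * np.pi * 440 * t)

# Fourier transform: move from the time domain to the frequency domain.
spectrum = np.abs(np.fft.rfft(wave))
freqs = np.fft.rfftfreq(len(wave), d=1 / sample_rate)

# The dominant frequency bin sits at the tone's frequency.
peak_hz = freqs[np.argmax(spectrum)]
print(peak_hz)  # 440.0
```

In a real speech pipeline this transform is applied to many short overlapping windows of the signal, producing a spectrogram (or Mel-scaled features) that the network consumes frame by frame.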
What are LSTM Neural Networks? 
The LSTM is a network of cells, where each cell takes as input the previous hidden state h_{t-1} and the current input x_t. The main function of a cell is to decide what to keep in memory and what to discard. The past state, the current memory and the present input work together to predict the next output. LSTM networks are popular nowadays because of their accurate performance in language processing tasks...
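The gating described above can be sketched as a single NumPy cell step. All dimensions, parameter shapes and the random toy weights here are illustrative assumptions, not the article's:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM cell step: gates decide what to forget, store, and emit.
    W, U, b hold the stacked parameters for the four gates
    (forget, input, candidate, output)."""
    z = W @ x_t + U @ h_prev + b      # pre-activations, shape (4*hidden,)
    f, i, g, o = np.split(z, 4)
    f = sigmoid(f)                    # forget gate: what to drop from memory
    i = sigmoid(i)                    # input gate: what new info to store
    g = np.tanh(g)                    # candidate memory content
    o = sigmoid(o)                    # output gate: what to expose
    c_t = f * c_prev + i * g          # updated cell state (the "memory")
    h_t = o * np.tanh(c_t)            # new hidden state / output
    return h_t, c_t

# Toy dimensions: 3 input features, 2 hidden units (arbitrary for the sketch).
rng = np.random.default_rng(0)
n_in, n_hid = 3, 2
W = rng.standard_normal((4 * n_hid, n_in))
U = rng.standard_normal((4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.standard_normal((5, n_in)):  # process a 5-step sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (2,)
```

Because the hidden state h_t is carried forward into the next step, information from earlier words in a sentence can influence the prediction at the current word, which is exactly the memory behaviour the article describes.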
Conclusion 
With an understanding of how to process sound on a machine, one can also build their own sound classification system. But when it comes to deep learning, data is key: the larger the dataset, the better the accuracy.
Source: Analytics India Magazine  
 
 

 