Spectrograms for neural nets

What are spectrograms used for?
What are Mel spectrograms used for?
What's wrong with CNNs and spectrograms for audio processing?
What is spectrogram in machine learning?

What are spectrograms used for?

A spectrogram is a visual representation of a signal's strength, or “loudness”, over time at the various frequencies present in a waveform. It shows not only whether there is more or less energy at, for example, 2 Hz vs 10 Hz, but also how those energy levels vary over time.
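As a rough illustration of the idea, the sketch below computes a power spectrogram with a short-time Fourier transform using only NumPy. The function name `spectrogram`, the frame length, the hop size, and the 100 Hz sample rate are all assumptions chosen for this toy example; they do not come from any particular library. The test signal is a 2 Hz tone followed by a 10 Hz tone, so the energy should move from one frequency bin to another over time.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_len=200, hop=100):
    """Power spectrogram via a Hann-windowed short-time Fourier transform.

    Returns (freqs, times, power), where power has one row per frequency
    bin and one column per time frame.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Squared magnitude of the FFT of each frame gives energy per frequency.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    return freqs, times, power.T

# Toy signal (assumed values): 20 s sampled at 100 Hz,
# a 2 Hz tone for the first 10 s, then a 10 Hz tone.
sample_rate = 100
t = np.arange(20 * sample_rate) / sample_rate
signal = np.where(t < 10, np.sin(2 * np.pi * 2 * t), np.sin(2 * np.pi * 10 * t))

freqs, times, power = spectrogram(signal, sample_rate)
first_peak_hz = freqs[np.argmax(power[:, 0])]    # dominant frequency in the first frame
last_peak_hz = freqs[np.argmax(power[:, -1])]    # dominant frequency in the last frame
```

Plotting `power` (usually on a logarithmic scale) with time on the horizontal axis and frequency on the vertical axis gives the familiar spectrogram picture.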

What are Mel spectrograms used for?

The mel spectrogram remaps the frequency axis from hertz to the mel scale, a perceptual scale on which equal steps sound roughly equally spaced in pitch to a human listener. The linear-frequency spectrogram is suited to applications where all frequencies have equal importance, while mel spectrograms are better suited to applications that need to model human hearing perception.
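To make the remapping concrete, here is a minimal sketch of one common hertz-to-mel conversion, mel = 2595 * log10(1 + f / 700). This particular formula is an assumption: several mel conventions exist and the text above does not say which one is meant. The helper names `hz_to_mel` and `mel_to_hz` and the 0 to 8000 Hz range are illustrative.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (one common convention, assumed here)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

# Ten bands equally spaced on the mel scale between 0 Hz and 8000 Hz.
mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(8000.0), num=11)
hz_edges = mel_to_hz(mel_edges)

# Equal steps in mel correspond to progressively wider bands in Hz.
band_widths_hz = np.diff(hz_edges)
```

Because the bands widen with frequency, a mel spectrogram spends more of its resolution on low frequencies, where human pitch discrimination is finer.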

What's wrong with CNNs and spectrograms for audio processing?

Sounds are “transparent”

One challenge in comparing visual images with spectrograms is that visual objects and sound events do not accumulate in the same manner. When two sounds overlap, their energy mixes in the same time-frequency region, whereas an object in the foreground of an image hides whatever is behind it. To use a visual analogy, one could say that sounds are always “transparent” [4] whereas most visual objects are opaque.
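The contrast can be sketched numerically. In the example below, which uses random arrays as stand-ins for real sounds and images, the complex short-time Fourier transform of a mixture equals the sum of the transforms of its parts, while an opaque foreground patch in an image simply replaces the background pixels. The `stft` helper and all sizes are assumptions made for illustration. Note that only the complex transform adds exactly; magnitudes and powers add only approximately, depending on phase.

```python
import numpy as np

def stft(signal, frame_len=256, hop=128):
    """Complex short-time Fourier transform with a Hann window."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.fft.rfft(frames, axis=1)

rng = np.random.default_rng(0)

# Two overlapping "sounds": the mixture's transform is the sum of both,
# so every time-frequency cell contains a contribution from each source.
sound_a = rng.standard_normal(4096)
sound_b = rng.standard_normal(4096)
mix_is_sum = np.allclose(stft(sound_a + sound_b), stft(sound_a) + stft(sound_b))

# Two overlapping "objects": the foreground overwrites the background
# wherever the mask is set, so those pixels say nothing about what is behind.
background = rng.random((8, 8))
foreground = rng.random((8, 8))
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
composite = np.where(mask, foreground, background)
occluded_is_foreground = np.array_equal(composite[mask], foreground[mask])
```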

What is spectrogram in machine learning?

Spectrograms are images of time-frequency features extracted from waveform signals. Once you have them, you can proceed with a straightforward image-classification deep learning project using those spectrograms.
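One possible way to prepare a spectrogram for an image classifier is sketched below: convert power to decibels, rescale to the range 0 to 1, and add a channel axis so the array looks like a single-channel image. The function name `spectrogram_to_image`, the `eps` floor, and the channel-first layout are assumptions for illustration; the layout and normalization a given framework expects may differ. The input here is random data standing in for a real power spectrogram.

```python
import numpy as np

def spectrogram_to_image(power, eps=1e-10):
    """Turn a (freq, time) power spectrogram into a (1, freq, time) float32 image."""
    # Log scaling compresses the large dynamic range of audio energy.
    log_power = 10.0 * np.log10(power + eps)
    lo, hi = log_power.min(), log_power.max()
    if hi > lo:
        scaled = (log_power - lo) / (hi - lo)   # min-max normalize to [0, 1]
    else:
        scaled = np.zeros_like(log_power)       # constant input: avoid division by zero
    # Add a leading channel axis, like a grayscale image.
    return scaled[np.newaxis, :, :].astype(np.float32)

# Placeholder data standing in for a real power spectrogram
# (64 frequency bins, 100 time frames).
rng = np.random.default_rng(0)
fake_power = rng.random((64, 100)) ** 2
image = spectrogram_to_image(fake_power)
```

The resulting array can then be batched and passed to whatever image model the project uses.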