Abstract

Context. Audio-based Music Information Retrieval (MIR), i.e. MIR based on the processing of audio signals, covers a broad range of tasks, including

music audio analysis (pitch/chord, beats, tagging) and retrieval (similarity, cover, fingerprint),
music audio processing (source separation, music translation/style transfer),
music audio generation (of samples or whole tracks).

A wide range of techniques can be employed for solving each of these tasks, spanning

from conventional signal processing and machine learning algorithms
to the whole zoo of deep learning techniques.

Objective. This tutorial aims to review the various elements of this deep learning zoo which are commonly applied in Audio-based MIR tasks. We review typical

inputs: such as waveform, Log-Mel-Spectrogram, CQT, HCQT, Chroma
front-ends: such as Dilated-Conv, TCN, SincNet
projections: such as 1D-Conv, 2D-Conv, U-Net, RNN, LSTM, Transformer
bottlenecks: such as AE, VAE, and quantization using VQ-VAE or RVQ
training paradigms: such as supervised, unsupervised, encoder-decoder, self-supervised, metric-learning, adversarial, denoising/latent diffusion

Method. Rather than providing an exhaustive list of all of these elements, we illustrate their use within a subset of (commonly studied) Audio-based MIR tasks such as

analysis: multi-pitch, cover-detection, auto-tagging
processing: source separation
generation: auto-regressive/LLM, diffusion

This subset of Audio-based MIR tasks was chosen to cover a wide range of deep learning elements.

The objective is to provide a 101 (introductory) lecture on deep learning techniques for Audio-based MIR. It does not aim to be exhaustive, either in Audio-based MIR tasks or in deep learning techniques; instead, it gives newcomers to Audio-based MIR an overview of how to solve the most common tasks using deep learning. It provides a portfolio of code (Colab notebooks and a Jupyter book) to help newcomers carry out the various Audio-based MIR tasks.

This tutorial can be considered a follow-up to the tutorial “Deep Learning for MIR” by Alexander Schindler, Thomas Lidy and Sebastian Böck, held at ISMIR-2018.