AudioDB Tutorial 03 – Hip-hop History

The Question

Radio programs detailing the history of Hip-hop have been produced. We want to find out whether, and if so where, a particular excerpt of NY State of Mind by Nas appears in those programs.

Feature Extraction

We use fftExtract to generate 20-bin MFCC features at a 100ms hop size for the radio programs from The Rub, and we also produce power features for each of these programs. Specifically, for each wav file (named TheRubYYYY.wav), we run

fftExtract -m 20 -p rub.plan TheRubYYYY.wav TheRubYYYY.mfcc20
fftExtract -P -p rub.plan TheRubYYYY.wav TheRubYYYY.power

The default FFT window/hop/frame parameters of fftExtract are those for 100ms frames, so they are not set explicitly here.

Creating The Database

audioDB -N TheRub.adb
audioDB -L TheRub.adb
audioDB -P TheRub.adb

for file in TheRubYYYY.wav
do
  audioDB -d TheRub.adb -I -f "${file%wav}mfcc20" -w "${file%wav}power"
done

We do not provide a -k flag, so the database key defaults to the name of the feature file.
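
The feature file name, and therefore the default key, follows from the suffix-stripping expansion used in the loop above. A minimal sketch, using one file name that fits the TheRubYYYY.wav pattern (the year is chosen only for illustration):

```shell
# ${file%wav} strips the trailing "wav" but keeps the dot,
# so appending "mfcc20" or "power" yields the matching file names.
file=TheRub1994.wav
feature="${file%wav}mfcc20"
power="${file%wav}power"
echo "$feature"   # TheRub1994.mfcc20 (the name that becomes the default key)
echo "$power"     # TheRub1994.power
```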

Formulating the Query

We take the NY State of Mind track, extract features and power using fftExtract as before, and then use those feature/power files (without inserting them) to query the database using the nsequence query type.

audioDB -d TheRub.adb -Q nsequence -f nystateofmind.mfcc20 -w nystateofmind.power -l 50 -p 0

This returns, for up to 10 (default -r value) ‘tracks’ in the database, a list of the 10 (default -n value) closest sequences to the query sequence (the 30-frame/3-second section starting 50 frames/5 seconds into the track).

TheRub1994.mfcc20 1.09787
0.963232 0 3681
1.02938 0 3709
1.08401 0 3596
1.09635 0 3680
1.09665 0 3624
1.12298 0 3677
1.13152 0 4934
1.14924 0 3676
1.14933 0 3679
1.15602 0 4963
TheRub1990.mfcc20 1.10122
1.08308 0 55845
1.08515 0 55873
1.09193 0 55846
1.09672 0 55851
1.10024 0 55874
1.10042 0 55850
1.10298 0 55859
1.11082 0 55865
1.11108 0 55842
1.12982 0 55844
1979therub.mfcc20 1.15303
1.12199 0 15676
1.13555 0 15675
1.14327 0 15635
1.15129 0 15604
1.15498 0 15125
1.157 0 15140
1.16068 0 15602
1.16683 0 15658
1.16699 0 15144
1.17168 0 15488

The way to read these results is as follows: each individual program in the database has a summary entry, ordered by the average distance of the 10 closest points to the query. These 10 closest points are then ordered by distance, with the query position and database track position of the match (in frames) following.
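
As a check on this reading, the summary value can be recomputed from the match lines beneath it. The following is a minimal sketch, not part of audioDB: track_means is a hypothetical helper, and it assumes that summary lines have two fields and match lines have three, as in the listing above.

```shell
# Recompute each track's summary as the mean of its match distances.
# track_means is a hypothetical helper, not an audioDB command.
track_means() {
  awk '
    function emit() { if (n > 0) printf "%s %.5f\n", key, sum / n }
    NF == 2 { emit(); key = $1; sum = 0; n = 0 }   # summary line: key, summary distance
    NF == 3 { sum += $1; n++ }                     # match line: distance, query pos, track pos
    END     { emit() }
  ' "$@"
}

# Example: the first track from the listing above, fed on standard input.
track_means <<'EOF'
TheRub1994.mfcc20 1.09787
0.963232 0 3681
1.02938 0 3709
1.08401 0 3596
1.09635 0 3680
1.09665 0 3624
1.12298 0 3677
1.13152 0 4934
1.14924 0 3676
1.14933 0 3679
1.15602 0 4963
EOF
# prints: TheRub1994.mfcc20 1.09787
```

The recomputed mean agrees with the value printed on the summary line.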

In this case, the closest match is at frame 3681, which at 100ms per frame is 368.1 seconds (6:08) into the 1994 program, and this is indeed a performance of the same track.
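
Converting a frame index to a time offset is simple arithmetic at 100ms per frame. A small sketch, where frame_to_time is a hypothetical helper and the rate of 10 frames per second is the one used for extraction above:

```shell
# Convert a frame index to m:ss.t, assuming 10 frames per second (100ms hop).
frame_to_time() {
  frame=$1
  tenths=$((frame % 10))   # leftover tenths of a second
  secs=$((frame / 10))     # whole seconds
  printf '%d:%02d.%d\n' $((secs / 60)) $((secs % 60)) "$tenths"
}

frame_to_time 3681    # 6:08.1
frame_to_time 55845   # 93:04.5
```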

Note: on systems with plenty of memory, replacing the -p 0 query specifier with -e searches the database using all possible 30-frame sequences from the query rather than a single one, but the memory requirements for this are high. Alternatively, run several single-point queries and aggregate the results afterwards.
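
One way to do that aggregation is sketched below, under stated assumptions: the output of each single-point run is in the format shown above, and "aggregate" is taken to mean keeping the lowest-distance matches across all runs. The helper name best_matches, the run_N.txt file names and the query points in the comment are hypothetical; the audioDB invocation in the comment reuses only the flags from the query above.

```shell
# Runs could be collected with a loop such as (not executed here):
#   for p in 0 10 20; do
#     audioDB -d TheRub.adb -Q nsequence -f nystateofmind.mfcc20 \
#       -w nystateofmind.power -l 50 -p "$p" > "run_$p.txt"
#   done

# Merge result listings: tag each match line with its track key,
# sort by distance, and keep the N best matches overall.
best_matches() {
  n=$1; shift
  awk '
    NF == 2 { key = $1 }               # summary line: remember the track key
    NF == 3 { print $1, key, $2, $3 }  # distance, key, query pos, track pos
  ' "$@" | LC_ALL=C sort -k1,1n | head -n "$n"
}

# Example with a tiny, made-up listing (values are illustrative only).
printf '%s\n' 'a.mfcc20 1.5' '1.2 0 100' '1.8 0 200' \
              'b.mfcc20 1.1' '1.1 10 50' | best_matches 2
# prints:
# 1.1 b.mfcc20 10 50
# 1.2 a.mfcc20 0 100
```

With saved runs, the same helper could be called as best_matches 10 run_*.txt.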