AudioDB Tutorial 1

Extracting feature files

audioDB works with feature files extracted from audio, so the first step is to extract them. The fftExtract tool writes feature files in a format that audioDB recognizes.

Assuming that the current directory holds a small number of 44.1kHz 16-bit stereo audio files in WAV format, the following (Bourne shell) command line will extract 12-bin chroma features with a hop size of 100ms:

for file in *.wav; do fftExtract -c 12 "${file}" "${file%.wav}.chr12"; done

We will also need feature files containing log power:

for file in *.wav; do fftExtract -P "${file}" "${file%.wav}.power"; done
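The two loops above re-extract every file each time they run. Below is a minimal sketch of a single loop that extracts only the feature files that do not exist yet. It uses exactly the fftExtract options shown above. The function name `extract_missing` is our own and is not part of audioDB or fftExtract.

```shell
#!/bin/sh
# Hypothetical helper: extract chroma and log-power features only for
# WAV files whose feature files do not exist yet, using the same
# fftExtract options as the loops above.
extract_missing() {
  for file in *.wav; do
    [ -e "$file" ] || continue   # no .wav files: glob left unexpanded
    base="${file%.wav}"
    if [ ! -e "$base.chr12" ]; then
      fftExtract -c 12 "$file" "$base.chr12"
    fi
    if [ ! -e "$base.power" ]; then
      fftExtract -P "$file" "$base.power"
    fi
  done
}
```

Run `extract_missing` in the directory holding the WAV files. After adding new recordings, rerunning it extracts features only for the new ones.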

Feature files can be inserted into an audioDB database one at a time, but it is more efficient to collect the file names in lists and insert them in a single batch:

for file in *.wav
do
  echo "${file%.wav}" >> uid.txt
  echo "${file%.wav}".chr12 >> chr12.txt
  echo "${file%.wav}".power >> power.txt
done

This creates three text files, listing one per line and in corresponding order the track keys, the chroma feature files and the log power feature files. We are now ready to create the audioDB database.
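The loop above appends with `>>`, so running it a second time (for example, after adding recordings) lists every file twice. Below is a minimal sketch of a rerunnable variant. It keeps the same list names and naming convention, and it skips the loop body when no WAV files are present. Without that check, the unmatched pattern `*.wav` would be written to the lists as if it were a file name. The function name `make_batch_lists` is our own and is not part of audioDB.

```shell
#!/bin/sh
# Hypothetical helper (not part of audioDB): rewrite uid.txt, chr12.txt
# and power.txt from the WAV files in the current directory.
make_batch_lists() {
  # Truncate the lists first so that rerunning does not duplicate entries.
  : > uid.txt
  : > chr12.txt
  : > power.txt
  for file in *.wav; do
    [ -e "$file" ] || continue   # no .wav files: glob left unexpanded
    key="${file%.wav}"
    printf '%s\n' "$key" >> uid.txt
    printf '%s\n' "$key.chr12" >> chr12.txt
    printf '%s\n' "$key.power" >> power.txt
  done
}
```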

Creating and inserting into a database

The following command line creates a database called tutorial.adb with the default size, which has space for 170 million double-precision floating-point values:

audioDB -N -d tutorial.adb

The default database parameters may not be appropriate if, for example, your collection is particularly large. Tutorial XXX explains why and how to create a database with different parameters. For this tutorial we assume that the default sizes are acceptable, which they probably are.
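A rough back-of-envelope estimate helps when judging whether the default size is enough for your collection. The sketch below makes three assumptions. All 170 million doubles (8 bytes each) hold 12-bin chroma frames. The frames use the 100 ms hop from the extraction step. No space is used for log power, keys or other metadata. Because of the last assumption, the real capacity will be somewhat lower.

```shell
#!/bin/sh
# Rough capacity estimate for the default database size.
# Assumptions: 8-byte doubles, 12 values per chroma frame, 100 ms hop,
# and no space used for log power or other per-track data.
doubles=170000000
dims=12
hop_ms=100

bytes=$((doubles * 8))                # storage for the doubles alone
frames=$((doubles / dims))            # 12-dimensional frames that fit
seconds=$((frames * hop_ms / 1000))   # audio duration at one frame per hop
hours=$((seconds / 3600))

echo "bytes=$bytes frames=$frames seconds=$seconds hours=$hours"
```

Under these assumptions the default database holds on the order of 390 hours of audio, stored in about 1.36 GB of doubles. A substantially larger collection calls for a database created with different parameters.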

To check that this has worked, you can use the -S flag to print out some status information about the database:

audioDB -S -d tutorial.adb

For historical reasons, automatic L2 normalization must be turned on with the -L flag. This requirement is likely to be relaxed in a subsequent audioDB release:

audioDB -L -d tutorial.adb

Also for historical reasons, log power must be enabled alongside the extracted features in the database, using the -P flag:

audioDB -P -d tutorial.adb

Now we can insert the feature vectors (or frames) extracted in the previous section, inserting all the files at once in a single batch:

audioDB -d tutorial.adb -B -K uid.txt -F chr12.txt -W power.txt

The -B flag selects batch mode. The file given after -K lists the unique identifiers, or “keys”, under which the feature files named in chr12.txt (given after -F) can later be retrieved. The corresponding per-frame log power (-W) comes from the files listed in power.txt. The three lists correspond line by line.
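A mistake in the batch lists, such as a missing feature file or lists of different lengths, is easier to catch before insertion than after. Below is a minimal sanity check of the three lists built earlier. The function name `check_batch_lists` is our own and is not part of audioDB. The check reflects only the line-by-line correspondence described above, not any validation that audioDB itself performs.

```shell
#!/bin/sh
# Hypothetical helper (not part of audioDB): sanity-check the batch lists
# before "audioDB -B". Returns non-zero if a list is missing, if the lists
# differ in length, or if any listed feature file does not exist.
check_batch_lists() {
  for list in uid.txt chr12.txt power.txt; do
    [ -f "$list" ] || { echo "missing list: $list" >&2; return 1; }
  done

  n_uid=$(wc -l < uid.txt)
  n_chr=$(wc -l < chr12.txt)
  n_pow=$(wc -l < power.txt)
  if [ "$n_uid" -ne "$n_chr" ] || [ "$n_uid" -ne "$n_pow" ]; then
    echo "list lengths differ: $n_uid $n_chr $n_pow" >&2
    return 1
  fi

  status=0
  for list in chr12.txt power.txt; do
    # Each line names one feature file; report every missing one.
    while IFS= read -r f; do
      if [ ! -f "$f" ]; then
        echo "missing feature file: $f" >&2
        status=1
      fi
    done < "$list"
  done
  return "$status"
}
```

For example, `check_batch_lists && audioDB -d tutorial.adb -B -K uid.txt -F chr12.txt -W power.txt` runs the batch insertion only if the check passes.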

Querying the database

We can now perform retrieval tasks, such as approximate near-neighbour matching or obtaining aggregate statistics on the database.

The simplest search finds the database sequences that most closely match a given query sequence of feature vectors. To find the five tracks containing the closest matches to a two-second fragment of the query track track.wav beginning at 10 seconds, we run the command

audioDB -d tutorial.adb -Q sequence -p 100 -n 1 -l 20 -r 5 -f track.chr12

where -p 100 indicates that the query starts at frame index 100 of track.chr12 (the 0.1 s frame beginning 10 s into track.wav), and -l 20 that we are looking for a sequence 20 frames long (2 seconds in this case). The -f flag introduces the query feature vector file, and -r 5 asks for five tracks to be reported. The -Q sequence argument is required for historical reasons. The -n 1 parameter indicates that, for each track in the database, we are only interested in the single closest sequence match to the query sequence (see tutorial XXX).

This command should produce output of the form

track 4.58372398E-15 100 100
foo 0.011238723 100 24
bar 0.013423750 100 187
baz 0.014587009 100 16
quux 0.017989231 100 392

This is an ordered result list, read as follows. The top hit in a search of the database for track.wav's features, starting at frame 100 and lasting 20 frames, is track itself, as one would expect; its distance is zero up to rounding error. The second column gives the Euclidean distance of the closest match within that track. The third and fourth columns give the query and target frame indices respectively. So the closest match to the query sequence in the track with key foo is the 20-frame sequence in foo starting at frame index 24 (that is, 2.4 s in).
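Each frame covers 100 ms, so converting between times and frame indices only takes a division or a multiplication by the hop size. The sketch below assumes that 100 ms hop and zero-based frame indices, as in the 24 → 2.4 s reading above. The helper names `frames_for` and `to_seconds` are our own and are not part of audioDB. `frames_for` turns a start time and duration in milliseconds into values for -p and -l. `to_seconds` reads a four-column result list like the one above and appends each target's start time.

```shell
#!/bin/sh
# Hypothetical helpers (not part of audioDB) for converting between time
# and frame indices, assuming the 100 ms hop used by fftExtract above and
# zero-based frame indices.
hop_ms=100

# frames_for START_MS DURATION_MS: print the values for -p and -l.
frames_for() {
  echo "$(($1 / hop_ms)) $(($2 / hop_ms))"
}

# to_seconds: read "key distance query_frame target_frame" lines on stdin
# and append the target start time in seconds.
to_seconds() {
  awk -v hop_ms="$hop_ms" '{ printf "%s %s %s %s %.1fs\n", $1, $2, $3, $4, $4 * hop_ms / 1000 }'
}

# The query in the text: 2 s starting at 10 s gives "100 20" (-p 100 -l 20).
frames_for 10000 2000

# Two lines of the example output shown above.
printf '%s\n' 'foo 0.011238723 100 24' 'bar 0.013423750 100 187' | to_seconds
```

If audioDB prints its result list on standard output in exactly the form shown above, piping the query command into `to_seconds` annotates the results directly. This is an assumption. The helper does not apply to the count-based output of the -R search described below.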

Different purposes call for different kinds of search. The search above suits finding a close match to a given region, perhaps for editing a recording session or for “musaicing”. Other purposes entail other search patterns. For instance, to find out whether any track in the database contains material from a given query track (a remix, say), it is worth searching the query track exhaustively. Replacing the query start point (-p 100) with the -e flag searches the database for matches to every 20-frame sequence in the query track:

audioDB -d tutorial.adb -Q sequence -e -n 1 -l 20 -r 5 -f track.chr12

This search considers every possible 20-frame sequence within track and reports the five best-matching tracks, each ranked by its single best sequence match.
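Counting the query sequences involved gives a feel for the extra work an exhaustive search does. The sketch below assumes that a query sequence starts at every frame of the query track. It uses a hypothetical 3-minute track, which has 1800 frames at the 100 ms hop. `n_windows` is our own helper, not an audioDB command.

```shell
#!/bin/sh
# Number of SEQ_LEN-frame query sequences in a track of N_FRAMES frames,
# assuming a query sequence starts at every frame (an assumption, not a
# documented audioDB setting).
n_windows() {
  n_frames=$1
  seq_len=$2
  if [ "$n_frames" -lt "$seq_len" ]; then
    echo 0
  else
    echo $((n_frames - seq_len + 1))
  fi
}

# A hypothetical 3-minute query track at the 100 ms hop has 1800 frames.
n_windows 1800 20
```

Under that assumption, the exhaustive search compares 1781 query sequences with the database, where the -p search compared one.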

It is also possible to search for tracks that share much of their content with the query (mislabelled tracks, or “apocrypha”). To do so, we count the sequences in the query track that have at least one close-enough match within a particular database track. What counts as close enough depends on the overall statistics of the feature space, so we supply a threshold radius with the -R flag:

audioDB -d tutorial.adb -Q sequence -e -n 1 -l 20 -r 5 -R 0.01 -f track.chr12

The results have a slightly different format. The first column is still the track identifier, but the second is now simply a count of the query sequences that had at least one matching point in the database track (points at a distance smaller than 0.01 are considered to match).
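The count reported for each database track can be modelled simply. For every query sequence, take its smallest distance to any sequence in that track. The query sequence counts if that distance is below the radius, because "at least one match below R" is the same condition as "nearest match below R". The sketch below applies this rule to a few made-up distances. It illustrates what the reported number means and is not audioDB's implementation. `count_within` is our own name.

```shell
#!/bin/sh
# Hypothetical illustration (not audioDB code): given one minimum distance
# per query sequence on stdin, count those strictly below radius R.
count_within() {
  awk -v r="$1" '$1 + 0 < r + 0 { c++ } END { print c + 0 }'
}

# Made-up minimum distances for four query sequences against one track.
printf '%s\n' 0.0042 0.0150 0.0097 0.0312 | count_within 0.01
```

Here two of the four query sequences (0.0042 and 0.0097) fall within the 0.01 radius, so this track would score 2.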

Notes

The log power information can be used to restrict which sequences are considered. If you supply the query's log power file (with -w) as well as its feature file, you can specify absolute and relative thresholds on the power of query and target sequences.

Similarly, it is possible to give fftExtract and audioDB timing information (such as from a beat tracker or segmentation process) and use that to extract features for beats or segments; the database matching step can then reject tracks based on timing-related criteria.