AudioDB Tutorial 2(a)

Database Creation Parameters

The -N flag to audioDB creates a new database file from the command line. As mentioned in the installation instructions, audioDB preallocates disk space for its tables, claiming 2GB by default. On some systems (Mac OS X with default filesystems, for example) this is not ideal because the filesystem does not support sparse files; in other cases, the default allocation is too small to store all the information pertinent to a particular investigation. The details of adjusting the default allocation are specified below.

audioDB supports three command-line arguments at database-creation time: --datasize, --ntracks and --datadim. The semantics of these arguments are as follows:

--datasize: maximum size (in MB) of the feature vectors to be stored in the database
--ntracks: maximum number of tracks to be stored in the database
--datadim: expected dimensionality of the features stored in the database

The need for three separate flags arises because the stored information falls into three classes: some has constant-space requirements per track (e.g. database key, track length); some has constant-space requirements per frame (start/end time, log power, l2-norm values); and some has space requirements per frame that are proportional to the database dimension (the feature vectors themselves).

At database creation time, it should be possible to estimate each of these values from the features already present. If you have a directory of d-dimensional feature files (of the form foonnn.featd), corresponding to a total of N tracks, then --ntracks should be at least N, --datadim should be given as exactly d, and --datasize should be at least the total size of all your feature files in MB.
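The estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `pattern` argument and the one-file-per-track layout are hypothetical and not part of audioDB, and a megabyte is taken as 10^6 bytes so that the rounded-up figure errs on the large side whichever definition audioDB uses.

```python
import math
from pathlib import Path


def estimate_creation_params(feature_dir, dim, pattern="*"):
    """Estimate --ntracks, --datadim and --datasize for a directory of feature files.

    Assumptions (not taken from audioDB itself): every file matching
    `pattern` holds the features of exactly one track, and `dim` is the
    dimensionality you already know your features to have.
    """
    files = [p for p in Path(feature_dir).glob(pattern) if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    # Round up, and never request less than 1 MB.
    datasize_mb = max(1, math.ceil(total_bytes / 1_000_000))
    return {"ntracks": len(files), "datadim": dim, "datasize": datasize_mb}


def creation_flags(params):
    """Format the three estimates as command-line arguments (flags only)."""
    return [
        "--datasize", str(params["datasize"]),
        "--ntracks", str(params["ntracks"]),
        "--datadim", str(params["datadim"]),
    ]
```

The list returned by `creation_flags` contains only the three creation parameters; it would be appended to your own `audioDB -N` invocation. Since --ntracks and --datasize are lower bounds, you may want to scale them up if you expect to add more tracks later.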

Note: if for some reason you have not yet extracted feature files for your data, you can estimate the --datasize parameter from the audio collection you intend to use: if the total duration of your audio collection is T seconds and the frame rate is f frames per second, then the data size will be (T×f)×(8×d) bytes: T×f frames, each holding d double-precision floats of 8 bytes.
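As a worked example of this formula, the following sketch converts the byte count into a value suitable for --datasize. The function name is hypothetical, T is assumed to be in seconds and f in frames per second, and a megabyte is again taken as 10^6 bytes so that the result is rounded towards the safe side.

```python
import math


def estimate_datasize_mb(total_seconds, frames_per_second, dim, bytes_per_value=8):
    """Estimate --datasize (in MB) as (T x f) x (8 x d) bytes, rounded up."""
    frames = total_seconds * frames_per_second    # T x f frames in total
    total_bytes = frames * bytes_per_value * dim  # d doubles of 8 bytes per frame
    return math.ceil(total_bytes / 1_000_000)


# 10 hours of audio, 10 frames per second, 12-dimensional features:
# 360000 frames x 96 bytes = 34560000 bytes
print(estimate_datasize_mb(10 * 3600, 10, 12))  # prints 35
```

This figure covers only the feature vectors themselves; the per-track and per-frame information is governed by the other parameters.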