Developing A New Audio File Format

Dan
Dozens Disciple
Joined: Aug 8 2005, 02:45 PM

Dec 7 2017, 04:31 AM #1

You may be thinking: Why? Aren't there enough digital audio formats already?

But the thing is, I want a format that's simple. Like WAV, except based on frequency instead of time-domain signals. I want simple math. (Well, if you count linear algebra and Fourier analysis as "simple".) I want any programmer to be able to generate "chiptune" music or make simple musical transformations like "make everything an octave lower" without specialized libraries.

Perhaps such a thing already exists; if so, please let me know. Until then, here are my notes for the design of a new format, which I am tentatively naming "LFE" for "Logarithmic Frequency Encoding".

Basis Frequencies

Each time-block of music will be represented as a sum of "basis" waves.

The basis frequencies will be equally spaced on a logarithmic scale, with *100 (144) frequencies per octave. The interval between adjacent frequencies is thus 8.33... "cents", approximating a "just noticeable difference" in frequency.

The specific basis frequencies are:

F(n) = 2^((n-972)/144)*440 Hz

Note the bias towards the A440 12-EDO pitch standard: Music using this tuning (and consisting only of sine waves) can be exactly represented. The numbering n is arbitrarily chosen to be exactly a dozen times the TGM note number.
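
As a minimal sketch of the formula above (the names basis_frequency, step_in_cents and octave_lower are just illustrative, not part of any spec), this shows how F(n) and the "make everything an octave lower" transformation could be computed with nothing but the standard library:

```python
A4_HZ = 440.0            # reference pitch from the formula above
STEPS_PER_OCTAVE = 144   # *100 in dozenal
A4_INDEX = 972           # the value of n at which F(n) = 440 Hz


def basis_frequency(n):
    """Return F(n) = 2^((n - 972)/144) * 440 in Hz."""
    return A4_HZ * 2.0 ** ((n - A4_INDEX) / STEPS_PER_OCTAVE)


def step_in_cents():
    """Size of one basis step, given 1200 cents per octave."""
    return 1200.0 / STEPS_PER_OCTAVE


def octave_lower(n):
    """Index of the basis frequency one octave below index n."""
    return n - STEPS_PER_OCTAVE


if __name__ == "__main__":
    print(basis_frequency(972))        # A440 itself
    print(basis_frequency(972 + 12))   # one 12-EDO semitone up (12 steps)
    print(step_in_cents())
```

Since a 12-EDO semitone is exactly 12 steps, transposing by any number of semitones or octaves is just integer arithmetic on n.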

Since computer memory is a finite resource, we'll need to impose reasonable range constraints on n.

If we want our audio files to be convertible to and from CDs, then we'll have to deal with their standard 44 100 Hz sampling rate and thus a Nyquist frequency of 22 050 Hz. It turns out that n = 1785 = *1049 is the highest we can go, since F(1786) would exceed the Nyquist frequency.

What about the lower end? It's frequently stated that the "normal" lower limit of human hearing is 20 Hz, which is approximated by F(330) = 20.015231264080082 Hz. Using this as our cutoff frequency gives us 1456 basis frequencies to work with, or a span of *A.14 octaves.

But that's kind of a "weird" number, so let's use a lower limit of F(273) = 15.212581077221454 Hz instead, giving us a "rounder" *A.6 octaves.

With 273 <= n <= 1785, there are 1513 different basis frequencies.
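
The range arithmetic can be checked with a short sketch. The helper highest_index_below is a hypothetical name; it simply inverts F(n) and rounds down:

```python
import math


def basis_frequency(n):
    """F(n) = 2^((n - 972)/144) * 440 in Hz."""
    return 440.0 * 2.0 ** ((n - 972) / 144)


def highest_index_below(limit_hz):
    """Largest n with F(n) <= limit_hz, found by solving F(n) = limit_hz for n."""
    return math.floor(972 + 144 * math.log2(limit_hz / 440.0))


nyquist_hz = 44100 / 2                     # 22 050 Hz for CD audio
n_max = highest_index_below(nyquist_hz)    # upper limit of n
n_min = 273                                # lower limit chosen above
count = n_max - n_min + 1                  # number of basis frequencies
span_octaves = (n_max - n_min) / 144       # distance from lowest to highest

if __name__ == "__main__":
    print(n_max, count, span_octaves)
    print(basis_frequency(n_min), basis_frequency(330))
```

Making n_min and n_max parameters rather than constants would be one way to get the configurable range mentioned below.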

Perhaps it would be a good idea to make the lower and upper frequencies configurable in the file format, to allow "compression" by band-limiting the signal, or to allow representation of infrasound and ultrasound by expanding the frequency range. But for now, I shall assume the above numbers as a sensible "default".

To be continued...

Dan
Dozens Disciple
Joined: Aug 8 2005, 02:45 PM

Dec 7 2017, 05:47 AM #2

Length of a time block

How many time-domain samples will be used to generate each set of frequency-domain samples? If we use too large a time block, then fast music can't be represented accurately. OTOH, if we use too small a time block, then file sizes will be huge.

If we define an LFE time block as *0.1 second (or 3675 CD audio samples), then with 1513 basis frequencies, we will need 12*1513*2 = 36 312 frequency-domain samples per second, comparable in size to the original CD audio format. (The reason for the multiplication by 2 is that a complex representation, like cos and sin components, is needed to be able to represent the phase of a wave.)
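
To illustrate why two numbers per frequency are enough, here is a small sketch (function names are my own, purely for illustration) that converts a cos/sin coefficient pair into an amplitude and a phase, using the identity a*cos(wt) + b*sin(wt) = A*cos(wt - p) with A = sqrt(a^2 + b^2) and p = atan2(b, a):

```python
import math


def to_polar(cos_coeff, sin_coeff):
    """Return (amplitude, phase) such that
    cos_coeff*cos(wt) + sin_coeff*sin(wt) == amplitude*cos(wt - phase)."""
    amplitude = math.hypot(cos_coeff, sin_coeff)
    phase = math.atan2(sin_coeff, cos_coeff)
    return amplitude, phase


def wave_sample(cos_coeff, sin_coeff, freq_hz, t):
    """Value at time t (seconds) of one basis wave given its two coefficients."""
    w = 2.0 * math.pi * freq_hz
    return cos_coeff * math.cos(w * t) + sin_coeff * math.sin(w * t)


if __name__ == "__main__":
    amp, ph = to_polar(3.0, 4.0)
    t = 0.001
    print(wave_sample(3.0, 4.0, 440.0, t))
    print(amp * math.cos(2.0 * math.pi * 440.0 * t - ph))  # same value
```

Whether the file stores (cos, sin) pairs or (amplitude, phase) pairs is a design choice; the sketch only shows that the two are interchangeable.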

But a 1/12 second time quantum is rather long for fast music. In moderate-speed music with a "whole note" around 2 seconds, you're limited to 16th notes or triplet "24th notes" at best. Some pieces use 32nd or 64th notes.
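
The note-length arithmetic can be sketched with exact fractions. The helper blocks_per_note is a made-up name, and the 2-second whole note is the same assumption as in the paragraph above:

```python
from fractions import Fraction


def blocks_per_note(whole_note_seconds, note_division, block_seconds):
    """How many time blocks fit into one note of value 1/note_division."""
    note_seconds = Fraction(whole_note_seconds) / note_division
    return note_seconds / Fraction(block_seconds)


if __name__ == "__main__":
    for division in (16, 24, 32, 64, 256):
        coarse = blocks_per_note(2, division, Fraction(1, 12))
        fine = blocks_per_note(2, division, Fraction(1, 144))
        print(division, coarse, fine)
```

A result below 1 means the note is shorter than a single time block, so it can't be resolved at that block length.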

If we go an order of magnitude faster, defining an LFE time block as *0.01 second (or 306.25 CD audio samples), then we will need 435 744 frequency-domain samples per second. That's a huge file size, but we may be stuck with it.
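
For comparing block lengths, a quick sketch of the data-rate arithmetic (block_stats is a hypothetical helper; the constants are the CD rate and the 1513 basis frequencies from above):

```python
from fractions import Fraction

CD_RATE = 44100     # CD audio samples per second
NUM_BASIS = 1513    # basis frequencies for 273 <= n <= 1785


def block_stats(blocks_per_second):
    """Return (CD samples per block, frequency-domain values per second)."""
    samples_per_block = Fraction(CD_RATE, blocks_per_second)
    # Two values (cos and sin components) per basis frequency per block.
    values_per_second = blocks_per_second * NUM_BASIS * 2
    return samples_per_block, values_per_second


if __name__ == "__main__":
    for rate in (12, 144):
        samples, values = block_stats(rate)
        print(rate, float(samples), values, values / CD_RATE)
```

The last column is the size relative to one channel of CD audio, counted in values rather than bytes.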

FWIW, the popular MP3 format uses 576 (or *400; was this format designed by dozenalists?) time-domain samples for each block of frequency-domain samples. At the CD sampling rate, that works out to about 13 ms or 0;01A6A second.
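
The dozenal figure can be reproduced with a few lines. This is only a sketch: dozenal_fraction is a made-up helper, it truncates rather than rounds, and using "B" for eleven is my assumption (the thread only shows "A" for ten):

```python
from fractions import Fraction

DIGITS = "0123456789AB"  # A = ten; B = eleven is an assumed digit choice


def dozenal_fraction(value, places):
    """Dozenal expansion of 0 <= value < 1, truncated to `places` digits."""
    digits = []
    for _ in range(places):
        value *= 12
        digit = int(value)          # integer part is the next dozenal digit
        digits.append(DIGITS[digit])
        value -= digit
    return "0;" + "".join(digits)


mp3_block_seconds = Fraction(576, 44100)   # 576 samples at the CD rate

if __name__ == "__main__":
    print(float(mp3_block_seconds) * 1000)         # milliseconds
    print(dozenal_fraction(mp3_block_seconds, 5))
```

Passing a Fraction keeps the digit extraction exact; a float would also work for a handful of digits.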



At this point, I open the floor for comments.

Double sharp
Dozens Demigod
Joined: Sep 19 2015, 11:02 AM

Dec 7 2017, 07:47 AM #3

We are indeed stuck with 1/144 s, though not because of 128th and 256th notes (128ths are common in slow music of the Classical period, if rarer than 64ths, while 256ths are rare but not unheard of, mostly because they appear in Beethoven's Third Piano Concerto). Rather it's because of piano glissandi, which are composed of discrete notes and can go really fast, though not as fast as single biciaseconds.

Dan
Dozens Disciple
Joined: Aug 8 2005, 02:45 PM

Dec 8 2017, 05:53 AM #4

One possibility is to represent an audio file as a raster image, with one dimension having one pixel for each basis frequency, and the other dimension representing time, with one pixel per biciasecond time block. The color of each pixel would have a brightness representing its magnitude and the hue (as in the HSL or HSV color model) representing the phase angle.
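
A rough sketch of the pixel mapping, assuming each frequency-domain sample is held as a Python complex number and that magnitudes are scaled linearly against some chosen maximum (both are my assumptions; a logarithmic brightness scale might suit audio better). The function name is illustrative only:

```python
import cmath
import colorsys
import math


def coefficient_to_rgb(coeff, max_magnitude):
    """Map one complex coefficient to an 8-bit (R, G, B) pixel:
    hue encodes the phase angle, brightness (HSV value) the magnitude."""
    phase = cmath.phase(coeff)                   # radians, in [-pi, pi]
    hue = (phase / (2.0 * math.pi)) % 1.0        # wrap into 0..1
    if max_magnitude > 0:
        value = min(abs(coeff) / max_magnitude, 1.0)
    else:
        value = 0.0
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, value)
    return tuple(round(255 * c) for c in (r, g, b))


if __name__ == "__main__":
    print(coefficient_to_rgb(1 + 0j, 1.0))    # phase 0
    print(coefficient_to_rgb(-1 + 0j, 1.0))   # phase pi
    print(coefficient_to_rgb(0j, 1.0))        # silence
```

One row of such pixels per time block (or one column, depending on orientation) would then make up the raster image.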

We could then use conventional image compression methods, whether lossless (PNG) or lossy (JPEG), to compress music for us.

If nothing else, it would give us a nice way of auto-generating graphic notation for music.