You may be thinking: Why? Aren't there enough digital audio formats already?
But the thing is, I want a format that's simple. Like WAV, except based on frequency instead of time-domain signals. I want simple math. (Well, if you count linear algebra and Fourier analysis as "simple".) I want any programmer to be able to generate "chiptune" music or make simple musical transformations like "make everything an octave lower" without specialized libraries.
Perhaps such a thing already exists; if so, please let me know. Until then, here are my notes for the design of a new format, which I am tentatively naming "LFE" for "Logarithmic Frequency Encoding".
Each time-block of music will be represented as a sum of "basis" waves.
The basis frequencies will be spaced equally on a logarithmic scale, with *100 (144) frequencies per octave. The interval between adjacent frequencies is thus 1200/144 = 8.33... "cents", approximating a "just noticeable difference" in frequency.
The specific basis frequencies are:
F(n) = 2^((n-972)/144)*440 Hz
Note the bias towards the A440 12-EDO pitch standard: Music using this tuning (and consisting only of sine waves) can be exactly represented. The numbering n is arbitrarily chosen to be exactly a dozen times the TGM note number.
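As a quick check of the formula, here is a minimal Python sketch. The function and constant names (basis_frequency, transpose, STEPS_PER_OCTAVE, STEPS_PER_SEMITONE) are my own illustrative choices, not part of any spec.

```python
STEPS_PER_OCTAVE = 144    # *100 basis frequencies per octave
STEPS_PER_SEMITONE = 12   # 144 / 12 steps per 12-EDO semitone


def basis_frequency(n: int) -> float:
    """Frequency in Hz of basis index n: F(n) = 2^((n-972)/144) * 440."""
    return 2.0 ** ((n - 972) / STEPS_PER_OCTAVE) * 440.0


def transpose(indices, steps):
    """Shift every basis index by a fixed number of steps (hypothetical helper)."""
    return [n + steps for n in indices]


a4 = basis_frequency(972)                            # the A440 reference
c5 = basis_frequency(972 + 3 * STEPS_PER_SEMITONE)   # three semitones above A4
octave_down = transpose([972, 1008], -STEPS_PER_OCTAVE)

print(a4, c5, [basis_frequency(n) for n in octave_down])
```

Because the spacing is logarithmic, "make everything an octave lower" is just subtracting 144 from every index, and every 12-EDO note lands on an index that is a multiple of 12.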
Since computer memory is a finite resource, we'll need to impose reasonable range constraints on n.
If we want our audio files to be converted to or from CDs, then we'll have to deal with their standard 44 100 Hz sampling rate and thus a Nyquist frequency of 22 050 Hz. It turns out that n = 1785 = *1049 is the highest we can go: the next frequency up, F(1786), would exceed the Nyquist limit.
What about the lower end? It's frequently stated that the "normal" lower limit of human hearing is 20 Hz, which is approximated by F(330) = 20.015231264080082 Hz. Using this as our cutoff frequency gives us 1456 basis frequencies to work with, spanning 1455 steps or *A.13 octaves.
But that's kind of a "weird" number, so let's use a lower limit of F(273) = 15.212581077221454 Hz instead, giving us a "rounder" *A.6 octaves.
With 273 <= n <= 1785, there are 1513 different basis frequencies.
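The range arithmetic above can be reproduced with a short Python sketch. The helper name highest_index_at_or_below is hypothetical; the constants come straight from the text.

```python
import math


def basis_frequency(n: int) -> float:
    """F(n) = 2^((n-972)/144) * 440 Hz."""
    return 2.0 ** ((n - 972) / 144) * 440.0


def highest_index_at_or_below(limit_hz: float) -> int:
    """Largest n with F(n) <= limit_hz (hypothetical helper)."""
    n = math.floor(972 + 144 * math.log2(limit_hz / 440.0))
    # Nudge the estimate in case floating-point rounding put it off by one.
    while basis_frequency(n + 1) <= limit_hz:
        n += 1
    while basis_frequency(n) > limit_hz:
        n -= 1
    return n


n_max = highest_index_at_or_below(22050.0)   # Nyquist frequency for CD audio
n_min = 273                                  # lower limit chosen in the text
count = n_max - n_min + 1                    # number of basis frequencies
span_octaves = (n_max - n_min) / 144         # distance from lowest to highest

print(n_max, count, span_octaves)
print(basis_frequency(330), basis_frequency(273))
```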
Perhaps it would be a good idea to make the lower and upper frequencies configurable in the file format, to allow "compression" by band-limiting the signal, or to allow representation of infrasound and ultrasound by expanding the frequency range. But for now, I shall assume the above numbers as a sensible "default".
To be continued...
Length of a time block
How many time-domain samples will be used to generate each set of frequency-domain samples? If we use too large of a time block, then fast music can't be represented accurately. OTOH, if we use too small of a time block, then file sizes will be huge.
If we define an LFE time block as *0.1 second (1/12 second, or 3675 CD audio samples), then with 1513 basis frequencies, we will need 12*1513*2 = 36 312 frequency-domain samples per second, comparable in size to the original CD audio format. (The reason for the multiplication by 2 is that a complex representation, like cos and sin components, is needed to be able to represent the phase of a wave.)
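To illustrate why two numbers per frequency are needed, here is a rough Python sketch that renders one time block from (cos, sin) pairs. The storage layout (a dict mapping n to a pair of amplitudes) and the names synthesize_block and amplitude_and_phase are assumptions made for this example only; nothing about the actual file layout has been decided.

```python
import math

SAMPLE_RATE = 44100     # CD sampling rate in Hz
BLOCK_SAMPLES = 3675    # one *0.1 second (1/12 second) block


def basis_frequency(n: int) -> float:
    """F(n) = 2^((n-972)/144) * 440 Hz."""
    return 2.0 ** ((n - 972) / 144) * 440.0


def synthesize_block(components, num_samples=BLOCK_SAMPLES, sample_rate=SAMPLE_RATE):
    """Sum basis waves; components maps n -> (cos_amp, sin_amp) (assumed layout)."""
    out = [0.0] * num_samples
    for n, (cos_amp, sin_amp) in components.items():
        w = 2.0 * math.pi * basis_frequency(n) / sample_rate  # radians per sample
        for t in range(num_samples):
            out[t] += cos_amp * math.cos(w * t) + sin_amp * math.sin(w * t)
    return out


def amplitude_and_phase(cos_amp, sin_amp):
    """c*cos(x) + s*sin(x) equals A*cos(x - phi), A = hypot(c, s), phi = atan2(s, c)."""
    return math.hypot(cos_amp, sin_amp), math.atan2(sin_amp, cos_amp)


block = synthesize_block({972: (0.3, 0.4)})   # a single A440 component
amp, phase = amplitude_and_phase(0.3, 0.4)
print(len(block), amp, phase)
```

With only one number per frequency, every component would have to start its cycle at the same point in each block; the second number is what lets the wave be shifted in time.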
But a 1/12 second time quantum is rather long for fast music. In moderate-speed music with a "whole note" around 2 seconds, you're limited to 16th notes or triplet "24th notes" at best. Some pieces use 32nd or 64th notes.
If we go a (dozenal) order of magnitude faster, defining an LFE time block as *0.01 second (1/144 second, or 306.25 CD audio samples), then we will need 144*1513*2 = 435 744 frequency-domain samples per second. That's a huge file size, but we may be stuck with it.
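The trade-off between the two candidate block lengths can be tabulated with a few lines of Python. The function name block_budget is hypothetical, and the figures count frequency-domain values only, ignoring any header or compression.

```python
NUM_BASIS = 1513          # basis frequencies for 273 <= n <= 1785
CD_SAMPLE_RATE = 44100    # samples per second, per channel


def block_budget(blocks_per_second: int):
    """Return (CD samples per block, frequency-domain values per second)."""
    cd_samples_per_block = CD_SAMPLE_RATE / blocks_per_second
    values_per_second = blocks_per_second * NUM_BASIS * 2   # cos and sin parts
    return cd_samples_per_block, values_per_second


for bps in (12, 144):
    samples, values = block_budget(bps)
    ratio = values / CD_SAMPLE_RATE
    print(f"{bps} blocks/s: {samples} CD samples per block, "
          f"{values} values/s ({ratio:.2f}x one CD channel)")
```

By this count, the 1/144 second block needs roughly ten times as many values per second as one channel of CD audio, while the 1/12 second block needs slightly fewer.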
FWIW, the popular MP3 format uses 576 (or *400; was this format designed by dozenalists?) time-domain samples for each block of frequency-domain samples. At the CD sampling rate of 44 100 Hz, that works out to about 13 ms, or *0.01A6A second.
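For anyone who wants to check the dozenal figures in these notes, here is a small Python converter. It is only a sketch: the name to_dozenal is my own, it uses A and B for the digits ten and eleven, and it truncates rather than rounds the fractional part.

```python
from fractions import Fraction

DIGITS = "0123456789AB"   # A = ten, B = eleven


def to_dozenal(value, places=0):
    """Render a non-negative number in base twelve, truncated to `places` digits."""
    value = Fraction(value)
    if value < 0:
        raise ValueError("only non-negative values are handled in this sketch")
    whole, frac = divmod(value, 1)
    whole = int(whole)
    int_digits = ""
    while True:
        whole, digit = divmod(whole, 12)
        int_digits = DIGITS[digit] + int_digits
        if whole == 0:
            break
    if places == 0:
        return int_digits
    frac_digits = ""
    for _ in range(places):
        frac *= 12
        digit = int(frac)        # integer part is the next dozenal digit
        frac_digits += DIGITS[digit]
        frac -= digit
    return int_digits + "." + frac_digits


print(to_dozenal(144), to_dozenal(1785), to_dozenal(576))
print(to_dozenal(Fraction(576, 44100), 5))   # MP3 block length in seconds
```

Using exact fractions avoids floating-point surprises in the later digits.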
At this point, I open the floor for comments.