# How much does one's individuality cost? At least 2.77544 bits of information

jarek
Imagine there is a population/database/dictionary whose elements we would like to distinguish.
So for each element, let us somehow encode its individual features (e.g. using a hash function) as a bit sequence - the densest way is to use a sequence of uncorrelated P(0)=P(1)=1/2 bits.
We can now build the minimal prefix tree required to distinguish these sequences, as in the figure below.
For such an ensemble of random trees with a given number of leaves (n), we can calculate the Shannon entropy (H_n) - the amount of information the tree contains.
It turns out to grow asymptotically at an average of 2.77544 bits per element (1/2+(1+\gamma)lg(e)).
The calculations can be found here: http://arxiv.org/abs/1206.4555

Is this the minimal cost of distinguishability/individuality?
How should we understand it?

ps. A related question: can anyone find D(n)=\sum_{i=0}^{\infty} \left(1-(1-2^{-i})^n\right)?
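As a quick numerical look at this series, here is a short sketch (my own, not from the thread) that evaluates the truncated sum and shows it exceeds lg(n) by roughly a constant:

```python
from math import log2

def D(n, terms=300):
    # Truncated evaluation of D(n) = sum_{i=0}^inf (1 - (1 - 2^-i)^n);
    # the tail beyond i=terms is of order n * 2^-terms, i.e. negligible here.
    return sum(1 - (1 - 2.0 ** -i) ** n for i in range(terms))

for n in (10, 100, 1000):
    print(n, round(D(n), 4), round(D(n) - log2(n), 4))
```

For moderate and large n, the difference D(n) - lg(n) settles near 1.33 (with small oscillations and a finite-size correction that is still visible at n=10).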

jarek
The first thought about this distinctiveness is probably the n! combinatorial term that appears as the system grows, but n! is about ordering, and its logarithm grows faster than linearly. Distinctiveness is something much more subtle.
This is well seen in equation (10) from the paper, which reads:
H_n + lg(n!) = n D_n
where D_n is the average depth of the leaves of such an n-leaf tree - the average number of bits distinguishing a given element.
So this equation says:
distinctiveness/individualism + order = total amount of information

Distinctiveness grows linearly with n (with asymptotic coefficient 2.77544 bits/element) - a kind of chemical potential...
Information about their ordering grows faster: lg(n!) \approx n lg(n) - n lg(e).
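The Stirling-type approximation used for lg(n!) above can be checked directly; a minimal sketch (helper names are my own):

```python
from math import e, lgamma, log, log2

def lg_factorial(n):
    # exact lg(n!) via the log-gamma function: lg(n!) = ln Gamma(n+1) / ln 2
    return lgamma(n + 1) / log(2)

def stirling_lg_factorial(n):
    # the approximation from the post: lg(n!) ~ n lg(n) - n lg(e)
    return n * log2(n) - n * log2(e)

n = 1000
print(lg_factorial(n), stirling_lg_factorial(n))
```

The two agree up to the lower-order term (1/2)lg(2*pi*n) that this form of Stirling's formula drops, so the relative error vanishes as n grows.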

Anonymous
Even individuality is being devalued in our times - to about 2.3327464 bits per element/specimen (thanks to James Dow Allen): http://groups.google.com/forum/#!topic/ ... 7i-eTXR14E
But this is the final minimal price tag - it cannot be reduced further.
Specifically, for the minimal prefix tree, a random sequence (representing the individual features of a specimen) has about a 0.721 probability of being identified as belonging to the population ... so if we are interested only in distinguishing inside the population, we can afford to raise this probability up to 1.
To reduce the amount of information in the minimal prefix tree, observe that if a degree-1 node appears inside the tree, all sequences from the population passing through that node will certainly go in the corresponding direction - so we can save the 1 bit saying which direction that is.
In the standard prefix tree these degree-1 nodes were the places where it could turn out that an outsider does not belong to the population - removing this information raises the false-positive probability from 0.721 to 1.

So while for the sequences (0101.., 1010.., 1001..)
the minimal prefix tree remembers (without ordering!): (0....., 101..., 100...),
the reduced one remembers only (0....., 1.1..., 1.0...).

This decreases the asymptotic cost from 2.77544 bits/specimen to about 2.332746.
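The three-sequence example can be reproduced in code; a minimal sketch (my own, assuming each key's stored path runs one bit past its longest common prefix with any other key):

```python
def lcp(a, b):
    # length of the longest common prefix of two bit-strings
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def minimal_prefix_tree(keys):
    # each key is stored up to the first bit that distinguishes it from all others
    paths = [k[:1 + max(lcp(k, o) for j, o in enumerate(keys) if j != i)]
             for i, k in enumerate(keys)]
    children = {}
    for p in paths:
        for i in range(len(p)):
            children.setdefault(p[:i], set()).add(p[i])
    # degree-1 internal nodes are exactly where the reduced tree saves one bit each
    unary = sum(1 for kids in children.values() if len(kids) == 1)
    return paths, unary

paths, unary = minimal_prefix_tree(["0101", "1010", "1001"])
print(paths, unary)  # ['0', '101', '100'] 1
```

The single degree-1 node (the node reached by '1', whose only continuation is '0') carries the one bit the reduced scheme drops, turning (0, 101, 100) into (0, 1.1, 1.0).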

Anonymous
Presentation (including graph compression): http://dl.dropbox.com/u/12405967/hashsem.pdf

Of Course
Hi Jarek,

I can't clearly see the motivation for your problem.
Could you try to explain it in a few words, as if I were your grandmother?

I gave your series a try.
Mathematica is able to provide a rational result for every integer value of n that I tested, but it cannot provide a closed formula for general integer n, nor a result for any non-integer value that I tried.

For example:

Sum[1 - (1 - 1/2^i)^50, {i, 1, Infinity}] =
8854735051418566449232914370572857450990751555479163941888091801397182668479439529686623797247169009668069841344817946133525822328179581140267089626086311543772066195360209671139720226274851281865470292748901517062900968829 / 1478011635858195935882207857833342157685850053496538711411487419351783286214990897483152011395668340205591209836715482169367803646522894241195093719605989269566805111275694555541625200720156193276990756161085595169170541835 = 5.99

However, I expect that an approximate formula would be much easier to find, though I don't know whether you are interested in asymptotic values, or whether results for real values of n matter to you.
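For integer n the series does have a closed rational form: expanding (1-2^{-i})^n with the binomial theorem and summing the geometric series in i gives \sum_{k=1}^{n} (-1)^{k+1}\binom{n}{k}/(2^k-1). A sketch checking this against the Mathematica value above (my own derivation, worth double-checking):

```python
from fractions import Fraction
from math import comb

def D_exact(n):
    # sum_{i>=1} (1 - (1 - 2^-i)^n) = sum_{k=1}^n (-1)^(k+1) C(n,k) / (2^k - 1),
    # via binomial expansion and the geometric series sum_{i>=1} 2^(-i*k) = 1/(2^k - 1)
    return sum(Fraction((-1) ** (k + 1) * comb(n, k), 2 ** k - 1)
               for k in range(1, n + 1))

print(float(D_exact(50)))  # about 5.99, matching the rational value above
```

The exact rational arithmetic matters here: the alternating binomial terms are huge, so a naive floating-point evaluation of this formula loses all precision.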

Finally, please note that Pentcho Valev is an idiot who dampens any goodwill for good discussion on this pathetic forum. Try to find a better place,

Of Course

jarek
Motivation? It is very clear from the computer science point of view - finding the (achievable) limits of the hashing problem, the basis of handling e.g. databases.
But there are also philosophical/sociophysical consequences: how the complexity/entropy of a population grows in a given type of size-n society:
- if specimens are indistinguishable, the population is described by its size alone - entropy grows like lg(n)
- if there is a hierarchy (order), entropy grows like lg(n!) ~ n lg(n)
- if pairwise relations are essential, it grows like n^2
etc.
The surprise is that there exists another class between the first two (lg(n) and n lg(n)): populations whose specimens are distinguishable but do not yet have a hierarchy - the entropy of this class turns out to grow linearly (like 2.33 n).
The funny thing is that the simplest growth corresponds to the most complex situation.

Can we, by observing a population, "measure/define/quantify" the entropy/complexity of such a society?
We would need to classify the set of qualitatively different "configurations" it can choose and take the logarithm ...
We might be able to combine it with some other statistical parameters and dynamics - creating some kind of thermodynamical ecology.

Indeed, Mathematica can find Sum[1 - (1 - 1/2^i)^n, {i, 1, Infinity}] for specific n, but a formula for general n would be useful. There are good approximations in the paper, but some oscillations appear and I cannot describe them well.
This sum is the average depth of the leaves of such a random tree with n leaves - for a complete tree it would be lg(n), while because of the randomness it is larger by about 1.33.