Entropy(1): A representation of uncertainty and its basic properties


I first encountered entropy in physics, where, in thermodynamics, it measures how energy is dispersed at a given temperature. Later I learned about information entropy in communication theory, and then in machine learning, where it is widely used as a representation of uncertainty. Here we are talking about Shannon entropy, defined as $H(p_1, p_2, \dots, p_K) = -\sum_{i=1}^K p_i \log_2 p_i$ for a discrete distribution $(p_1, \dots, p_K)$. There are other measures of uncertainty, but people most often choose Shannon entropy because of its good properties [1].
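As a concrete illustration, here is a minimal Python sketch of this definition (the helper name `shannon_entropy` is my own, not from [1]):

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: -sum_i p_i * log2(p_i), skipping zero-probability terms."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))    # 1.0 bit: a fair coin
print(shannon_entropy([0.9, 0.1]))    # ~0.47 bits: a biased coin is less uncertain
print(shannon_entropy([0.25] * 4))    # 2.0 bits: uniform over 4 outcomes
```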

The following are mainly summarized and extended from [1].

What are the basic properties of Shannon entropy?

Property 1: Uniform distribution has max entropy

This can be proved using the weighted AM–GM inequality [2]: $\frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w} \ge \sqrt[w]{x_1^{w_1} x_2^{w_2} \cdots x_n^{w_n}}$, where $w = w_1 + w_2 + \cdots + w_n$, by letting $\frac{w_i}{w} = p_i$ and $x_i = \frac{1}{p_i}$.
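Spelling out the substitution (a short derivation sketch): with $n = K$, $\frac{w_i}{w} = p_i$, and $x_i = \frac{1}{p_i}$, the left-hand side becomes $\sum_{i=1}^K p_i \cdot \frac{1}{p_i} = K$, so

$$K \;\ge\; \prod_{i=1}^{K} \left(\frac{1}{p_i}\right)^{p_i} \quad\Longrightarrow\quad \log_2 K \;\ge\; -\sum_{i=1}^{K} p_i \log_2 p_i = H(p_1, \dots, p_K),$$

with equality exactly when all the $x_i = \frac{1}{p_i}$ are equal, i.e., when $p_i = \frac{1}{K}$ for every $i$ (the uniform distribution).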

Property 2: Additivity of independent events

To state this property mathematically: $H(X, Y) = H(X) + H(Y)$ if $X \perp Y$, i.e., if $X$ and $Y$ are independent.

Another function, $-\sum_{i=1}^K p_i^2$, satisfies the first property but does not satisfy this one. That is why the trace of a covariance matrix, as a representation of uncertainty, may not be as good as entropy.
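A quick numerical check of both claims (a sketch; `neg_sum_squares` is my own name for $-\sum_i p_i^2$, and `shannon_entropy` is redefined here so the snippet runs on its own):

```python
import math
from itertools import product

def shannon_entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def neg_sum_squares(probs):
    return -sum(p * p for p in probs)

# Two independent discrete distributions.
p_x = [0.7, 0.3]
p_y = [0.2, 0.5, 0.3]

# Joint distribution under independence: p(x, y) = p(x) * p(y).
p_xy = [px * py for px, py in product(p_x, p_y)]

# Entropy is additive over independent variables: H(X, Y) = H(X) + H(Y).
print(shannon_entropy(p_xy), shannon_entropy(p_x) + shannon_entropy(p_y))   # equal (~2.367)

# The alternative measure is not additive.
print(neg_sum_squares(p_xy), neg_sum_squares(p_x) + neg_sum_squares(p_y))   # -0.22 vs. -0.96
```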

Property 3: Zero-prob outcome does not contribute to entropy

$H(p_1, p_2, \dots, p_n) = H(p_1, p_2, \dots, p_n, p_{n+1} = 0)$
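Numerically, this property corresponds to the convention $0 \cdot \log_2 0 = 0$, justified by the limit $p \log_2 p \to 0$ as $p \to 0^{+}$; a small illustrative check:

```python
import math

# p * log2(p) vanishes as p -> 0+, so a zero-probability outcome contributes nothing.
for p in (1e-2, 1e-6, 1e-12):
    print(p, p * math.log2(p))

def shannon_entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)   # zero-prob terms are skipped

print(shannon_entropy([0.5, 0.25, 0.25]))        # 1.5 bits
print(shannon_entropy([0.5, 0.25, 0.25, 0.0]))   # still 1.5 bits
```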

Property 4: Continuity in all arguments

Some other measures also satisfy this property, such as the trace or the determinant of a covariance matrix.

Note: there is a Uniqueness Theorem [1]

Khinchin (1957) showed that the only family of functions satisfying the four basic properties described above has the form $H(p_1, p_2, \dots, p_K) = -\lambda \sum_{i=1}^K p_i \log_2 p_i$, where $\lambda$ is a positive constant. Khinchin referred to this as the Uniqueness Theorem. Setting $\lambda = 1$ and using the binary logarithm gives the Shannon entropy. To reiterate, entropy is used because it has desirable properties and is the natural choice among the family of functions that satisfy every item on the basic wish list (properties 1–4).

Beyond the four basic properties discussed above, there are other interesting facts about entropy that I will explore later.

References

[1] Entropy is a measure of uncertainty, Sebastian Kwiatkowski.

[2] Inequality of arithmetic and geometric means, Wikipedia.
