*Figure: the mathematical model of a communication system (source: Wikimedia).*

The system assumes that we lose some information from the original message with probability $p(y\vert x)$ while the message is being sent through a noisy communication channel. Our goal is to reconstruct the message $\hat W$ that best guesses the message $W$ the sender sent.

The two essentials of communication are compression and transmission of data. For quantifying data compression, we use the Shannon entropy and rate distortion theory; for data transmission, we use mutual information and channel capacity. Here we will focus on the quantification of information for the evaluation of methods.

Why do we care about communication when our interest is in statistics? Consider a supervised classification problem. Using the communication theory framework, we can interpret the learner as an auto-encoder: the target class $y$ plays the role of an original message, and the learner can be seen as consisting of an encoder part and a decoder part. Using the training data, the learner updates the encoder/decoder to estimate the label $\hat y$. In this sense, information theory has a close relationship with statistical learning theory.

Notation:

- $M_i$: the $i$th message.
- $X_i$: the $i$th code corresponding to $M_i$.
- Vector or matrix versions of the above are denoted in bold typeface (e.g. $\mathbf X$).

For efficient communication through computers, we need to encode the message into bit codes (consisting of 0s and 1s), and there may be many ways to encode the message. For example, consider four messages $M_1$ through $M_4$. A *singular* encoding is one that cannot be inverted to decode the message. In contrast, a *non-singular* encoding is a one-to-one correspondence between the messages and the codes. Even a non-singular code can be ambiguous, though: a received string such as 010 might be misinterpreted, since it can be split into codewords in more than one way (for instance, as the single codeword 010 or as 0 followed by 10). A *uniquely decodable* encoding is a method that does not allow such a situation. Furthermore, an *instantaneous* encoding allows instantaneous decoding of the message: unlike a code that is merely uniquely decodable, receiving each bit allows one to instantly differentiate one message from the others. Clearly, the instantaneous one is the best among the four.

For a uniquely decodable code $\mathbf X = \{X_1, \ldots, X_n\}$ with codeword lengths $l_1, \ldots, l_n$, the lengths must satisfy the Kraft inequality:

$$\sum_{i=1}^{n} 2^{-l_i} \le 1.$$

A short sketch below checks these properties on concrete example codes.
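To make the four classes concrete, here is a minimal Python sketch (my illustration, not from the original post); the specific codewords are assumed from the classic four-message table in Cover and Thomas's *Elements of Information Theory*, which this example closely resembles.

```python
# Minimal sketch: classify example codes (codewords assumed from the
# classic Cover & Thomas table, not recovered from this post).

def is_nonsingular(code):
    """Non-singular: distinct messages map to distinct codewords."""
    return len(set(code.values())) == len(code)

def is_prefix_free(code):
    """Instantaneous (prefix-free): codewords are distinct and none
    is a prefix of another."""
    words = list(code.values())
    return is_nonsingular(code) and not any(
        a != b and b.startswith(a) for a in words for b in words)

def kraft_sum(code):
    """Sum of 2^(-l_i); at most 1 for any uniquely decodable code."""
    return sum(2.0 ** -len(w) for w in code.values())

singular           = {"M1": "0",  "M2": "0",   "M3": "0",   "M4": "0"}
non_singular       = {"M1": "0",  "M2": "010", "M3": "01",  "M4": "10"}
uniquely_decodable = {"M1": "10", "M2": "00",  "M3": "11",  "M4": "110"}
instantaneous      = {"M1": "0",  "M2": "10",  "M3": "110", "M4": "111"}

for name, code in [("singular", singular),
                   ("non-singular", non_singular),
                   ("uniquely decodable", uniquely_decodable),
                   ("instantaneous", instantaneous)]:
    print(f"{name:>18}: non-singular={is_nonsingular(code)}, "
          f"prefix-free={is_prefix_free(code)}, Kraft sum={kraft_sum(code):.3f}")
```

Note that the non-singular code's Kraft sum exceeds 1, which is exactly why it cannot be uniquely decodable, while the instantaneous code attains equality.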
The term $\log(1/p(x))$ can be interpreted as the amount of information, in the sense that it gives more weight to an outcome $x$ the less likely it is to happen. Hence, by its definition, the Shannon entropy of a code $\mathbf X$,

$$H(\mathbf X) = \sum_{x} p(x)\log_2\frac{1}{p(x)},$$

bounds the efficiency of all the possible codes of $\mathbf M$ from above. In application, the entropy can be intuitively understood as an aggregated statistic of uniformity: larger entropy implies greater heterogeneity of the data $X_i$.

- In a communication system, more heterogeneous data implies inefficiency (every message has the same frequency).
- In ecology, it implies an ecosystem of greater species diversity.
- In thermodynamics, it can be seen as a generalization of the Boltzmann entropy.

A quick numerical check of this intuition is given below.
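As a numerical illustration (my addition; the two distributions are hypothetical), the sketch below computes the entropy of a uniform and a skewed distribution over four messages. The uniform one, being maximally heterogeneous in the sense above, attains the larger entropy.

```python
import math

def shannon_entropy(probs):
    """H = sum of p(x) * log2(1/p(x)); outcomes with p(x) == 0 contribute 0."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # every message equally frequent
skewed  = [0.70, 0.15, 0.10, 0.05]   # a few messages dominate

print(f"uniform: H = {shannon_entropy(uniform):.3f} bits")  # 2.000 bits
print(f"skewed:  H = {shannon_entropy(skewed):.3f} bits")   # about 1.319 bits
```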
The next question is: how can we construct such an encoding? We use a binary tree for this; a sketch follows.
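The post does not spell out the construction, so what follows is a sketch of Huffman coding, the standard way to grow such a prefix code from a binary tree (my illustration; the message probabilities are hypothetical).

```python
import heapq
import itertools

def huffman_code(probs):
    """Build a prefix (instantaneous) code by repeatedly merging the two
    least-probable subtrees of a binary tree; leaves become codewords."""
    counter = itertools.count()  # tie-breaker so heapq never compares dicts
    heap = [(p, next(counter), {msg: ""}) for msg, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        merged = {m: "0" + c for m, c in left.items()}
        merged.update({m: "1" + c for m, c in right.items()})
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

probs = {"M1": 0.5, "M2": 0.25, "M3": 0.125, "M4": 0.125}  # hypothetical
code = huffman_code(probs)
print(code)  # e.g. {'M1': '0', 'M2': '10', 'M3': '110', 'M4': '111'}

avg_len = sum(p * len(code[m]) for m, p in probs.items())
print(f"average length = {avg_len} bits per message")  # 1.75, equal to H here
```

Because these probabilities are dyadic (powers of 1/2), the average code length exactly matches the entropy bound of 1.75 bits.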
This post is based on a lecture at the 12th KIAS CAC Summer School, Korea Institute for Advanced Study, Republic of Korea (the talk was given by Prof. Junghyo Jo at Seoul National University).

*Photo of Claude Shannon by Konrad Jacobs, CC BY-SA 2.0 de.*

Shannon wrote the paper "A Mathematical Theory of Communication", which I strongly encourage you to read for its clarity; it is an amazing source of information. In it he introduced the quantity now known as the Shannon entropy, which is useful for discovering the statistical structure of a word or message: if you consider a word as a discrete source emitting characters from a finite alphabet, then for each possible character there is a set of probabilities that produce the various outputs.
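To close the loop on that remark, here is a small sketch (my addition; the example strings are hypothetical) that estimates the character probabilities of a word empirically and computes its entropy. Words with repeated characters have lower entropy per symbol than words whose characters are all distinct.

```python
from collections import Counter
import math

def char_entropy(text):
    """Empirical entropy of the character distribution of `text`."""
    counts = Counter(text)
    n = len(text)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(f"{char_entropy('mississippi'):.3f} bits/char")  # about 1.823
print(f"{char_entropy('abcdefghijk'):.3f} bits/char")  # log2(11), about 3.459
```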