Practical way of explaining "Information Theory"

Information theory comes into play wherever encoding and decoding are present, for example in compression (multimedia) and cryptography.

In information theory we encounter terms like "entropy", "self-information", and "mutual information", and the entire subject is built on these terms. They sound like nothing more than abstractions and, frankly, they don't really make much sense on their own.

Is there any book/material/explanation (if you can) which explains these things in a practical way?

EDIT:

An Introduction to Information Theory: Symbols, Signals & Noise by John Robinson Pierce is the book that explains it the way I want (practically). It's too good. I started reading it.


Shannon's original paper, "A Mathematical Theory of Communication", is a very important resource for studying this theory. Nobody should miss it.

By reading it you will understand how Shannon arrived at the theory, which should clear up most of the doubts.

Also, studying the workings of the Huffman compression algorithm will be very helpful.
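If it helps to see the idea in code, here is a minimal illustrative sketch of Huffman coding in Python (my own toy version, not taken from any particular source): it builds a prefix code from symbol frequencies, so the more frequent symbols get shorter codewords.

```python
# Minimal Huffman-coding sketch (illustrative only, not a production codec).
import heapq
from collections import Counter

def huffman_code(text):
    freq = Counter(text)
    # Heap entries: (frequency, tie-breaker, tree), where tree is either a
    # symbol or a (left, right) pair of subtrees.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)      # merge the two rarest subtrees
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (t1, t2)))
        counter += 1
    codes = {}
    def walk(tree, prefix=""):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"      # single-symbol edge case
    walk(heap[0][2])
    return codes

print(huffman_code("abracadabra"))  # frequent symbols get shorter codes
```

Running it on a short string makes the link to entropy concrete: the average codeword length comes close to the entropy of the symbol distribution.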

EDIT:

An Introduction to Information Theory

John R. Pierce

seems good according to the Amazon reviews (I haven't tried it).

[found by Googling "information theory layman"]


My own view of "information theory" is that it's essentially just applied math/statistics, but because it's applied to communications/signals it's been called "information theory".

The best way to start understanding the concepts is to set yourself a real task. For example, take a few pages of your favourite blog, save them as a text file, and then try to reduce the size of the file while ensuring you can still reconstruct it completely (i.e. lossless compression). You'll start, for example, by replacing all the instances of "and" with a "1", and so on...
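To make that exercise concrete, here is a toy sketch of such a substitution scheme (my own illustration; the escape byte and token byte are assumptions I added so the round trip stays truly lossless):

```python
# Toy lossless substitution scheme in the spirit of the exercise above:
# replace the common word " and " with a one-byte token. Real compressors
# (gzip, DEFLATE, ...) are of course far more involved.

ESC = "\x00"   # escapes any literal ESC or TOK bytes already in the text
TOK = "\x01"   # stands in for the string " and "

def compress(text):
    text = text.replace(ESC, ESC + ESC)   # escape literal escape bytes first
    text = text.replace(TOK, ESC + TOK)   # then escape literal token bytes
    return text.replace(" and ", TOK)     # finally substitute the common word

def decompress(data):
    out, i = [], 0
    while i < len(data):
        if data[i] == ESC:                # next byte is a literal character
            out.append(data[i + 1])
            i += 2
        elif data[i] == TOK:              # bare token stands for " and "
            out.append(" and ")
            i += 1
        else:
            out.append(data[i])
            i += 1
    return "".join(out)

sample = "cats and dogs and birds and more cats"
packed = compress(sample)
assert decompress(packed) == sample       # round trip is exact (lossless)
print(len(sample), "->", len(packed))     # 37 -> 25 characters here
```

The point of the exercise is that you quickly run into the real questions of information theory: which substitutions actually pay off, and what the limit of such savings is, which is exactly what entropy quantifies.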

I'm always of the opinion that learning by doing is the best approach.


I was going to recommend Feynman for pop-sci purposes, but on reflection I think it might be a good choice for easing into a serious study as well. You can't really know this stuff without getting the math, but Feynman is so evocative that he sneaks the math in without scaring the horses.

Feynman Lectures on Computation

Covers rather more ground than just information theory, but good stuff and pleasant to read. (Besides, I am obligated to pull for Team Physics. Rah! Rah! Rhee!)


I remember articles in, I think, Personal Computer World that presented a version of ID3 for identifying coins, though it used a heuristic alternative to the log formula. I think it minimised sums of squares rather than maximising entropy - but it was a long time ago. There was another article in (I think) Byte that used the log formula for information (not entropy) for similar things. Things like that gave me a handle that made the theory easier to cope with.

EDIT - by "not entropy" I mean I think it used weighted averages of information values, but didn't use the name "entropy".

I think construction of simple decision trees from decision tables is a very good way to understand the relationship between probability and information. It makes the link from probability to information more intuitive, and it provides examples of the weighted average to illustrate the entropy-maximizing effect of balanced probabilities. A very good day-one kind of lesson.

And what's also nice is you can then replace that decision tree with a Huffman decoding tree (which is, after all, a "which token am I decoding?" decision tree) and make that link to coding.
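If a worked calculation helps, here is a small sketch of that probability-to-information link (my own example; the toy decision table and column names are made up): the entropy of a label column, and the ID3-style information gain of a split, which is exactly the weighted-average idea described above.

```python
# Entropy of a label column and ID3-style information gain of a split.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    total = entropy(labels)
    n = len(labels)
    remainder = 0.0
    for value in set(r[attribute_index] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attribute_index] == value]
        remainder += (len(subset) / n) * entropy(subset)   # weighted average
    return total - remainder

# Toy decision table: (outlook, windy) -> play?
rows   = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["yes", "yes", "yes", "no"]
print(entropy(labels))                    # ~0.811 bits of uncertainty
print(information_gain(rows, labels, 1))  # gain from splitting on "windy"
```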

BTW - take a look at this link...

  • http://www.inference.phy.cam.ac.uk/mackay/

MacKay has a free downloadable textbook (also available in print), and while I haven't read it all, the parts I have read seemed very good. The explanation of "explaining away" in Bayesian reasoning, starting on page 293, particularly sticks in my mind.

CiteSeerX is a very useful resource for information theory papers (among other things). Two interesting papers are...

  • http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.2410
  • http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.51.3672

Though CN2 probably isn't day one material.


Though the concepts may seem abstract, they have found good use in recent times in machine learning and artificial intelligence.

This might serve as good motivation for the practical need for these theoretical concepts. In summary, you want to estimate how well your model (an LSTM or a CNN, for example) approximates the target output, using, for example, cross-entropy or the Kullback-Leibler divergence from information theory. (Also look up the information bottleneck principle for a perspective on explaining deep learning through information theory.)
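As a quick illustration (my own sketch with made-up numbers), here is how those two quantities are computed for discrete distributions, e.g. a one-hot target p against a model's predicted distribution q:

```python
# Cross-entropy and KL divergence between two discrete distributions.
import math

def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [1.0, 0.0, 0.0]        # one-hot target
q = [0.7, 0.2, 0.1]        # model's predicted probabilities
print(cross_entropy(p, q)) # the usual classification loss, ~0.357 nats
print(kl_divergence(p, q)) # equals cross_entropy(p, q) - entropy(p)
```

Deep-learning frameworks typically compute the same quantity as a cross-entropy loss directly from logits for numerical stability, but the underlying definition is exactly this.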

In addition, you won't build a useful communication or networked system without some analysis of the channel capacity and properties.
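For a feel of what channel-capacity analysis looks like, the standard textbook example is the binary symmetric channel, whose capacity is C = 1 - H(p) bits per channel use, where p is the bit-flip probability. A tiny sketch (my own, purely illustrative):

```python
# Capacity of a binary symmetric channel: C = 1 - H(p) bits per use.
import math

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    return 1.0 - binary_entropy(p)

for p in (0.0, 0.05, 0.11, 0.5):
    print(f"flip probability {p}: capacity {bsc_capacity(p):.3f} bits/use")
```

At p = 0.5 the output is independent of the input and the capacity drops to zero, which matches the intuition that a maximally noisy channel carries no information.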

In essence, it might look theoretical, but it is at the heart of the present communication age.

To get a more elaborate view of what I mean, I invite you to watch this ISIT lecture: The Spirit of Information Theory by Prof. David Tse.

Also check the paper "The Bandwagon" by Claude Shannon himself, which explains when information theory is useful and when it is not appropriate.

That paper helps you get started, and for comprehensive details read Elements of Information Theory.


Information theory has very effective applications in, e.g., machine learning and data mining. In particular, for data visualization, variable selection, data transformation, and projections, information-theoretic criteria are among the most popular approaches.

See e.g.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87.825&rep=rep1&type=pdf or http://www.mdpi.com/1424-8220/11/6/5695

Information theory allows us to approach optimal data compaction in a formal way e.g. in terms of posterior distributions and Markov Blankets:

http://www.mdpi.com/1099-4300/13/7/1403

It allows us to retrieve upper and lower bounds on the probability of error in variable selection:

http://www.mdpi.com/1099-4300/12/10/2144

One of the advantages of using information theory compared to statistics is that one doesn't necessarily need to set up probability distributions. One can compute information, redundancy, entropy, and transfer entropy without trying to estimate probability distributions at all. Variable elimination without information loss is defined in terms of preservation of conditional posterior probabilities; using information theory one can find similar formulations without the need to compute probability densities. Calculations are instead in terms of mutual information between variables, and the literature provides many efficient estimators and lower-dimensional approximations for these. See: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87.825&rep=rep1&type=pdf and http://www.mdpi.com/1424-8220/11/6/5695
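For the simplest, fully discrete case, a plug-in mutual-information estimate needs nothing but co-occurrence counts. Here is a small sketch with toy data (my own illustration; the estimators in the papers above are considerably more sophisticated):

```python
# Plug-in (empirical) mutual information between two discrete variables,
# computed from co-occurrence counts only -- no parametric model is fitted.
import math
from collections import Counter

def mutual_information(xs, ys):
    n = len(xs)
    pxy = Counter(zip(xs, ys))     # joint counts
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

xs = [0, 0, 1, 1, 0, 1, 0, 1]
ys = [0, 0, 1, 1, 0, 1, 1, 0]      # mostly follows xs, with some noise
print(mutual_information(xs, ys))  # bits shared between the two variables
```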


I could suggest this book by Glynn Winskel. It was used at my university for the Information Theory course. It starts from logic theory, then defines a simple imperative language called IMP, and follows with many concepts about the formal semantics of programming languages.

The Formal Semantics of Programming Languages

http://mitpress.mit.edu/books/formal-semantics-programming-languages


Information theory is a branch of mathematics and electrical engineering that deals with the transmission, processing, and storage of information. It was originally proposed by Claude Shannon in 1948 to deal with the noise in telephone lines.

At its core, information theory is all about quantifying the amount of information contained in a signal. This can be done in a variety of ways, but one common measure is entropy. Entropy is a measure of how much uncertainty there is about the contents of a signal. The higher the entropy, the less predictable the signal is.
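As a tiny illustration of that measure (my own toy example), here is the entropy of a two-outcome source in bits:

```python
# Entropy of a discrete source: a fair coin is maximally unpredictable.
import math

def entropy(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit   -- fair coin, least predictable
print(entropy([0.9, 0.1]))   # ~0.47 bits -- biased coin, more predictable
print(entropy([1.0, 0.0]))   # 0.0 bits  -- fully predictable signal
```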

Information theory is important because it allows us to quantify and measure information. This is significant because it allows us to better understand and optimize communication systems. Additionally, information theory can be used to measure the amount of data that can be compressed into a given space.
