Information

https://arbital.com/p/information

by Nate Soares May 30 2016 updated May 31 2016


[summary: Information is a measure of how much a message improves an observer's ability to predict the world. Information is observer-dependent: If someone tells both of us your age, you'd learn nothing, while I'd learn how old you are. Information is measured in shannons. If you are about to observe a coin that came up either "heads" or "tails," one shannon is the difference between utter uncertainty about which way the coin came up, and total certainty that it came up "heads." Information can be used to quantify both (a) how much uncertainty a person has; and (b) how much uncertainty a message resolves.]

[summary(Technical): Given a probability distribution $~$\mathrm P$~$ over a set $~$O$~$ of possible observations, an observation $~$o \in O$~$ is said to carry $~$\log \frac{1}{\mathrm P(o)}$~$ units of information (with respect to $~$\mathrm P$~$), where the base of the logarithm determines the unit of information. The standard choice is log base 2, in which case the information is measured in shannons.]

Information is a measure of how much a message improves an observer's ability to predict the world. For a formal description of what this means, see [ Information: Formalization]. Information is observer-dependent: If someone tells both of us your age, you'd learn nothing, while I'd learn how old you are. Information theory gives us tools for quantifying and studying information.

Information is measured in shannons, which are also the units used to measure [uncertainty] and [entropy]. Given that you're about to observe a coin that certainly came up either heads or tails, one shannon is the difference between utter uncertainty about which way the coin came up, and total certainty that the coin came up heads. Specifically, the amount of information in an observation is quantified as the Logarithm of the [reciprocal] of the Probability that the observer assigned to that observation.

For a version of the previous sentence written in English, see [measuring_information Measuring information]. For a discussion of why this quantity in particular is called "information," see [ Information: Intro].
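As a concrete illustration, here is a small Python sketch (the `information` helper is ours, purely illustrative, not part of any standard library) computing the information carried by a few observations: a fair coin carries one shannon, an observation assigned probability 1 carries none, and the same message carries different amounts of information to observers with different prior probabilities.

```python
import math

def information(probability, base=2):
    """Shannons of information in an observation that the observer
    assigned the given probability: log(1 / probability), base 2."""
    return math.log(1 / probability, base)

# Observing a fair coin (probability 1/2 assigned to what you see):
print(information(1 / 2))    # 1.0 shannon

# Observing a coin you were already certain came up heads:
print(information(1.0))      # 0.0 shannons -- you learn nothing

# Observer-dependence: told that someone is 30 years old, an observer
# who considered 100 ages equally likely gains about 6.64 shannons,
# while an observer who already knew the age gains 0.
print(information(1 / 100))  # ~6.64 shannons
```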

Information vs Data

The word "information" has a precise, technical meaning within the field of Information theory. Information is not to be confused with [data], which is a measure of how many messages a communication medium can distinguish in principle. For example, a series of three ones and zeros is 3 bits of data, but the amount of information those three bits carry depends on the observer. Unfortunately, in colloquial usage (and even in some texts on information theory!) the word "information" is used interchangeably with the word "data"; matters are not helped by the fact that the standard unit of information is sometimes called a "bit" (the name for the standard unit of data), despite the fact that these units are distinct. The proper name for a binary unit of information is a "Shannon."

That said, there are many links between information and data (and between shannons and bits). For instance, a message made of $~$n$~$ bits of data can carry at most $~$n$~$ shannons of information in expectation, and reliably transmitting $~$n$~$ shannons of information requires, on average, at least $~$n$~$ bits of data.

For more on the difference between information and data, see [+info_vs_data].

Information and Entropy

[+entropy] is a measure on probability distributions which, intuitively, quantifies the total uncertainty of a given distribution. Specifically, the entropy of a distribution is the number of shannons of information that the distribution [expectation expects] to gain upon being told the actual state of the world. As such, it can be interpreted as a measure of how much information the distribution says it is missing about the world. See also [ Information and entropy].
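
As an illustrative sketch (in Python, with made-up distributions; the helper functions are ours), entropy can be computed directly as the expected information gain over all possible observations:

```python
import math

def information(probability, base=2):
    # Shannons carried by an observation assigned this probability.
    return math.log(1 / probability, base)

def entropy(distribution, base=2):
    # Expected information gain: the sum of P(o) * log(1 / P(o))
    # over all possible observations o with nonzero probability.
    return sum(p * information(p, base) for p in distribution.values() if p > 0)

# A coin the observer believes lands heads 90% of the time:
print(entropy({"heads": 0.9, "tails": 0.1}))  # ~0.47 shannons of missing information

# A fair coin: maximal uncertainty between two outcomes.
print(entropy({"heads": 0.5, "tails": 0.5}))  # 1.0 shannon of missing information
```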

Formalization

The amount of information an observation carries to an observer is the Logarithm of the [reciprocal] of the Probability that they assigned to that observation. In other words, given a probability distribution $~$\mathrm P$~$ over a set $~$O$~$ of possible observations, an observation $~$o \in O$~$ is said to carry $~$\log_2\frac{1}{\mathrm P(o)}$~$ shannons of information with respect to $~$\mathrm P$~$. A different choice for the base of the logarithm corresponds to a different unit of information; see also [ Converting between units of information]. For a full formalization, see [ Information: Formalization]. For an understanding of why information is logarithmic, see [ Information is logarithmic]. For a full understanding of why we call this quantity in particular "information," see [ Information: Intro].
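
As a concrete illustration of how the base of the logarithm determines the unit, here is a small Python sketch (the `information` helper is ours; the unit names follow standard usage: base 2 gives shannons, base e gives nats, base 10 gives hartleys):

```python
import math

def information(probability, base=2):
    # log(1 / probability) in the chosen base; the base fixes the unit.
    return math.log(1 / probability, base)

p = 1 / 8  # the observer assigned probability 1/8 to what they saw

print(information(p, 2))        # 3.0   shannons (log base 2)
print(information(p, math.e))   # ~2.08 nats     (natural log)
print(information(p, 10))       # ~0.90 hartleys (log base 10)

# Converting between units is just a change of logarithm base,
# e.g. shannons = nats / ln(2):
print(information(p, math.e) / math.log(2))   # 3.0 shannons again
```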