[summary: A bit of data is the amount of data required to single out one message from a set of two. Equivalently, it is the amount of data required to cut the set of possible messages in half. If you want to single out one of the hands of one of your biological grandparents (such as the left hand of your paternal grandmother), you can do that by transmitting three bits of data: One to single out "left hand" or "right hand", one to single out "maternal" or "paternal", and one to single out "grandfather" or "grandmother". We can also speak of storing data in physical systems which can be put into multiple states: A coin can be used to store a single bit of data (by placing it either heads or tails); a normal six-sided die can be used to store a little over 2.5 bits of data. In general, a message that singles out one thing from a set of $~$n$~$ is defined to contain [3nd $~$\log_2(n)$~$] bits of data.]

A bit of data (not to be confused with a Shannon, an abstract bit, or a [evidence_bit bit of evidence]) is the amount of data required to single out one message from a set of two. If you want to single out one of the hands of one of your biological grandparents (such as the left hand of your paternal grandmother), you can do that with three bits of data: One to single out "left hand" or "right hand", one to single out "maternal" or "paternal", and one to single out "grandfather" or "grandmother". In order for someone to look at the data and know which hand you singled out, they need to know what method you used to [encoding encode] the message into the data.
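The grandparent example can be sketched in code. Below is a minimal Python illustration of one such shared encoding (the bit order chosen here is an arbitrary convention, not anything canonical): sender and receiver agree on which bit carries which distinction, and only then do the three bits pick out a unique hand.

```python
# One hypothetical 3-bit encoding of "a hand of one grandparent".
# Agreed convention (both parties must know it):
#   bit 0: 0 = left hand,   1 = right hand
#   bit 1: 0 = maternal,    1 = paternal
#   bit 2: 0 = grandmother, 1 = grandfather

def encode(hand: str, side: str, grandparent: str) -> int:
    bits = 0
    if hand == "right":
        bits |= 0b001
    if side == "paternal":
        bits |= 0b010
    if grandparent == "grandfather":
        bits |= 0b100
    return bits

def decode(bits: int) -> tuple:
    hand = "right" if bits & 0b001 else "left"
    side = "paternal" if bits & 0b010 else "maternal"
    grandparent = "grandfather" if bits & 0b100 else "grandmother"
    return (hand, side, grandparent)

# "The left hand of your paternal grandmother" becomes the 3 bits 010:
message = encode("left", "paternal", "grandmother")  # -> 0b010
```

With a different agreed bit order, the same three bits would decode to a different hand, which is why the receiver needs the encoding method, not just the data.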

Data can be stored in physical systems which can be put into multiple states: A coin can be used to store a single bit of data (by placing it either heads or tails); a normal six-sided die can be used to store a little over 2.5 bits of data. In general, a message that singles out one thing from a set of $~$n$~$ is defined to contain [3nd $~$\log_2(n)$~$] bits of data.
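The coin and die figures above follow directly from the $~$\log_2(n)$~$ definition; a small Python sketch makes the arithmetic explicit:

```python
import math

def bits_of_data(n: int) -> float:
    """Amount of data needed to single out one thing from a set of n."""
    return math.log2(n)

coin = bits_of_data(2)  # a coin has 2 states: exactly 1 bit
die = bits_of_data(6)   # a die has 6 states: about 2.585 bits
```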

"Bit" is a portmanteau of "binary digit": "binary" because it is the amount of data required to single out one message from a set of 2. The 2 is arbitrary; analogous units of data exist for every other number. For example, a trit is the amount of data required to single out one message from a set of three, a decit is the amount of data required to single out one message from a set of ten, and so on. It is easy to convert between units: for example, a decit is $~$\log_2(10) \approx 3.32$~$ bits, because it takes a little over three bits to pick one thing out from a set of ten. See also the pages on [ converting between units of data] and fractional bits.
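Conversion between any two such units is a single change-of-base computation. The helper below is an illustrative sketch (the function name is made up for this example):

```python
import math

def convert_units(amount: float, from_base: int, to_base: int) -> float:
    # One base-b unit is worth log_c(b) base-c units,
    # since singling out 1 of b things takes log_c(b) base-c digits.
    return amount * math.log(from_base, to_base)

decit_in_bits = convert_units(1, 10, 2)  # about 3.32 bits per decit
trit_in_bits = convert_units(1, 3, 2)    # about 1.58 bits per trit
```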

[fixme: Talk about how we want to define data such that two objects hold twice as much.]

The amount of data you can transmit (or store) grows exponentially in the number of bits at your disposal. For example, consider a punch card that can hold ten bits of data. You can use one punch card to pick out a single thing from a set of $~$2^{10}=1024.$~$ From two punch cards you can pick out a single thing from a set of $~$2^{20}=1048576.$~$ The number of *things you can distinguish* using two punch cards is 1024 times larger than the number of things you can distinguish with one punch card, but the amount of data you can encode using two punch cards is precisely twice as much (20 bits) as the amount of data you can encode using one punch card (10 bits). In other words, you can single out one object from a collection of $~$n$~$ (or store a number between 1 and $~$n$~$) using [3nd $~$\log_2(n)$~$] bits of data. For more on why the amount of data is logarithmic in the number of possible messages, see [ Data is logarithmic].
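The punch-card arithmetic above can be checked directly: doubling the number of cards multiplies the number of distinguishable messages by 1024, while merely adding 10 bits of data.

```python
import math

BITS_PER_CARD = 10  # the hypothetical punch card from the example

def distinguishable_messages(cards: int) -> int:
    """Number of distinct messages expressible with this many cards."""
    return 2 ** (cards * BITS_PER_CARD)

one_card = distinguishable_messages(1)   # 1024 messages
two_cards = distinguishable_messages(2)  # 1048576 = 1024 * 1024 messages

# Messages multiply; data merely adds:
data_one = math.log2(one_card)    # 10.0 bits
data_two = math.log2(two_cards)   # 20.0 bits, exactly twice as much
```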

[fixme: Add a formalization section.]