The data capacity of an object is defined to be the logarithm of the number of different distinguishable states the object can be placed into. For example, a coin that can be placed heads or tails has a data capacity of $~$\log(2)$~$ units of [-data]. The choice of [-logarithm_base base] for the logarithm determines the unit of data; common units include the bit, [-nat], and decit (corresponding to base 2, e, and 10, respectively). For example, the coin has a data capacity of $~$\log_2(2)=1$~$ bit, and a pair of dice (which can be placed into 36 distinguishable states) has a data capacity of $~$\log_2(36) \approx 5.17$~$ bits. Note that the data capacity of an object depends on the ability of an observer to distinguish different states of the object: if the coin is a penny, and I'm able to tell whether you placed the image of Abraham Lincoln facing North, South, West, or East (regardless of whether the coin is heads or tails), then that coin has a data capacity of $~$\log_2(8) = 3$~$ bits when used to transmit a message from you to me.
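The definition above can be sketched in a few lines of Python. The function name `data_capacity` and its `base` parameter are illustrative choices, not part of any standard library; the sketch simply takes a logarithm of the state count in the chosen base.

```python
import math

def data_capacity(num_states: int, base: float = 2) -> float:
    """Return the log of the number of distinguishable states.

    `base` picks the unit: 2 gives bits, math.e gives nats, 10 gives decits.
    (Hypothetical helper written for this example.)
    """
    if num_states < 1:
        raise ValueError("an object needs at least one distinguishable state")
    return math.log(num_states, base)

coin_bits = data_capacity(2)           # heads or tails
dice_bits = data_capacity(36)          # a pair of dice, 6 * 6 states
penny_bits = data_capacity(8)          # 2 faces * 4 orientations of Lincoln
coin_nats = data_capacity(2, math.e)   # the same coin, measured in nats
coin_decits = data_capacity(2, 10)     # the same coin, measured in decits
```

The same coin has a different numerical capacity in each unit, just as the same length has a different number in metres and in feet.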
The data capacity of an object is closely related to the [channel_capacity channel capacity] of a [communications_channel communications channel]. The difference is that the channel capacity is the amount of data the channel can transmit per unit time (measured, e.g., in bits per second), while the data capacity of an object is the amount of data that can be encoded by putting the object into a particular state (measured, e.g., in bits).
Why is data capacity defined as the logarithm of the number of states an object can be placed into? Intuitively, because a 2GB hard drive is supposed to carry twice as much data as a 1GB hard drive. More concretely, note that if you have $~$n$~$ copies of a physical object that can be placed into $~$b$~$ different states, then you can use those to encode $~$b^n$~$ different messages. For example, with three coins, you can encode eight different messages: HHH, HHT, HTH, HTT, THH, THT, TTH, and TTT. The number of messages that the objects can encode grows exponentially with the number of copies. Thus, if we want to define a unit of message-carrying-capacity that grows linearly in the number of copies of an object (such that 3 coins hold 3x as much data as 1 coin, and a 9GB hard drive holds 9x as much data as a 1GB hard drive, and so on) then data must grow logarithmically with the number of messages. The logarithm does exactly this, because $~$\log(b^n) = n \log(b)$~$.
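As a small check of the counting argument, the following sketch enumerates every message that three coins can encode and confirms that the capacity of three coins is three times the capacity of one. The helper name `messages` is made up for this example.

```python
import math
from itertools import product

def messages(states: str, copies: int) -> list[str]:
    """List every joint state of `copies` objects, each with the given states."""
    return ["".join(combo) for combo in product(states, repeat=copies)]

three_coins = messages("HT", 3)

# The number of messages grows exponentially: b**n with b = 2, n = 3.
num_messages = len(three_coins)

# The capacity grows linearly: log2(b**n) = n * log2(b).
capacity_one_coin = math.log2(2)
capacity_three_coins = math.log2(num_messages)
```

Here `three_coins` holds the eight strings from HHH through TTT, and `capacity_three_coins` comes out as three times `capacity_one_coin`.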
The data capacity of an object bounds the length of the message that you can send using that object. For example, it takes $~$\log_2(26) \approx 4.7$~$ bits of data to encode a single letter A-Z, which rounds up to 5 bits if each letter gets a whole number of bits. So if you want to transmit an 8-letter word to somebody, an object with a data capacity of $~$5 \cdot 8 = 40$~$ bits suffices. In other words, if you have 40 coins in a coin jar on your desk, and if we worked out an [encoding_scheme encoding scheme] ahead of time (such as a fixed 5-bit code for each letter), then you can tell me any 8-letter word using those coins.
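One possible encoding scheme for the coin jar is sketched below. It is only an illustration: it assumes a fixed 5-bit code in which A is 0 and Z is 25, with heads standing for 0 and tails for 1. The function names `word_to_coins` and `coins_to_word` are hypothetical.

```python
import string

BITS_PER_LETTER = 5  # 2**5 = 32 patterns, enough for the 26 letters A-Z

def word_to_coins(word: str) -> str:
    """Encode a word of letters A-Z as a string of H/T coin states."""
    coins = []
    for letter in word.upper():
        index = string.ascii_uppercase.index(letter)   # A -> 0, ..., Z -> 25
        bits = format(index, f"0{BITS_PER_LETTER}b")   # e.g. 25 -> "11001"
        coins.append(bits.replace("0", "H").replace("1", "T"))
    return "".join(coins)

def coins_to_word(coins: str) -> str:
    """Decode a string of H/T coin states produced by word_to_coins."""
    letters = []
    for start in range(0, len(coins), BITS_PER_LETTER):
        chunk = coins[start:start + BITS_PER_LETTER]
        bits = chunk.replace("H", "0").replace("T", "1")
        letters.append(string.ascii_uppercase[int(bits, 2)])
    return "".join(letters)

jar = word_to_coins("CAPACITY")   # an 8-letter word
decoded = coins_to_word(jar)
```

The 8-letter word fills exactly 40 coins. Six of the 32 five-coin patterns go unused, which reflects the gap between 5 bits and $~$\log_2(26) \approx 4.7$~$ bits per letter.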
What does it mean to say that an object "can" be placed into different states, and what does it mean for those states to be "distinguishable"? Information theory is largely agnostic about the answer to those questions. Rather, given a set of states that you claim you could put an object into, which you claim I can distinguish, information theory can tell you how to use those objects in a clever way to send messages to me. For more on what it means for two physical subsystems to communicate with each other by encoding messages into their environment, see [communication Communication].
Note that the amount of information carried by an object is not generally equal to the data capacity of the object. For example, say a trusted mutual friend of ours places a coin "heads" if your eyes are blue and "tails" otherwise. If I already know what color your eyes are, then the state of the coin doesn't carry any information for me. If instead I was very sure that your eyes are blue, but actually you've been wearing blue contact lenses and your eyes are really brown, then the coin may carry significantly more than 1 shannon of information for me. See also Information and [ Information is subjective]. The amount of information carried by an object to me (measured in shannons) may be either more or less than its data capacity: more if the message is very surprising to me, less if I already knew the message in advance. The relationship between the data capacity of an object and the amount of information it carries to me is that the maximum amount of information I can expect to gain by observing the object is equal to the data capacity. For more, see [ Information and data capacity].
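The eye-color example can be made numerical with a short sketch. It assumes the standard measure of information as surprisal, $~$-\log_2(p)$~$ shannons for an outcome I assigned probability $~$p$~$. The 99% figure for "very sure" is a made-up value for illustration, and the function names are hypothetical.

```python
import math

def surprisal_shannons(probability: float) -> float:
    """Information gained from seeing an outcome I assigned this probability."""
    return -math.log2(probability)

def expected_information(probabilities: list[float]) -> float:
    """Average surprisal over all outcomes, weighted by my probabilities."""
    return sum(p * surprisal_shannons(p) for p in probabilities if p > 0)

# Assumption for illustration: I am 99% sure your eyes are blue.
p_blue, p_brown = 0.99, 0.01

info_if_heads = surprisal_shannons(p_blue)    # what I expected: tiny gain
info_if_tails = surprisal_shannons(p_brown)   # a big surprise: well over 1 shannon
info_if_known = surprisal_shannons(1.0)       # I already knew: no gain

# What I can expect to gain before looking at the coin.
expected_when_sure = expected_information([p_blue, p_brown])
expected_when_unsure = expected_information([0.5, 0.5])

coin_capacity_bits = math.log2(2)
```

Under these assumptions a single observation can carry more than the coin's 1-bit capacity, but the expected gain never exceeds it. The expected gain reaches the capacity only when I consider both states equally likely.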