Friday, October 28, 2011

Banana Trucks, Stock Traders, iPods and More...

Information interchange is an activity shared by biological, chemical, physical, economic and social systems. As science and technology have accelerated in recent decades, we now routinely deal with enormous amounts of data that we try to model accurately in order to understand it. When you think about it, these systems have a lot in common.

For example, my rudimentary understanding of stock market trading is that traders need to get certain complex data right at the beginning of the market day, within the first few minutes in fact. Any later, and that data is useless to them. If it arrives incomplete or inaccurate, it is just as useless. This is essentially the same challenge faced by our wireless car, which must receive all its varied sensor data properly.

Information theory provides the starting point for addressing these challenges and those of the even more complex natural and artificial information interchange systems alluded to above. Claude Shannon's theorems (greatly simplified and hopefully not misrepresented as a result) assumed that data can be encoded as a string of 1s and 0s and transmitted from one point to another. Shannon's ideas enabled us to understand and address data degradation over distance traveled and the effects of interference ("noise"). They also led to the recognition that when we encode, store and transmit data we can compress it to save space and reduce latency (i.e. speed things up).
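To make the compression idea a little more concrete, here is a minimal sketch of Shannon entropy, the measure that tells us roughly how many 1s and 0s per symbol a message needs on average. The function name and the two sample strings are my own illustrations, not anything from Shannon's papers.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(message: str) -> float:
    """Average number of bits per symbol needed to encode `message`,
    based only on how often each symbol occurs in it."""
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Illustrative inputs (assumptions, chosen only to show the contrast).
repetitive = "aaaaaaab"  # mostly one symbol, so very predictable
varied = "abcdefgh"      # eight equally likely symbols

print(entropy_bits_per_symbol(repetitive))  # well under 1 bit per symbol
print(entropy_bits_per_symbol(varied))      # 3 bits per symbol
```

The more predictable the message, the fewer bits it needs, which is why repetitive data compresses so well.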

An entire communications industry was spawned as a result of information theory. Among other things, we have been able to leverage our detailed understanding of the costs associated with data compression. For example, have you ever compared the sound of a song played on an iPod with a song played from a professionally mastered CD? Which has better quality? The CD of course. On the other hand, you can store far more songs on an iPod than on a CD. This difference has little to do with "old" or "new" storage media technology. If a 1-hour CD can hold approximately 12 songs, whereas your iPod can hold far more than that, it is because, in order to squeeze additional music onto the iPod, the compression throws away some of the digital bits that supply the fuller detail heard on the CD.
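A bit of back-of-the-envelope arithmetic shows the size of the tradeoff. The CD figures below are the standard ones (44,100 samples per second, 16 bits per sample, two channels). The 128 kbit/s compressed rate is only an assumption on my part, a common setting for portable music files, and your own files may well differ.

```python
# Standard audio CD parameters.
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2

# Assumed bitrate for a compressed track (illustrative, not from the post).
COMPRESSED_BITRATE_BPS = 128_000

SECONDS_PER_HOUR = 3_600


def bytes_per_hour(bits_per_second: int) -> int:
    """Storage needed for one hour of audio at the given bitrate."""
    return bits_per_second * SECONDS_PER_HOUR // 8


cd_bitrate_bps = CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS

print(cd_bitrate_bps)                           # 1411200 bits per second
print(bytes_per_hour(cd_bitrate_bps))           # 635040000 bytes, about 635 MB
print(bytes_per_hour(COMPRESSED_BITRATE_BPS))   # 57600000 bytes, about 58 MB
print(cd_bitrate_bps / COMPRESSED_BITRATE_BPS)  # roughly 11 times smaller
```

Under that assumption, the hour of music that fills a CD takes about one eleventh of the space once compressed.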

You probably don't care though, right? You don't expect professional quality music reproduction coming through those little ear buds. Cycling down the road, jogging along the trail, or wherever you are tearing around dangling your iPod, your primary concern is access to reasonably decent portable music. Conversely, you expect a CD playing on a multi-speaker Surround Sound home theatre system to produce pretty darned nice music, right? I certainly do. Personally, I don't know if I could handle hearing Bohemian Rhapsody in its super compressed format on an iPod. But I could definitely rock to the original Hawaii 5-O Theme Song.

How do we know how to produce just the right fidelity for MP3s, CDs, DVDs, etc.? Because information theory has led to a highly accurate understanding of the tradeoffs inherent in different types and amounts of data compression, as well as a host of other factors that come into play when deciding the optimal way to transmit data point to point. So we know how to model different audio/video systems and choose the compression, transmission and storage methods most appropriate for a given context. The digital data comprising a Dixie Chicks song will be treated much differently depending on whether it is to find its way onto a CD, an iPod or a movie theatre sound system. Yet each scenario will be optimal and we understand how to make it so.

If only that were all there was to it. The problem is, not all data manipulation is that straightforward. This is where I want to start calling it "information" rather than "data", because much of the "data" we now study has meaning that cannot be divorced from it - hence it is "information".

Consider the modeling of a system for optimal routing of commercial trucks. To keep it as simple as possible, let's say all trucks are going from Point A in San Diego, California to Point B in Chicago, Illinois. A truck full of hazardous waste and a truck full of bananas are definitely not the same animal. The optimal routing of those trucks is not the same. The hazardous waste truck must avoid certain roads and population areas, perhaps certain times of day, even if it takes longer to get from San Diego to Chicago. The bananas, on the other hand, must arrive as quickly as possible or else rotten mush will show up at the grocery store. A dynamic traffic routing system must understand the semantics of the data, i.e. what is in each truck en route. In this case, semantics also affects temporal issues - timeliness of arrival. The system must be able to adjust on the fly to all the unexpected things that could happen along the way: a snowstorm, a sinkhole, a holiday parade, an incorrect satellite map of the road! Plus, let's complicate matters even more: if our grocery store in downtown Chicago is deluged at the same time by 50 trucks of perishable produce coming from ports on both coasts, there will be Chaos in The Loop (downtown Chicago). And more likely than not, rotten mush on the shelves and in the dumpsters.
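Here is a toy sketch of what "understanding the semantics" might look like in code. It is a plain shortest-path search (Dijkstra's algorithm) in which the cargo type changes which roads are allowed. The road network, the waypoint names, the hours and the cargo labels are all made up for illustration; a real routing system would be far more involved.

```python
import heapq

# Hypothetical road network: node -> list of (next node, hours, populated area?)
ROADS = {
    "A": [("Metro", 10, True), ("Bypass", 14, False)],
    "Metro": [("B", 12, True)],
    "Bypass": [("B", 15, False)],
    "B": [],
}


def best_route(cargo, start="A", goal="B"):
    """Return (total hours, path) for the fastest route this cargo may take,
    or None if no permitted route exists."""
    queue = [(0, start, [start])]
    settled = set()
    while queue:
        hours, node, path = heapq.heappop(queue)
        if node == goal:
            return hours, path
        if node in settled:
            continue
        settled.add(node)
        for next_node, leg_hours, populated in ROADS[node]:
            # The semantics of the load decide which roads are usable.
            if cargo == "hazardous" and populated:
                continue
            heapq.heappush(queue, (hours + leg_hours, next_node, path + [next_node]))
    return None


print(best_route("bananas"))    # (22, ['A', 'Metro', 'B'])
print(best_route("hazardous"))  # (29, ['A', 'Bypass', 'B'])
```

Same start, same destination, same map, yet the "best" answer differs because of what is in the truck. Reacting to a snowstorm or a parade would mean changing the hours or removing a road and searching again.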

Do you see how our wireless car system design has a lot in common with our commercial truck routing system? This is the same class of challenges that exists within a myriad of natural and human-made systems. Even if you aren't a professional in any of the fields listed in the opening paragraph, you may begin to see the commonalities within complex information interchange systems. To understand these systems we first have to try to model them, taking into consideration all the relevant variables and their interactions: syntactically, semantically, temporally.

Information theory has to be taken to the next level, to incorporate additional factors. Do you want to hear more about all this from an expert? As you wait for the next installment in these posts, consider tuning in on October 31st to an interesting talk on the history and philosophy of information.

Not to mention, by doing so, you will get a heads-up on the people who are working on "Shannon 2.0".





