Friday, June 3, 2011

Pattern Matching and Information Discovery in Professional Journalism

My day might have been called Tangled in Twitter. It morphed into a recursion that spiraled inwards, then morphed again, this time into clusters and patterns that made seemingly unrelated events make a lot of sense. In some ways a typical day, but in the end, not really. Jonathan Stray, a journalist at the Associated Press (see my last post if you want a full introduction), is part of a team working to use search engine technology to cluster and categorize "big nasty document sets" so that information emerges that would probably otherwise never have been found. When you are dealing with millions of data points, you could use some algorithmic help.

Tonight a light bulb went off in my head about the important social potential of computationally driven pattern matching when applied to enormous linguistic data sets. Without my own almost overwhelming set of seemingly unrelated activities today, I don't know if I would have made the connection quite so solidly. So I'll fill you in. I'll also point out the globally significant ways computing is starting to be used in Journalism, and where Artificial Intelligence could be used in the future if people like Jonathan keep doing what they are doing.

It all starts with data points. Lots and lots and lots of data that initially seem unrelated. My day's data points included: a morning Skype call that left my brain a bit sore; literally minutes later, before I could even make it 5 feet to the caffeine, an unplanned Skype call from a colleague who wanted to discuss project paperwork issues (groan); a tear up and down the freeway to run an important errand; within seconds of walking in the door, a request for another unscheduled Skype call to discuss, among other things, "bandwidth issues" (in retrospect I find this really amusing); a round of phone calls to a clinic about a topic I have been trying to make sense of for 6 weeks; the next unscheduled Skype call; at one point I got annoyed at Twitter for being dense and impenetrable when I least wanted it to be; woven around all of this, I was getting lost in journalism-related website after website, trying to figure out where all the behind-the-scenes computing technology was located, what it was doing, and how it was constructed (that was fun). Last but not least, this evening I had yet another mind-stretching Skype call, this time to Africa, so part of the day was spent on logistical planning for that.

The cool moment, when the patterns of my day fell into place, came in the evening after I bailed for a while, went to a yoga class and worked on getting my legs around behind my head (very non-cognitive, thus freeing the mind up to become receptive to new things). I came home and listened to a recording of Jonathan giving a talk about the infinite number of ways in which documents (with all their text data) can be arranged; he reminded his audience that the algorithm we choose for any analysis is based upon preconceptions we hold about the end results; those preconceptions impose a framework, which in turn affects the results. Stop and think about that for a few minutes.
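To make that point concrete, here is a toy illustration (my own sketch, not Jonathan's actual pipeline): even in the simplest clustering setup, we smuggle in a preconception the moment we pick k, the number of clusters to look for. The same five little "documents" group differently depending on that one choice.

```python
# A toy sketch of algorithmic preconceptions (not the AP's real system).
# The "preconception" here is k, the number of clusters we decide in
# advance to look for. Same documents, different k, different groupings.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "casualty report from the northern province",
    "casualty report filed after the convoy attack",
    "budget meeting minutes for the clinic project",
    "clinic funding and project paperwork issues",
    "convoy attack near the northern checkpoint",
]

# Turn each document into a weighted word-count vector (TF-IDF).
X = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Cluster the same vectors twice, asking for 2 groups, then 3.
for k in (2, 3):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}:", list(labels))
```

Run it and you will see the framework doing its work: with k=2 the budget/clinic documents and the war-log documents tend to split apart; with k=3 the algorithm dutifully finds a third division, because we told it one had to exist.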

[pause...]

The group Jonathan works with isn't concerned about my preconceptions, personal bandwidth or discoveries about how I allocate my time, who I choose to allocate it to, and what communication methods I use. Yet thinking about the personal internal "algorithms" I use to structure my actions and make my choices, as well as what I bring to that analysis, led to a mental reorganization of my day. The light bulb turned fully on after I listened to Jonathan's talk (filled with absolutely nifty visuals of course) about mining information from Iraq and Afghanistan war logs for previously unknown patterns of casualties - and other information, really, you just have to watch the video - AND after I thought about the conversation we had a few days ago about the potential of Artificial Intelligence to aid the process of rapid discovery and dissemination of information to the public.

Jonathan is active in the machine learning and semantic web communities. Where he finds the time to read all the reports he reads, I don't know, but he follows the latest advances from academia, industry and the government, including DARPA reports (which, if you have read any official government reports, you know are sometimes tortuous). He follows Twitter feeds, open publications by the intelligence community, and reports and advances in the fields of law and finance. Well, I guess the ability to suck up and absorb information like an industrial vacuum cleaner is part of what makes a successful journalist. But it makes even more sense to me now why a computer scientist/journalist would see the enormous potential in harnessing AI to mine for information, scrape all the social media outlets, suck up data in real time and dynamically transform it into useful public information.

This is what Jonathan wants to do more of in Journalism. Get those tech-savvy journalists and set them to work analyzing the gobs and gobs (my word choice) of data out there that has been (and is being) collected - data that is only going to increase exponentially. And why not? Suddenly this whole idea of "computational journalism," which two months ago seemed a puzzling term, makes a whole lot of sense. As I see it, incorporating AI into document analysis is a logical, practical and viable way to go. For example, what do you think an artificial neural network might make of some of these data sets?
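Just to gesture at what that might look like (a hedged sketch of my own, not anything the AP actually runs, with made-up documents and made-up category names): a small neural network can be trained on a handful of hand-labeled documents and then asked to label one it has never seen.

```python
# A hedged sketch, not a real newsroom system: a tiny neural network
# (scikit-learn's MLPClassifier) learns to sort short invented
# "documents" into two hand-labeled categories, then labels a new one.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier

train_docs = [
    "casualty report from the northern province",
    "convoy attack casualty figures released",
    "clinic budget and project paperwork filed",
    "funding minutes for the clinic project",
]
categories = ["war-log", "admin", ]  # the two invented labels
labels = ["war-log", "war-log", "admin", "admin"]

# Vectorize the text, then fit a small one-hidden-layer network.
vec = TfidfVectorizer()
X = vec.fit_transform(train_docs)
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X, labels)

# Ask the network to categorize an unseen document.
print(clf.predict(vec.transform(["casualty report after the attack"])))
```

Scale that idea up from four toy sentences to millions of war-log entries or leaked cables, and you start to see why "computational journalism" stops sounding like a puzzling term.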


The video you must watch that shows clustering at work on big nasty document sets (and explains how it works too).
