Thursday, May 19, 2011

Genetic Sequencing, Bio-Informatics and Computing

Today I obtained a new view into the world of bio-informatics when I had a chance to speak with someone who works in the field. This person is not a computing professional but was able to explain to me some of the interesting work in the field that relies on computing. With her assistance I located a company that does some of this work (link coming below). I gathered much of this information over a breakfast meeting, and sometimes my coffee, granola parfait and egg (separate plate) got cold as I scribbled notes like mad on the seat next to me. On the other hand, my freezing orange juice warmed up while I was writing, chewing and talking.

There are many different areas of bio-informatics; it is a huge field. Interestingly (though perhaps not surprisingly), the medium to large sized companies employ many computing professionals who are cross-over people. For example, in one major company my breakfast colleague knows about, there are three computing groups under the umbrella of a Vice President: "traditional" IS/IT support, a bio-informatics group and a software development group.

For a company that studies or supports the study of genomes, there are lots and lots of data to manipulate and crunch. Typical customers of such a company are academic researchers in a hospital setting, researchers in federal agencies such as the CDC or NIH, and corporate entities including pharmaceutical companies, bio-tech companies and diagnostic companies.

What do they want to accomplish? One angle is to study a gene, or a biomarker, to try to understand better how it works. The researchers may want to cause a particular gene to do more of what it naturally does or to turn off a particular gene (cause it to stop doing its job). A term used a lot in this type of work is Polymerase Chain Reaction (PCR) which, very simply put, means to amplify a region of a genetic sequence so that it can be detected and manipulated. Another angle of research aims to figure out whether a specific gene has an error in a particular (living) population, perhaps related to a pathology. Cancer is a classic example. There is a lot of bio-informatics terminology that I am trying to avoid using, but to use the official terminology as it was explained to me in this instance, people in the line of work we are discussing are trying to identify what the nucleotides are in a particular region of a gene, which will then be used as primers in sequencing. Yes, a mouthful if that isn't your field. If you are still with me you get brownie points.

Computing is the backbone that allows this kind of work to take place, because the clients referred to above want to order specific sequences of DNA for analysis. A company such as Integrated DNA Technologies (IDT, the example company I located) supports researchers who are studying genomes by (among other things) providing the genetic sequences needed by those researchers. It has been a loooong time since this was done by hand - with what we know now and the data volume we have, doing it by hand is unimaginable. Forget the row of people in white suits pouring things in and out of test tubes (they may exist, but not for this task). There is an entire computer-driven manufacturing process to synthesize desired genetic sequences in the most efficient manner possible. These sequences may need to be highly customized.

Let's say an order comes in for 100 oligonucleotides (an oligonucleotide is a short sequence of nucleotides, which are the basic building blocks of DNA and RNA). Each of the 100 requested oligonucleotides is 20-200 nucleotides long and each one is different. There are banks of computers running programs that control the synthesis (creation) of these oligonucleotides. Not only must the software determine the correct chemicals, the timing of their use and the stability of the result, but it must also recognize that load balancing is required; the heavily customized sequences slow down the process and are generally forked off to another process along with other similar desired sequences. This can lead to orders from several clients being created together on one "plate". There might be 5 orders of 100 oligonucleotides that are distributed across the system at the same time in order to provide accurate and timely creation.
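For the programmers reading along, here is a minimal sketch of the batching idea in Python. It is only my illustration, not how any real company does it: the 96-well plate size, the `customized` flag and the `assign_to_plates` function are all assumptions I made up for the example.

```python
from dataclasses import dataclass

PLATE_SIZE = 96  # assumed wells per plate; not a figure from my breakfast notes


@dataclass(frozen=True)
class Oligo:
    order_id: str             # which customer order this oligo belongs to
    sequence: str             # the requested nucleotide sequence
    customized: bool = False  # hypothetical flag for "heavily customized"


def assign_to_plates(oligos, plate_size=PLATE_SIZE):
    """Route customized oligos to their own queue, then fill plates in order.

    Returns a list of (kind, oligos_on_plate) tuples.
    """
    standard = [o for o in oligos if not o.customized]
    custom = [o for o in oligos if o.customized]
    plates = []
    for kind, queue in (("standard", standard), ("custom", custom)):
        for start in range(0, len(queue), plate_size):
            plates.append((kind, queue[start:start + plate_size]))
    return plates


# Five orders of 100 oligos each; every tenth oligo is flagged as customized.
orders = [
    Oligo(f"order-{n}", "ACGT" * 5, customized=(i % 10 == 0))
    for n in range(5)
    for i in range(100)
]
plates = assign_to_plates(orders)
for kind, plate in plates:
    order_ids = sorted({o.order_id for o in plate})
    print(kind, len(plate), order_ids)
```

Even this toy version shows the effect described above: a single plate ends up holding oligos from more than one order, and the customized ones travel together on a separate plate.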

Eventually the software must take the end results and regroup them for the appropriate customer. Every step of the process involves complex algorithms, timing, QA checking, load balancing and rebalancing, and did I say LOTS of quality control? Heaven forbid a customer receive a different genetic sequence than they asked for. Someone (lots of someones) has to know their software system optimization techniques as well as their chemistry.
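The regrouping step can be sketched in a few lines of Python. This is a toy under loud assumptions: real quality control relies on lab measurements rather than comparing strings, and the `regroup_by_order` function and the tuple layout are hypothetical names I invented for illustration.

```python
from collections import defaultdict


def regroup_by_order(wells):
    """Sort finished wells back into per-order shipments.

    wells: iterable of (order_id, requested_sequence, measured_sequence).
    Anything that does not match what was requested is held back.
    """
    shipments = defaultdict(list)
    failures = []
    for order_id, requested, measured in wells:
        if measured == requested:
            shipments[order_id].append(measured)
        else:
            failures.append((order_id, requested))  # candidate for re-synthesis
    return dict(shipments), failures


# One plate holding oligos from two different orders.
wells = [
    ("order-A", "ACGTAC", "ACGTAC"),
    ("order-B", "TTGACA", "TTGACA"),
    ("order-A", "GGCATT", "GGCTTT"),  # mismatch: must not be shipped
    ("order-B", "CATGCA", "CATGCA"),
]
shipments, failures = regroup_by_order(wells)
print(shipments)
print(failures)
```

The point of the sketch is the separation of concerns: nothing reaches a customer's shipment without passing the check, and every failure keeps enough information to be made again.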

A different computing-dependent activity IDT is involved in is providing the software to help customers design the oligonucleotides they want. Double-stranded DNA can be several thousand base pairs long and the customer wants to determine where in that genetic sequence is the best place to make and attach (or detach) an oligonucleotide. Software provides the ability to evaluate the characteristics of a given oligonucleotide (there are many factors involved). The software can predict the specificity, the stability and the cross-reactivity / structure of potential candidates (I'm getting tired of typing out the "o" word - I'm bound to mis-type it if I haven't already). The customers can use IDT's SciTools to help them decide what request to submit for manufacture.
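To give a flavor of what "evaluating the characteristics" can mean, here is a small Python sketch of three textbook calculations: GC content, a rough melting temperature from the Wallace rule (2 °C for each A or T, 4 °C for each G or C, only a rule of thumb for short sequences), and a check for whether a sequence equals its own reverse complement. I am not claiming this is what SciTools does; the function names are mine, and real design tools use far more detailed models.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees C using the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Sequence of the opposite strand, read in its own 5' to 3' direction."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_self_complementary(seq):
    """True if the oligo could pair with another copy of itself end to end."""
    return seq.upper() == reverse_complement(seq)


candidate = "ATGCGCAT"
print(gc_content(candidate))
print(wallace_tm(candidate))
print(reverse_complement(candidate))
print(is_self_complementary(candidate))
```

A self-complementary candidate like this one is the kind of thing a design tool would flag, since copies of the oligo can stick to each other instead of to the target.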

These tools have to have all the hallmarks of any well designed and constructed s/w application - an easy to use UI for the target customer, efficient processing and logical functionality. The inner guts have to have the usual qualities: bug-free, flexible, sufficient, robust and responsive. Who is most qualified to design these applications? You guessed it - (interdisciplinary) computer scientists.
