Practical Computing for Biologists

posted by Casey Dunn / on December 7th, 2010 / in Books, Technology

I’m happy to announce the release of Practical Computing for Biologists, a book I wrote with my friend Steve Haddock. Here is a flyer with more information. The book is available directly from the publisher, Sinauer Associates, and from Amazon.

We wrote this book because computational tools are becoming increasingly important across all of biology, but few biologists have training in general tools for handling and analyzing data. There are many reasons for the growing role of computers in biology, even in fields that were adequately served by an Excel spreadsheet and a piece of scrap paper just a decade ago. First and foremost, datasets are now just too large to reformat or analyze by hand, and there is increasing interest in analyzing different types of datasets in combination. Both tasks require custom tools.

Biologists are facing larger datasets for a couple of reasons. There is a growing number of public data archives where biologists can deposit raw data from their studies, and these archives enable large analyses across datasets. In addition, instruments now generate far more data than they used to. Improved imaging tools scan organisms at very high resolution, DNA sequencers generate 100,000 times more data than they did a few years ago, environmental sensors can log temperature and humidity at sub-second intervals for months, and physiological instruments are growing more precise. Gone are the days when a young scientist could find refuge from statistics, mathematics, and computer programming in the basement of a Natural History museum, the forests of Halmahera, or a developmental biology lab.

But the changes are happening so fast right now that university curricula haven’t kept up, and biologists who were trained even a few years ago now find that they need computer skills that weren’t covered in any of their coursework or prior research training. There is wide recognition that these are critical problems in biology. There are a couple of possible solutions. First, biologists can work to get computer scientists interested in the problems they face and collaborate with them on solutions. Second, biologists can become more proficient with computational tools. Both need to happen.

Many interesting problems in biology are also interesting computational problems, and there is a strong history of close collaboration that has produced software tools that biologists can use even if they don’t have a computer science background. Many of the day-to-day computational challenges that biologists face, however, are not particularly interesting to computer scientists. These include reformatting the output of one program so that it can be used as the input to another, automating the download of weather data from several field sites, and writing a script that shuttles data through a series of analyses requiring multiple programs and some novel calculations. It is unrealistic to expect computer scientists to solve these many routine problems for every biologist, and the challenges are so varied that no single piece of software could take care of them all.

This is where our book comes in. We provide a grounding in time-tested, general-purpose tools for handling data, including regular expressions, the Unix command line, Python, and image editing tools. An emphasis on general-purpose technologies rather than particular analysis programs lets biologists build a skillset that applies to a far larger set of problems. We hope the book will serve established scientists, work as a companion book in courses that have a computational component, and stand on its own as a textbook.
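To give a flavor of the routine reformatting tasks mentioned above, here is a minimal Python sketch that uses a regular expression to turn lines from an environmental sensor log into comma-separated rows that a spreadsheet or another program could read. The log format, the column names, and the `reformat_log` function are all hypothetical, invented for illustration; this is not an example taken from the book.

```python
import re

# Hypothetical sensor log format, assumed for illustration only:
# "2010-12-07 14:03:22 T=21.4C RH=55%"
LINE_PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) "
    r"T=(-?\d+(?:\.\d+)?)C RH=(\d+(?:\.\d+)?)%$"
)


def reformat_log(lines):
    """Convert matching log lines to CSV rows; silently skip lines that don't match."""
    rows = ["date,time,temp_c,rh_percent"]  # header row
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match:
            # groups() holds date, time, temperature, and humidity, in that order
            rows.append(",".join(match.groups()))
    return rows


if __name__ == "__main__":
    sample = [
        "2010-12-07 14:03:22 T=21.4C RH=55%",
        "# sensor rebooted",
        "2010-12-07 14:03:23 T=-0.5C RH=60.2%",
    ]
    for row in reformat_log(sample):
        print(row)
```

Run on the sample, the script prints a header followed by two data rows; the comment line is dropped because it does not match the pattern. The same few lines of code would handle a file with millions of readings, which is exactly where reformatting by hand stops being an option.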

The bird sculpture on the cover is by Ann Smith.