Research News
3:37 am
Thu January 24, 2013

Shall I Encode Thee In DNA? Sonnets Stored On Double Helix

Originally published on Thu January 24, 2013 1:19 pm

English critic Samuel Johnson once said of William Shakespeare "that his drama is the mirror of life." Now the Bard's words have been translated into life's most basic language. British scientists have stored all 154 of Shakespeare's sonnets on tiny stretches of DNA.

It all started with two men in a pub. Ewan Birney and Nick Goldman, both scientists from the European Bioinformatics Institute, were drinking beer and discussing a problem.

Their institute manages a huge database of genetic information: thousands and thousands of genes from humans and corn and pufferfish. That data — and all the hard drives and the electricity used to power them — is getting pretty expensive.

"The data we're being asked to be guardians of is growing exponentially," Goldman says. "But our budgets are not growing exponentially."

It's a problem faced by many large companies with expanding archives. Luckily, the solution was right in front of the researchers — they worked with it every day.

"We realized that DNA itself is a really efficient way of storing information," Goldman says.

DNA is nature's hard drive, a permanent record of genetic information written in a chemical language. There are just four letters in DNA's alphabet — the four nucleotides commonly abbreviated as A, C, G and T.

When these letters are arranged in different ways, they spell out different instructions for our cells. Some 3 billion of those letters make up the human genome — the entire instruction manual for our existence. And all that information is stuffed into each cell in our bodies. DNA is millions of times more compact than the hard drive in your computer.

The challenge before Goldman and his colleagues was to make DNA store a digital file instead of genetic information.

"So over a second beer, we started to write on napkins and sketch out some details of how that might be made to work," Goldman says.

They started with a text file of one of Shakespeare's sonnets. In the computer's most basic language, it existed as a series of zeroes and ones. With a simple cipher, the scientists translated these zeroes and ones into the letters of DNA.
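To make the idea concrete, here is a minimal sketch of one possible cipher. It assumes a fixed mapping of two bits per nucleotide; the table `BITS_TO_BASE` and the function `text_to_dna` are hypothetical names chosen for illustration, and the mapping is not necessarily the one the researchers used.

```python
# Illustrative mapping only (an assumption, not the researchers' cipher):
# every pair of bits becomes one of the four DNA letters.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def text_to_dna(text: str) -> str:
    """Encode text as UTF-8 bytes, then map each pair of bits to one base."""
    # Write every byte as eight zeroes and ones.
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    # Walk the bit string two characters at a time and look up the base.
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


print(text_to_dna("Shall I compare thee to a summer's day?"))
```

Under this mapping each byte of text becomes four DNA letters, so the capital "S" that opens the sonnet is written as CCAT.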

And then they did the same for the rest of Shakespeare's sonnets, an audio clip of Martin Luther King Jr.'s "I Have a Dream" speech, and a picture of their office. They sent that code off to Agilent Technologies, a biotech company. Agilent synthesized the DNA and mailed it back to Goldman.

"My first reaction was that they hadn't done it properly, because they sent me these little tiny test tubes that were quite clearly empty," Goldman says.

But the DNA was there — tiny specks at the bottom of the tubes. To read the sonnets, they simply sequenced the DNA and ran their cipher backward. All the files were 100 percent intact and accurate.
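Running a cipher backward can be sketched in a few lines. The example below assumes an illustrative scheme in which each DNA letter stands for two bits (A=00, C=01, G=10, T=11); the names `BASE_TO_BITS` and `dna_to_text` are hypothetical, and the researchers' actual cipher may differ.

```python
# Illustrative reverse mapping (an assumption, not the researchers' cipher):
# each DNA letter stands for two bits.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def dna_to_text(dna: str) -> str:
    """Turn a sequenced string of bases back into the text it encodes."""
    # Replace every base with its two bits.
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    # Regroup the bits into bytes and decode them as UTF-8 text.
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


print(dna_to_text("CCATCGGA"))  # prints "Sh"
```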

They published their results in the journal Nature, joining other groups who have experimented with DNA storage. George Church, a geneticist at Harvard who helped start the Human Genome Project, encoded an HTML file of his latest book into DNA last year.

Goldman and Birney's method added more redundancy, including overlapping stretches of DNA, to guard against errors. They say the process would be easy to scale up.
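One way overlapping stretches can protect against errors is sketched below. The fragment length, the step between fragments, and the majority-vote reassembly are all assumptions made for illustration; the article does not describe the researchers' scheme in this detail, and `fragments` and `reassemble` are hypothetical names.

```python
from collections import Counter


def fragments(seq, length=8, step=2):
    """Return (offset, fragment) pairs; neighbors overlap by length - step bases."""
    last_start = max(len(seq) - length, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the tail of the sequence is covered
    return [(s, seq[s:s + length]) for s in starts]


def reassemble(frags):
    """Rebuild the sequence by majority vote at every position."""
    votes = {}
    for offset, frag in frags:
        for i, base in enumerate(frag):
            votes.setdefault(offset + i, Counter())[base] += 1
    return "".join(votes[pos].most_common(1)[0][0] for pos in sorted(votes))


seq = "ACGTACGTACGTACGTACGT"
frags = fragments(seq, length=8, step=2)

# Simulate a single read error in one fragment (a G misread as a T).
offset, frag = frags[3]
frags[3] = (offset, frag[:4] + "T" + frag[5:])

print(reassemble(frags) == seq)  # prints True
```

Because the damaged position also appears in three other fragments, the correct letter wins the vote. Positions near the two ends of the sequence are covered by fewer fragments, so this simple sketch protects them less well.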

If you took everything human beings have ever written — an estimated 50 billion megabytes of text — and stored it in DNA, that DNA would still weigh less than a granola bar.

"There's no problem with holding a lot of information in DNA," Goldman says. "The problem is paying for doing that."

Agilent waived the cost of DNA synthesis for this project, but the researchers estimate it would normally cost about $12,400 per megabyte.

"It's an unthinkably large amount of money ... at the moment," Goldman says.
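A quick calculation shows how that price scales. The sketch below uses only the two figures quoted in the article, about $12,400 per megabyte and an estimated 50 billion megabytes of human writing. Treating a gigabyte as 1,000 megabytes is an assumption, and `synthesis_cost_usd` is a hypothetical helper.

```python
COST_PER_MB_USD = 12_400          # researchers' estimate quoted in the article
ALL_WRITING_MB = 50_000_000_000   # estimated size of everything humans have written


def synthesis_cost_usd(megabytes, cost_per_mb=COST_PER_MB_USD):
    """Estimated DNA synthesis cost in dollars for a given number of megabytes."""
    return megabytes * cost_per_mb


# One gigabyte, taken here as 1,000 megabytes (an assumption).
print(f"1 GB: ${synthesis_cost_usd(1_000):,}")
# Everything human beings have ever written, per the article's estimate.
print(f"All human writing: ${synthesis_cost_usd(ALL_WRITING_MB):,}")
```

At the quoted rate, a single gigabyte would cost $12.4 million to synthesize, and the full 50 billion megabytes would cost $620 trillion.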

Goldman and other scientists who are dabbling in DNA storage know that DNA synthesis costs are dropping rapidly. In a decade or so, they say, it may be more cost-effective for large companies to keep a DNA archive than to maintain and update a roomful of hard drives.

Copyright 2013 NPR. To see more, visit http://www.npr.org/.

Transcript

RENEE MONTAGNE, HOST:

Here is a problem facing us in this digital age. All the data we're stockpiling - digital images, tax records, unfinished novels - where are we going to store them? Some scientists say they may have a solution. It's not digital, it's biological.

NPR's Adam Cole reports scientists have successfully tested out using DNA as an archive by recording all of Shakespeare's sonnets on a double helix.

ADAM COLE, BYLINE: It all started in a pub a few months ago. Nick Goldman and Ewan Birney, two scientists from the European Bioinformatics Institute, were drinking beer and discussing a problem.

Their institute manages a huge database of genetic information - thousands and thousands of genes from humans and corn and pufferfish. And Goldman says all that data - and all the hard drives and the electricity used to power and keep them cool - is getting pretty expensive.

NICK GOLDMAN: The data we are being asked to be guardians of is growing exponentially. But our budgets are not growing exponentially.

COLE: That's a problem faced by many large companies with expanding archives - and the solution was right in front of the researchers. They worked with it every day.

GOLDMAN: We realized that DNA itself is a really efficient way of storing information.

COLE: That's right. DNA - the genetic material that makes us us - is a natural hard drive. Here's why. It's a long chain that repeats four basic chemical units.

GOLDMAN: Four different bases - that's different forms of molecules - A, C, G and T.

COLE: Those are the four letters in DNA's alphabet.

COMPUTER VOICE: A, C, G, T.

COLE: When these letters are arranged in different ways, they spell out different instructions for our cells.

COMPUTER VOICE: A, G, A, C...

COLE: Three billion of those letters make up the entire instruction manual for our existence. And it's all stuffed into each cell in your body. DNA is millions of times more compact than the hard drive on your computer.

GOLDMAN: If only we could persuade it to take the form we wanted, encoding the information we defined.

COLE: Like a text file instead of genetic information. Over a second beer, Goldman and his colleague started to sketch out the details. They started with a text file of one of Shakespeare's sonnets.

UNIDENTIFIED MAN: Shall I compare thee to a summer's day?

COLE: This text file was written in a computer's most basic language.

GOLDMAN: Zeroes and ones.

COLE: Bits stored on a magnetic hard drive.

GOLDMAN: And some of these...

COMPUTER VOICE: Zero, zero, zero, one, zero, zero...

COLE: With a simple cipher, Goldman and his colleagues translated these zeroes and ones into the letters of DNA.

COMPUTER VOICE: C, G, C, A, G, A...

COLE: And then they did the same for the rest of Shakespeare's sonnets, and an audio clip of Martin Luther King's "I Have A Dream" speech, and a picture of their office. They sent that code - those strings of A's, C's, G's and T's - off to a company that built the physical strands of synthetic DNA and sent them back to Goldman.

GOLDMAN: My first reaction was that they hadn't done it properly because they sent me these little tiny test tubes that were quite clearly empty.

COLE: But the DNA was there - tiny specks at the bottom of the tubes. They sequenced the DNA, read the code, ran their cipher backwards...

: (Unintelligible)

COLE: And they ended up with a 100 percent accurate Shakespearean sonnet.

UNIDENTIFIED MAN: So long lives this and this gives life to thee.

COLE: All from the tiniest speck of DNA. They published their results in the journal "Nature," joining other groups who have experimented with DNA storage.

Goldman says the process would be easy to scale up. If you took everything human beings have ever written - an estimated 50 billion megabytes of text - and stored it in DNA, that DNA would still weigh less than a granola bar.

GOLDMAN: There's no problem with holding a lot of information in DNA. The problem is paying for doing that.

COLE: The process would cost more than $10,000 per megabyte.

GOLDMAN: It's an unthinkably large amount of money at the moment.

COLE: At the moment.

Goldman and other scientists who are dabbling in DNA storage know that DNA synthesis costs are dropping rapidly. In a decade or so, a DNA archive might be cheaper than a room full of hard drives.

Adam Cole, NPR News. Transcript provided by NPR, Copyright NPR.
