Scientists at the European Bioinformatics Institute (EBI) have developed a new method of encoding information into DNA that could turn the blueprint of life into a viable data storage medium. The team demonstrated their method in an experiment published in Nature on Jan. 23.
“I think it’s a really important milestone,” said George Church, a Harvard molecular geneticist who worked on a similar experiment in the past. “We have a real field now.”
DNA’s potential use as a storage medium was first demonstrated in 1988, but the previous experiments were too costly and small-scale to have any immediate use. That changed last year, when Church’s team took binary data (strings of ones and zeroes) and used a straightforward code to write it with DNA bases: A and C represent zero, and T and G represent one. This code worked, but it led to long runs of the same letter, which are easy to misread and so can cause errors.
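To make the one-bit-per-base idea concrete, here is a minimal Python sketch of the mapping described above. Only the mapping itself (A or C for zero, T or G for one) comes from the article. The function names and the random choice between the two allowed bases are assumptions for illustration, not a description of how Church's team actually picked bases.

```python
import random

# Mapping taken from the article: A or C stands for 0, T or G stands for 1.
ZERO_BASES = "AC"
ONE_BASES = "TG"

def encode_bits(bits, rng=None):
    """Write a string of '0'/'1' characters as DNA, one base per bit."""
    # Assumption: pick at random between the two bases allowed for each bit.
    rng = rng or random.Random(0)
    return "".join(
        rng.choice(ZERO_BASES if bit == "0" else ONE_BASES) for bit in bits
    )

def decode_bases(dna):
    """Recover the bit string: A/C read as 0, anything else as 1."""
    return "".join("0" if base in ZERO_BASES else "1" for base in dna)

def longest_run(dna):
    """Length of the longest stretch of one repeated base."""
    best = run = 0
    prev = None
    for base in dna:
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best
```

Nothing in this scheme forbids the same base from appearing several times in a row, which is the weakness noted above; `longest_run` is a hypothetical helper for measuring it.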
This new experiment, led by Nick Goldman of EBI, tried a different code. Each byte was converted into five or six base-3 digits (known as trits). Each trit was then written as one of the three DNA bases that differ from the base immediately before it, so the same base never appears twice in a row. To demonstrate the flexibility of DNA, the team encoded a variety of data: a text file containing all of Shakespeare’s sonnets, an audio clip from Martin Luther King’s “I have a dream” speech, a PDF of Watson and Crick’s paper on the structure of DNA, a JPEG photo of EBI’s building, and the Huffman code that they used to convert bytes to base-3 digits. To demonstrate the reliability and durability of DNA, they had the code synthesized by a company in California, which sent the DNA back to the UK in ordinary packaging and at ambient temperatures.
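The trit scheme can be sketched in a few lines of Python. This is a simplified illustration, not Goldman's actual code: it swaps the team's Huffman code (five or six trits per byte) for a fixed six trits per byte, and the ordering of the three candidate bases and the starting "previous base" of A are assumptions.

```python
BASES = "ACGT"

def byte_to_trits(value):
    """Fixed-width base-3 expansion: 6 trits per byte (3**6 = 729 >= 256)."""
    trits = []
    for _ in range(6):
        value, digit = divmod(value, 3)
        trits.append(digit)
    return trits[::-1]  # most significant trit first

def trits_to_dna(trits, prev="A"):
    """Each trit selects one of the three bases that differ from the last one."""
    out = []
    for trit in trits:
        choices = [b for b in BASES if b != prev]  # always exactly three
        prev = choices[trit]
        out.append(prev)
    return "".join(out)

def dna_to_trits(dna, prev="A"):
    """Invert trits_to_dna; raises ValueError if a base repeats."""
    trits = []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits

def encode(data):
    return trits_to_dna([t for byte in data for t in byte_to_trits(byte)])

def decode(dna):
    trits = dna_to_trits(dna)
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)
```

Because every base is chosen from the three that differ from its predecessor, even a file of all-zero bytes alternates between bases rather than repeating one: `encode(b"\x00")` gives `CACACA`.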
Goldman’s code also allows error checking. Because it is difficult to write DNA in long strands, both Church and Goldman synthesized short strands, each including addressing information that locates it in the overall sequence. Goldman’s code, however, has each strand overlapping with the previous one in such a way that all the data appears at least four times, drastically reducing the probability of errors. In the end, they were able to recreate the data with 100 per cent accuracy.
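The overlapping-strand idea can be illustrated with a short Python sketch. The fragment length, the step size, and the majority-vote reassembly are assumptions chosen for illustration and are not taken from the article; the start position stored with each fragment stands in for the addressing information. With a step of one quarter of the fragment length, every interior position is covered by four fragments, though in this simplified version positions near the two ends are covered fewer times.

```python
from collections import Counter

def split_overlapping(seq, length=8, step=2):
    """Return (start, fragment) pairs; each window begins `step` after the last."""
    last_start = max(len(seq) - length, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the tail is covered
    return [(s, seq[s:s + length]) for s in starts]

def reassemble(fragments, total_length):
    """Rebuild the sequence by majority vote at every position."""
    votes = [Counter() for _ in range(total_length)]
    for start, fragment in fragments:
        for offset, base in enumerate(fragment):
            votes[start + offset][base] += 1
    # 'N' marks a position that no fragment covered.
    return "".join(v.most_common(1)[0][0] if v else "N" for v in votes)
```

If one fragment carries a misread base at an interior position, the other three copies of that position outvote it, which is the intuition behind the redundancy described above.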
The main advantage of DNA is its durability.
“The experiment was done 60,000 years ago when a mammoth died and lay there in the ice,” said Goldman, referring to the discovery of readable DNA on a frozen woolly mammoth. “And those weren’t even carefully prepared samples.”
While hard drives need to be powered and are prone to crashing, and tape drives degrade and have to be replaced every few years, DNA should last for millennia under cold, dry conditions. It’s also space-efficient: 100 million hours of HD video would fit in a cup of DNA. Because DNA is so important in the life sciences, we will always need the ability to read it, so the format will never become obsolete.
Of course, it’s all prohibitively expensive. Goldman’s team estimates costs of $12,400 per megabyte (MB) to encode and $220 per MB to read data stored in DNA. It also took two weeks to read the data. But the costs of reading and writing DNA have been falling at an unprecedented rate, much faster than the cost of new electronic media. With the expected advances in technology, soon it will be practical for DNA to be stored in long-term archives that don’t need to be accessed frequently. You probably won’t have DNA in your laptop any time soon, but you could one day see it in your library.