Why Are Hard Drive Manufacturers Investing In DNA Data Storage?

Herbert Huffner Technology November 30, 2022

The demand for storing digital data generated on the internet has increased exponentially.

Meanwhile, traditional storage media cannot store this amount of information.

This problem has pushed experts toward DNA-based storage, a solution that allows a large amount of data to be stored in a tiny space.

Ars Technica, known for its daily coverage of the technology world, interviewed Hyunjoon Park, CEO of a DNA storage company called Catalog, to find out how close the company’s technology is to practicality.

In the interview, Park points out that Catalog’s approach differs from what people assume: it doesn’t store data the way everyone expects.

The research community is excited about DNA’s potential as a medium for long-term storage and archiving of information, because DNA is very dense, remains chemically stable for tens of thousands of years, and has a structure that allows data to be written and retrieved in a relatively simple way.

While exciting developments have been made in this field, the high cost and low speed of reading and writing have kept most efforts within scientific research.

These problems must be solved before storage in DNA becomes practical. So the news that data storage giant Seagate has partnered with a DNA-based storage company called Catalog surprised experts.

How does DNA data storage work?

DNA stands for deoxyribonucleic acid, a complex organic molecule that contains the genetic information of a living organism. DNA is present in all organisms and stores information that determines skin color, eye color, height, and other physical and biological characteristics.

A DNA helix is built from a sequence of four bases: adenine (A), guanine (G), cytosine (C), and thymine (T). The bases attach to the helix in pairs, called base pairs, and always pair the same way: adenine with thymine and guanine with cytosine. In today’s storage mechanisms, data is stored as binary digits (1s and 0s). In DNA data storage, the four nucleotide bases (A, C, G, T) store and encode the data. Information is stored in permutations of three nucleotide bases called codons.

DNA storage involves three processes: data encoding, synthesis and storage, and decoding. An algorithm translates the binary data into DNA codes, or codons. The synthesized DNA is then placed in a container in a stable environment. It can be frozen in solution and stored as droplets or on silicon chips.

The problem in this field is the low speed of this process.

This issue has made scientists look for a fast and low-cost solution to read the information stored in DNA. Data stored in DNA must be taken to a laboratory to be decoded into error-free binary information. This process is quite time-consuming. As such, it may take some time for DNA data storage devices to become inexpensive devices that the general public can use.

Research on DNA storage technology is still ongoing, so biological storage is unlikely to make current storage methods obsolete anytime soon. However, in the last few years, excellent research progress has been made on how to store data in DNA to solve the problems of storage space, stability, and precise deletion of data.

Different storage

DNA is a molecule that can be thought of as a linear array of four different chemical bases: A, T, C, and G. Each base can store two bits of information: A encodes 00, T encodes 01, C encodes 10, and G encodes 11. With this encoding, the molecule AA stores 0000, AC stores 0010, and so on.
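The two-bits-per-base mapping described above can be sketched in a few lines of Python. This is only an illustration of the A=00, T=01, C=10, G=11 table from the text; the function names are made up for this example, and real schemes add error correction and avoid sequences that are hard to synthesize.

```python
# Mapping taken from the text: A=00, T=01, C=10, G=11.
BASE_FOR_BITS = {"00": "A", "01": "T", "10": "C", "11": "G"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def encode_bytes(data: bytes) -> str:
    """Translate bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_bases(strand: str) -> bytes:
    """Translate a string of bases back into the original bytes."""
    bits = "".join(BITS_FOR_BASE[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode_bytes(b"Hi")
print(strand)                # the two bytes become eight bases
print(decode_bases(strand))  # round-trips back to b"Hi"
```

Each byte becomes exactly four bases, so the length of the DNA strand grows in direct proportion to the amount of data.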

We can synthesize long DNA molecules with high throughput and add flanking sequences equivalent to file system information. These sequences tell us which part of the binary data a single piece of DNA represents.
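To make the role of the flanking sequences concrete, here is a hedged sketch in which every fragment carries a short index written in bases, so fragments can be read in any order and still be put back together. The fragment length, the index width, and all function names are assumptions chosen for the example, not details from the article.

```python
BASES = "ATCG"  # digit values 0..3 under the A=00, T=01, C=10, G=11 mapping


def index_to_bases(index: int, width: int) -> str:
    """Write an integer as a fixed-width, base-4 string of bases."""
    digits = []
    for _ in range(width):
        digits.append(BASES[index % 4])
        index //= 4
    return "".join(reversed(digits))


def bases_to_index(tag: str) -> int:
    """Read a base-4 string of bases back into an integer."""
    value = 0
    for base in tag:
        value = value * 4 + BASES.index(base)
    return value


def split_with_addresses(strand: str, payload_len: int = 8, index_width: int = 4) -> list:
    """Cut a long strand into fragments, each prefixed with its position."""
    starts = range(0, len(strand), payload_len)
    if len(starts) > 4 ** index_width:
        raise ValueError("index_width is too small for this many fragments")
    return [index_to_bases(n, index_width) + strand[s:s + payload_len]
            for n, s in enumerate(starts)]


def reassemble(fragments: list, index_width: int = 4) -> str:
    """Sort fragments by their index prefix and join the payloads."""
    ordered = sorted(fragments, key=lambda f: bases_to_index(f[:index_width]))
    return "".join(f[index_width:] for f in ordered)
```

The index plays the part of file system information: it costs extra bases per fragment, but it means the pool of molecules does not need to preserve any physical order.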

The problem with the above method is that the longer the string of bits you want to store, the more time and money it takes. The robotic hardware performs centrifugation reactions, and each hardware unit can synthesize only one DNA molecule at a time. In addition, the raw materials that the hardware uses for synthesis increase the cost per molecule that stores information.

Cost is not a problem for small projects, but if we want to store data on a large scale, the costs go up quickly. Park says: “Assuming that the cost of each storage is 0.03 cents, if we want to store a volume of gigabytes, this cost will reach several million dollars, which is a huge figure.”

“Catalog was founded to solve the data encoding problem and reduce costs,” says Park.
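A rough back-of-the-envelope calculation shows why the bill grows so quickly. The sketch below assumes the 0.03 cents applies to each synthesized base and that each base holds two bits; neither assumption is stated outright in the quote, and the function name is invented for this example.

```python
COST_PER_BASE_USD = 0.0003  # 0.03 cents, ASSUMED here to be the cost per base
BITS_PER_BASE = 2           # from the A/T/C/G two-bit encoding


def synthesis_cost_usd(num_bytes: int) -> float:
    """Estimate the synthesis cost of storing num_bytes, payload only."""
    bases = num_bytes * 8 / BITS_PER_BASE
    return bases * COST_PER_BASE_USD


one_gigabyte = 10**9
print(f"${synthesis_cost_usd(one_gigabyte):,.0f} per gigabyte")
```

Under these assumptions a single gigabyte needs four billion bases and costs on the order of a million dollars before any addressing or redundancy overhead is added, which is the same order of magnitude as the figure Park mentions.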

The company’s encoding process uses a library of tens to hundreds of short pieces of DNA called oligos (oligonucleotides). Each bit of data is assigned a unique combination of oligos. You can think of it like a silicon processor that gives an individual 64-bit address to a bit in memory. If that bit is 1, a robot collects small samples of solutions containing each of the required oligos and combines them with an enzyme that links them together.

This enzyme integrates the oligos into a single, long DNA molecule that contains the unique signature of that bit. Conversely, if the bit is zero, the DNA corresponding to its address will not be synthesized.

Then, all the molecules produced can be combined into a single solution that can be dried for long-term storage. To read that data, the DNA molecules must be sequenced so that a unique combinatorial algorithm can identify the oligos present in each molecule.

Detected addresses are assigned a value of 1, and all others are given a value of 0.
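The presence-or-absence scheme can be modeled as a toy program. In this sketch the oligo library, its size, and the names are all hypothetical; a Python set stands in for the pooled solution, and set membership stands in for sequencing. It is not Catalog’s actual design, only the idea described above.

```python
from itertools import product

# Hypothetical library: 3 positions with 4 oligo choices each = 64 addresses.
LIBRARY = [[f"P{pos}O{k}" for k in range(4)] for pos in range(3)]
ADDRESSES = list(product(*LIBRARY))  # address i -> unique oligo combination


def write_bits(bits: list) -> set:
    """Assemble one molecule per 1-bit; 0-bits produce nothing."""
    if len(bits) > len(ADDRESSES):
        raise ValueError("not enough oligo combinations for this many bits")
    return {ADDRESSES[i] for i, bit in enumerate(bits) if bit == 1}


def read_bits(pool: set, length: int) -> list:
    """Addresses found in the pool read as 1, all others as 0."""
    return [1 if ADDRESSES[i] in pool else 0 for i in range(length)]


data = [1, 0, 1, 1, 0, 0, 0, 1]
pool = write_bits(data)
print(len(pool), "molecules assembled")
print(read_bits(pool, len(data)) == data)
```

Note how only 12 distinct oligos are needed to address 64 bits. Because the number of combinations grows multiplicatively, a library of tens to hundreds of oligos can address a very large number of bits.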

This process digitally recovers the encoded data. With this method, the molecules stay small, so the storage medium remains stable and compact. The system also saves significant time and money thanks to a fundamental asymmetry: synthesizing large amounts of a specific DNA sequence is much cheaper than synthesizing small quantities of many different DNA sequences.

Assembling DNA from small amounts of pre-made DNA significantly reduces the cost of synthesis, and each assembly reaction can be executed in parallel. By contrast, when sequences are synthesized individually, the device is occupied until the synthesis is complete and cannot do other work in the meantime.

Reluctance to archive

“In its latest concept, Catalog has built a device called the Shannon, named after information theorist Claude Shannon, based on inkjet technology,” says Park. Each jet can print an oligo within a droplet onto a continuous sheet of film. Different oligos land on the same reaction spot, a tiny drop of the enzyme is printed over them, and the film then goes into an incubator.

Next, the enzyme assembles them into a DNA molecule. “When the reactions are complete, the droplets can be combined into a single solution containing all the encoded data.”

Part of Catalog’s partnership with Seagate includes investigating whether some of the fluidic hardware the hard drive maker has developed can help shrink and further automate the process, reducing energy and resource use.

“The Shannon is about the size of a typical kitchen,” says Park.

The Shannon’s output is geared toward archiving, yet surveys conducted by Catalog show that customers have little interest in archiving information. “We’ve talked to companies like Seagate and entertainment, energy, and technology companies,” says Park.

Companies are facing many problems storing and maintaining a considerable amount of information. Our research showed that it’s not just the cold storage aspect of the job that interests them; they’re looking for technology that allows data to be read and written at optimal speeds.

We found that people were interested in whether DNA could enable massively parallel operations on stored data without converting it to digital form.

“This coding scheme can give us a great ability to do some operations on DNA because now we have more detailed information about how the data is stored and the structure of the molecules,” Park said. That is not possible in encoding schemes where the sequence of the molecules varies based on the stored data.

Also, the absence of specific lines in this encoding scheme can be helpful.

However, at this point, Catalog is still considering how to implement some of these ideas. “Achieving some computational advantages may not be possible anytime soon, as the output of some ideas will only be cost-effective when commercialized.”

“Before DNA-based computing can be understood and meaningful, it must be possible to store large amounts of information in DNA,” says Park. “My prediction is that DNA storage will eventually gain its footing because it performs well in massively parallel computing.”

While a startup like Catalog has approached the big companies of the information technology world and is negotiating with several of them, the first concrete achievements in this field may come to the technology world from the academic community.

Park points to the massive amount of data generated by the Large Hadron Collider as a potential target and says the company has joined the openlab technology development framework run by the European Organization for Nuclear Research (CERN).

“I think DNA is a great way to store huge amounts of data,” Park says. “When a new theory is presented, you most likely want to search and check all previous experiments efficiently. There is currently no way to do this, and I think a DNA-based system would be a great solution to this problem.”

What is the storage capacity of DNA data?

DNA data storage is an attractive solution to the storage shortage problem because it can hold large amounts of data in very little space. One gram of DNA can store 215 petabytes of data. A petabyte is equal to 1024 terabytes, so one gram of DNA can store approximately 220,160 terabytes of information. For comparison with current technology, a 1 TB hard disk weighs about 400 grams. To keep the amount of data that fits in one gram of DNA, we would therefore need more than 88 million grams of hard disks!
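The comparison above is simple arithmetic, and it can be checked with a short script. The three input figures come straight from the text; the variable names are only for this example.

```python
PETABYTES_PER_GRAM_DNA = 215   # figure quoted in the text
TERABYTES_PER_PETABYTE = 1024
HDD_GRAMS_PER_TERABYTE = 400   # weight of a 1 TB hard disk, per the text

terabytes_per_gram = PETABYTES_PER_GRAM_DNA * TERABYTES_PER_PETABYTE
hdd_grams_needed = terabytes_per_gram * HDD_GRAMS_PER_TERABYTE

print(f"{terabytes_per_gram:,} TB in one gram of DNA")
print(f"{hdd_grams_needed:,} g of hard disks for the same data")
print(f"about {hdd_grams_needed / 1_000_000:.0f} metric tons")
```

In other words, one gram of DNA stands in for roughly 88 metric tons of 1 TB hard disks.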

According to this information, researchers say that by using a DNA-based data storage mechanism, all the data in the world can be kept in a shoe box.

What are the advantages of storing data in DNA?

Using a DNA-based storage mechanism as a storage solution has many advantages over digital storage. These advantages include high capacity in data storage, longer life than today’s storage technologies, compactness, less susceptibility to technical and electrical failures, and repeatability.

Storage density

The main advantage of DNA storage over other storage media is storage density. If you store your data remotely on the cloud or NAS, it is still held in large data centers and servers. These data centers are the size of football stadiums and cost billions of dollars to build and maintain. This is not the case with DNA-based data storage. DNA storage allows you to store a massive amount of data in a very compact space. Therefore, the problems of space, maintenance cost, and lack of storage equipment are reduced.

Durability

Digital storage equipment available today is not very durable. All of it is prone to breakdown and damage. Digital corruption, the gradual deterioration of data stored in a computer, causes significant harm to individuals and companies. Data stored in DNA can be accessed for hundreds of years when kept in an optimal environment; DNA has a life of at least 500 years.

Data stored in DNA can be easily replicated. Meanwhile, data centers are forced to make backup copies of information at regular intervals and keep them on other hardware for fear of data destruction, a process that is difficult and expensive. One of the identified methods in this field is to insert DNA with stored information into a bacterium.

Is storing information in DNA the envisioned future for data storage?

To be honest, yes. Currently, storing information in DNA is used by companies that want to maintain extensive archives of information that do not require regular access. Storing data in DNA addresses the main storage problems of space and durability.

Unfortunately, it will take a long time for DNA storage to become a mainstream and cost-effective storage option. During this time, we should choose the best long-term data storage format.
