Data Storage Crisis: DNA to the Rescue

Episode Summary

You might be surprised to learn that data storage currently requires huge amounts of land and energy, and we're running out of both. In this episode, we speak with a small group of researchers who are working to revolutionize the way we store the massive amounts of data we produce every day. Their solution: use DNA.


Episode Transcription

Speaker 1 (00:08):

Need more space on your iPhone or iPad? Here are a few tips.

Speaker 2 (00:12):

You may have filled up all the space on your phone. Ah, that's so frustrating. I've got some great tips.

Audio (00:19):

[inaudible 00:00:19].

Dr. Michelle McMurry-Heath (00:26):

We've all been there. You try to share an important email, save a file, or just take a picture and you get the dreaded storage full notification. Usually this means buying more storage space in the cloud, but what if there isn't more space? You might be surprised to learn that data storage currently requires huge amounts of land and energy, and we're running out of both. Today, we'll hear from a small group of researchers who are working to revolutionize the way we store the massive amounts of data we produce every day. Their solution: use DNA. Yes, DNA to store data more sustainably. Same information, different format. This is a story about the intersection of two disruptive technologies: synthetic biology and computer science, working together to address a growing global problem. I'm your host, Dr. Michelle McMurry-Heath, and you are listening to I am BIO.

Kyle Tomek (01:54):

I think anyone with a smartphone can relate to running out of storage space. You have to delete apps or photos to make room for new ones, and your cloud storage provider will keep pestering you to upgrade and pay more for your storage. So this is something that people are feeling on a personal level, and this is also happening at these data storage centers across the globe.

Dr. Michelle McMurry-Heath (02:16):

This is Kyle Tomek, co-founder and CEO of Denali Data Technologies, a startup located at North Carolina State University that is focused on DNA-based solutions for archival data storage.

Kyle Tomek (02:29):

The world is really creating an insane amount of data. This is in the form of personal medical records and genomic data, financial transactions, and even the more important things like TikToks and cat videos. Basically everything we do nowadays creates some amount of data. Some of these files will end up on tape reels in data storage centers across the country and globe. Since more data is actually being created than can be saved currently, most of this data won't ever be saved or will eventually need to be deleted. In fact, to keep pace with how much data we're creating on a daily and yearly basis, the entire surface of the planet will need to be covered in data storage centers by the year 2060, and that is to store all of the data being generated between now and then. Let me say that again: the entire surface of the planet will need to be covered in data storage centers by the year 2060 to save as much data as we're creating between now and then.

Dr. Michelle McMurry-Heath (03:28):

I know what you might be asking, doesn't the cloud give us all the space we need? Here's Kyle again.

Kyle Tomek (03:35):

Cloud storage is where a single file can be saved in multiple locations at data storage centers across the country and globe. So it's a physical storage of an individual file, maybe spread across different storage solutions. So the cloud is a data storage solution that allows users to offload their data storage to a third party service. Think Google, Amazon, Microsoft: they're able to store personal individual files on multiple servers, multiple hard drives across their network, and that is technically the cloud storage service.

Dr. Michelle McMurry-Heath (04:16):

In other words, the term cloud may be a misnomer. Data does not sit in an unlimited sky above us and our ability to store new data is not keeping pace with demand as our next guest explains.

Jeff Nivala (04:31):

I'm Jeff Nivala, a research assistant professor at the University of Washington in the Paul G. Allen School of Computer Science and Engineering, and I'm a co-director of the Molecular Information Systems Lab or MISL for short. These big warehouses are taking up a large amount of physical space and they're also spending lots of energy and resources, for example, lots of water that is needed to cool these data centers down because they generate lots of heat, all these electronics packed into these warehouses. So to give you a little bit more perspective, today, global data centers use more energy than the entire United Kingdom, and by 2025 they are projected to use more than 20% of our global energy. The greenhouse gases emitted by this data consumption and storage are already rivaling the aviation industry. It's just going to keep growing.

Dr. Michelle McMurry-Heath (05:21):

Jeff talked to us about the practical implications of the increasing demand for data storage.

Jeff Nivala (05:28):

Practically speaking, one of the main effects on the average person is just going to be that the cost of storing data is going to go up. It's really simple supply and demand. The demand for storing data is going up exponentially, but the supply we have, the ability that we have to store this is getting more and more challenging. The capacity that we have to store the data is not keeping up with this demand. Everything nowadays is generating data. It's going to cost a bank more to store your transactional data. It's going to cost a hospital more to store your medical data and ultimately these costs are going to affect the average person in the form of higher prices across the board. Even right now, you might be making tough choices. Do you really need to save these 10,000 pictures that you've taken of your cat Fluffy? Do I need to keep storing these on the cloud now that it's costing me a hundred dollars a month to store this information?

Dr. Michelle McMurry-Heath (06:16):

Clearly the amount of data we are creating is increasing exponentially. We simply won't be able to keep up. Enter DNA.

Emily Leproust (06:26):

DNA data storage, the idea came from Richard Feynman in 1959 before you could actually write DNA, right? Yeah, the idea of storing data in DNA before you could even write DNA. So definitely the idea has been around for a long time.

Dr. Michelle McMurry-Heath (06:41):

This is the CEO and co-founder of Twist Bioscience, a company that specializes in sequencing and printing DNA.

Emily Leproust (06:48):

Hi, my name is Emily Leproust. I am the CEO and co-founder of Twist Bioscience. At Twist, we write DNA from scratch. You may have heard of DNA reading, the sequencing; we do the DNA writing part of that. When we started Twist, it was a platform based on silicon technology and data storage was always the long term play of Twist. But now, we started nine years ago, and what used to be long term is not even medium term; now it is a short term play for us and we are about to launch our first product in data storage.

Dr. Michelle McMurry-Heath (07:23):

Before Emily's company can create synthetic DNA for storing everything from family photos to medical records to government archives, this electronic data needs to be converted into the A's, T's, C's and G's that make up DNA. Jeff Nivala explains.

Jeff Nivala (07:41):

We can start with just the concept of digital information. That is any kind of information, whether it be text, photos, and we can represent that in what's called a binary code, which is essentially just a long string of zeros and ones, okay? And that's sort of how it's currently represented at the most basic form within a traditional computing system. We can take these same symbols, these bits, characters of zeros and ones, and we can represent them as a different set of symbols, specifically the A's, T's, G's and C's that are the molecules or the bases that make up strings of a DNA molecule.

Jeff Nivala (08:18):

All organisms on Earth are using the same sequences of bases that we know as DNA. And so we can map our digital data into these sequences of A's, T's, G's and C's and this can represent now the text or the photo that you might be trying to store. And we can encode this on a computer in electronic form, and then once we've encoded this digital information into these sequences of DNA bases, we can then use machines to physically synthesize this DNA in molecular form. This information now exists within these strands of DNA molecules and this can be stored and you can store this in a safe location.
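
Illustration (not part of the recording): the mapping Jeff describes, from zeros and ones to A's, C's, G's and T's, can be sketched in a few lines of Python. The two-bits-per-base table below is an assumption chosen for simplicity; the episode does not say which encoding the guests use, and practical schemes typically add error correction and avoid problematic base patterns.

```python
# Hypothetical mapping of two bits to one DNA base (an assumption for
# illustration, not the scheme used by any lab or company in the episode).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn raw bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # the text as a base sequence
print(decode(strand))  # the original bytes recovered from the sequence
```

Each byte becomes four bases here, so decoding simply reverses the table lookup. Synthesizing and sequencing the physical molecules are separate steps that the sketch does not model.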

Dr. Michelle McMurry-Heath (08:53):

Kyle Tomek painted a picture for us of the benefits of using DNA storage.

Kyle Tomek (08:58):

The benefits of using the synthetic DNA are kind of mind blowing and staggering. DNA is roughly a hundred thousand times more dense than the current media being used to store data. If you picture yourself holding a milk jug and we're going to fill that milk jug up with DNA. Now, that amount of information would need to be stored in 150 Olympic size swimming pools using the best possible technology that's out there today being used in these data storage centers. So one gallon of DNA versus 150 Olympic size swimming pools of current technology. DNA on top of its density is also much more durable than this conventional media, which is only really guaranteed for about a decade or so before needing to be replaced. We've seen scientists that have found intact DNA stored inside a million year old woolly mammoth fossil, and so this is something that's going to last for decades and centuries at a time.

Kyle Tomek (09:58):

On top of all of that, humans are DNA based life forms and so DNA drives will never go out of style or need to be updated. Something like a floppy disc or a CD, we've seen those come and go, but DNA is going to be here to stay. Then with all of that in mind, imagine the capital that can be saved when companies won't have to invest into these massive storage facilities that need to be climate controlled, they consume a ton of energy and imagine the carbon footprint that we're going to be able to reduce when we think about how much energy and resources go into building and maintaining these facilities. And so DNA is really here to save the planet.

Dr. Michelle McMurry-Heath (10:38):

Emily describes how her company, Twist Bioscience, creates synthetic DNA.

Emily Leproust (10:44):

Every species on the planet shares the same A, C, G and T, so it's really fundamental to life on earth. Now, you can buy a bottle of A, a bottle of C, a bottle of G, a bottle of T, and what we do is assemble. We print one of those layers on top of another on our silicon chip. So what you can imagine is each of those A, C, G, T's is a piece of Lego of four different colors and we build a very tall tower of A, C, G, and T's on top of each other. And on the silicon chip we have a million locations where we can print a million different towers. So on the same chip at the same time, we can synthesize, we can print, we can write 1 million different sequences of DNA. And the DNA is synthetic DNA because we've made it from scratch.

Dr. Michelle McMurry-Heath (11:46):

When we come back from the break, we'll dive deeper into the long term implications for this new exciting field and talk about how the data is retrieved once it's stored in DNA. Are you interested in hearing more fascinating stories like this one on DNA data storage? Check out bio.news. Bio.news is a daily news website exploring the intersection of biotech innovation and US and international policy. With new content daily, Bio.news has you covered on the latest biotech. Visit now by typing in bio.news into your web browser. Think about it, today, archeologists can analyze DNA from the fossils of dinosaurs after millions of years despite extreme conditions. That's because with optimal settings, DNA's capacity to maintain its integrity is unmatched by current data storage technologies.

Emily Leproust (13:06):

The DNA's going to be stable enough such that a few hundred years later, when the next civilization is able to recover the ability to read DNA, they'll be able to read that data that has been stored for now hundreds of years or millennia. The big difference about those data centers versus current data centers is they'll be much smaller. You could store hundreds of Google data centers in a sugar cube. The space needed to store vast amounts of data is actually much, much smaller. The other advantage of DNA is that, because it's stable, we don't need to use energy to keep the data over time, so it's also a lot more sustainable. 3% of the electrical grid in California is used for data centers. That's just keeping the hard drive cool and so on. So if you can eliminate that, it will be more sustainable.

Dr. Michelle McMurry-Heath (14:03):

Because DNA is the basis of all life, the technology to read and analyze DNA will never become obsolete.

Emily Leproust (14:10):

Just over a few decades, it becomes very hard to find the machine to read the data. You can find a VHS tape today, but getting a VHS reader, it's difficult. In the not so distant future, having a CD player is going to be difficult as well. Although the good news with DNA is that DNA is so important to human health that as a society we'll always have to read DNA. It becomes a very universal format because the DNA that stores data is very similar to the DNA that we have in our own bodies. For the foreseeable future, you can imagine that you'll always be able to read DNA.

Dr. Michelle McMurry-Heath (14:49):

But what about unintended consequences? By storing data in DNA, could we unintentionally create a living organism? We asked all of our guests to comment on this scary proposition.

Kyle Tomek (15:10):

This is probably the most common question that I get when I explain what I do for a living. People are really worried that we're going to be using their DNA to store information or some sort of Frankenstein DNA. But really the DNA that we're using is very specifically designed for storing these pictures, GIFs, computer files. In other words, that means it's fake DNA. It's not human, it's not animal, it's not viral DNA, it's completely fake and synthetic DNA. Throughout the process, we also take safety measures to make sure that we're not accidentally making something that could cause an issue. When we design these sequences and create them as DNA, we also then cross reference those sequences to a known biological database of DNA sequences to make sure that we never allow anything that can come close to looking like something that's biologically relevant.
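
Illustration (not part of the recording): the cross-referencing step Kyle mentions can be pictured as a check that no stretch of a designed sequence also appears in a list of known biological sequences. The function name, the 12-base window, and the tiny in-memory "database" below are all hypothetical; the episode does not describe the actual tools or databases used, and real screening would likely rely on dedicated sequence alignment software.

```python
def shares_kmer(candidate: str, known_sequences: list[str], k: int = 12) -> bool:
    """Return True if any k-base window of `candidate` occurs in a known sequence.

    Hypothetical helper for illustration; exact substring matching is a
    simplification of how sequence similarity is normally assessed.
    """
    windows = {candidate[i:i + k] for i in range(len(candidate) - k + 1)}
    return any(window in known for known in known_sequences for window in windows)


# Made-up stand-in for a biological sequence database (assumption).
known_sequences = ["ATGGCCATTGTAATGGGCCGC"]

designed = ["ACACACACACACACACAC", "TTTTGCCATTGTAATGAAAA"]
safe_to_synthesize = [s for s in designed if not shares_kmer(s, known_sequences)]
print(safe_to_synthesize)
```

In this toy run the second designed strand is rejected because it contains a 12-base stretch found in the known sequence, while the first passes the check.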

Emily Leproust (16:04):

It's not living DNA, right? It's synthetic DNA. It's not living. It can't multiply itself. It can't produce itself. There's no biosecurity danger created by that DNA. So from that point of view, you can get your cake and eat it too.

Jeff Nivala (16:20):

So DNA by itself is essentially lifeless. The DNA that we're talking about here is completely what I would consider abiotic, synthesized by machine; it's stored in a tube or in some container. It's not inside of a living cell. It's not living in any sense of the word, it really can't do anything. It can't replicate by itself. It's not performing any functions other than just existing like any other non-living molecules. And we can also make sure that it's not encoding for anything biologically meaningful. Biology has its own code that it encodes into DNA that represents biological information that defines the processes of the operating principles of the cell. It recognizes a very specific code that is very different from the sequences or the codes that we use to store our digital data. And so even if you had the synthetic DNA molecules in which we've encoded digital data and you put them inside of a living cell, those molecules really wouldn't have any biological relevance or meaning.

Dr. Michelle McMurry-Heath (17:22):

We've talked a lot about converting data to DNA, but what is the physical form of the storage? Is it a vial or some sort of chip? How would people replicate and share? Here's Jeff again.

Jeff Nivala (17:36):

The actual hardware that people would use to physically store and carry this data and share it, I think that's an open question. It's certainly an area of active research that our lab has looked at as well as others, so there's various ways that you can encapsulate DNA molecules and protect them from the environment. So one way is to actually encapsulate it inside of glass and little glass pellets, or you could even embed it inside of glass panes or glass discs. And then to retrieve it, you'd essentially have to chemically remove the glass and then pull your DNA back out; that's one form.

Jeff Nivala (18:10):

Another one is storing it in calcium-like deposits. One thing that we've learned from biology is that we can extract DNA from bone that is millions of years old. From this, we've learned that the calcium within bones actually can help preserve DNA. So you could also imagine powder based storage where you have tiny little microspheres composed of calcium as well as your DNA. It's an interesting question to think about what is going to be the best way to store it, and I think it will be kind of application specific. Is this just data that you're going to have locked away for the next million years or is this data that you're going to want to send to your friend?

Dr. Michelle McMurry-Heath (18:48):

DNA storage and retrieval are possible because of the advances made in DNA sequencing. And now companies are organizing to standardize both the language and the process so that this medium will truly be viable for all.

Emily Leproust (19:03):

Our strength is the writing of DNA and we think it actually is the hardest thing. We are working with other companies to build an ecosystem because you need software up front to encode the zeros and ones into ACGT. You need to write the DNA, so that's going to be us. You're going to need devices to store the DNA; those already exist commercially. Then once you want the data back, you need to sequence it, and again, those machines exist. Last, you need a system that creates a data in, data out.

Emily Leproust (19:37):

As an industry, actually we are organizing. We started the DNA Data Storage Alliance. At the beginning it was just four companies: Twist, Microsoft, Western Digital and Illumina, and now that has grown to more than 50 companies. So all the storage companies that you can think of are actually part of it, and it's actually getting such traction that now we're moving into SNIA, which is an organization that is an industry group for data storage. Within SNIA, we'll start to develop standards such that every company would be working within the same standard. The partnerships are getting built right now. It's a new field that is attracting all the current players in storage. It's quite exciting to be building the ecosystem at the same time as you build the technology.

Dr. Michelle McMurry-Heath (20:24):

The DNA Data Storage Alliance and Storage Networking Industry Association that Emily mentions are focused on identifying standards and specifications that will help advance the industry. For example, the groups can help ensure there's only one language to convert ones and zeros to DNA as opposed to the countless coding languages found on computer systems. And with big players like Microsoft investing in the technology, it begs the question, how close are we to DNA data storage products for consumers?

Emily Leproust (21:00):

The first product is what we call a century archive. We play in the archiving part of storage. If you think of storage as hot or cold, the data on your computer is hot data, you get in and out all the time very fast. It's important, but actually 70% of storage is cold data; it's archiving, it's data that is not read very often. There are a lot of people that want to store data for a long time, such as a hundred years. Today, if you want to store data for a hundred years, you have to constantly move from one hard drive to the next. Every five to seven years, you have to move your data. It's tedious, it adds cost and with DNA, our first product would be a century archive. You give us the data, we put it in DNA, and you don't have to do anything for a hundred years.

Kyle Tomek (21:52):

I think DNA data storage has the potential to completely upend the data storage market as it is now. You think about the scale of DNA and you can take an entire million square foot facility and put it into a drop of water. By adding a few drops of DNA to that facility, you can double, triple, 10X the capacities without almost any capital needs or energy needs to maintain that DNA. That impact is pretty tangible to me. If I'm really dreaming about this, I really like to think about the potential for the iPhone 50 having a DNA drive in it and giving people on a personal level, access to all of the world's data right in their pocket. We would be able to fit all that information into the size of a phone, put it in someone's pocket, and as those editing tools and computational tools come to fruition, we're able to democratize data storage and put it in the palm of people's hands.

Emily Leproust (22:58):

Still expensive compared to a hard drive. A hard drive is about a hundred dollars per terabyte, but it's already a thousand times cheaper than it used to be. And we are on a path to get to smaller dimensions like a hundred nanometers and our goal is to be price competitive with a hard drive. When you want to store data for a hundred years, when you look at the total cost of ownership of buying a hard drive and then another one five years later and moving the data and all the electricity that you need, when you compare the cost of ownership of storing data on a hard drive, it would be cost competitive to store it in DNA.
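
Illustration (not part of the recording): the total-cost-of-ownership argument can be made concrete with the figures quoted in the episode, namely about a hundred dollars per terabyte for a hard drive and a data migration every five to seven years. The sketch below counts hardware purchases only and assumes a constant drive price over the century; electricity, cooling, and the labor of moving data, which Emily also mentions, are left out, and no DNA storage price is assumed because none is given.

```python
import math


def hard_drive_hardware_cost(years: int, replace_every_years: int,
                             cost_per_tb: float, terabytes: float = 1.0) -> float:
    """Hardware-only cost of keeping data on hard drives for `years`.

    Hypothetical helper: assumes a flat price per terabyte and one new
    drive purchase per replacement cycle.
    """
    purchases = math.ceil(years / replace_every_years)
    return purchases * cost_per_tb * terabytes


# One terabyte kept for a hundred years, using the episode's rough figures.
print(hard_drive_hardware_cost(100, 5, 100.0))  # replace every five years
print(hard_drive_hardware_cost(100, 7, 100.0))  # replace every seven years
```

Under these assumptions a single terabyte needs 15 to 20 drive purchases over a century, which is the recurring cost a write-once archive is meant to avoid.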

Dr. Michelle McMurry-Heath (23:35):

DNA storage might not be ready to replace the ever-convenient thumb drive, but we're certainly a lot closer to making it a reality than we've ever been before. As we marvel at one more example of the life sciences industry breaking through to address global challenges, we are reminded of the many societal benefits that are realized when biotechnology companies partner with other sectors like AI, computer science, and others. I want to thank our listeners for tuning into today's episode. Make sure to subscribe, rate and/or review this podcast and follow us on Twitter, Facebook, and LinkedIn at IAmBiotech. And subscribe to Good Day BIO at biotech.org/goodday. This episode was developed by executive producer Theresa Brady and producers Connor McKoy, Lynne Finnerty and Rob Gutnikoff. It was engineered and mixed by Jay Goodman, the music created by Luke Smith and Sam Brady.