Science AMA Series: I'm Yaniv Erlich; my team used DNA as a hard-drive to store a full operating system, movie, computer virus, and a gift card. I am also the creator of DNA.Land. Soon, I'll be the Chief Science Officer of MyHeritage, one of the largest genetic genealogy companies. Ask me anything!

How exactly do you write computer code into DNA?

ShiningComet

Yaniv here.

Great question. @Parazeit's answer below hinted at the method that we used. The main thing to keep in mind is that computer code is just binary data and generally looks like many other types of data (e.g. video). The idea is to map the 0s and 1s in the binary file to the four DNA letters: A, C, G, T. Naively, one can just map 00 to A, 01 to C, 10 to G, and 11 to T. But the catch is that some DNA sequences are not desirable.

For example, the sequence 000000000... translates under this mapping to AAAAAAAA... but it is very hard to sequence and synthesize a DNA molecule like that for various biochemical reasons. Our DNA Fountain method avoids this problem. Its fountain property means that we can represent parts of the file in a virtually unlimited number of ways. We quickly sift through different representations, map them to DNA sequences, and keep only the sequences without the undesirable properties. Hope it helps.
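To make the naive mapping and the sift-and-keep idea concrete, here is a minimal Python sketch. It is a toy, not the DNA Fountain code from the repository: XOR-ing pseudo-random subsets of file segments is an illustrative stand-in for "representing parts of the file in many ways", the only screen applied is the long-repeat problem mentioned above, and the function names, the `max_run=3` threshold, and the `max_seed` cap are all assumptions made for the example.

```python
import random

# Naive mapping from the answer: two bits per DNA letter.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def bytes_to_dna(data: bytes) -> str:
    """Translate bytes to DNA letters, two bits at a time."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def longest_run(seq: str) -> int:
    """Length of the longest stretch of one repeated letter (AAAA gives 4)."""
    longest = current = 0
    previous = None
    for base in seq:
        current = current + 1 if base == previous else 1
        previous = base
        longest = max(longest, current)
    return longest


def make_candidate(segments: list[bytes], seed: int) -> bytes:
    """XOR a pseudo-random subset of equal-length segments (illustrative assumption)."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(segments)), rng.randint(1, len(segments)))
    payload = bytes(len(segments[0]))  # all-zero starting value
    for index in chosen:
        payload = bytes(a ^ b for a, b in zip(payload, segments[index]))
    return payload


def screened_oligos(segments, wanted, max_run=3, max_seed=10_000):
    """Try seeds in order and keep candidates whose DNA has no long repeats."""
    kept = []
    for seed in range(1, max_seed + 1):
        dna = bytes_to_dna(make_candidate(segments, seed))
        if longest_run(dna) <= max_run:
            kept.append((seed, dna))  # the seed records which subset was used
            if len(kept) == wanted:
                break
    return kept


if __name__ == "__main__":
    print(bytes_to_dna(b"\x00\x00"))  # the all-zero case from the answer
    segments = [b"\x1b" * 4, b"\xe4" * 4, b"\x4e" * 4]
    for seed, dna in screened_oligos(segments, wanted=3):
        print(seed, dna)
```

The point of the sketch is only that rejected candidates cost nothing: because new candidates can always be generated from another seed, a sequence with a long repeat is simply skipped rather than repaired.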


Could you potentially embed information into a virus, and then transmit that virus as a covert means to send information? Infect a population to make sure your message gets through?

monkeydave

Yaniv is here. Theoretically speaking, you could pack a little bit (probably <10Kbyte) of information on a virus (viruses limit the amount of DNA they can pack due to the small size of the capsid). However, our study is about synthetic DNA that was not derived from or placed in any organism.

Also, viruses mutate as they propagate through the population, which will reduce the ability to "transmit" the information correctly. Probably a much easier way to transmit is to FedEx the sample (or send it via drone in the future).


How long before I can literally have a thumb drive?

Caddy666

Yaniv is here.

If you are willing to put up the money, you can have a kind of DIY thumb drive in two weeks. You can use our software (free!) to encode any data on DNA: https://github.com/TeamErlich/dna-fountain

Then, send the results to Twist Bioscience (not free; >$1000) and in two weeks you will get the DNA in a test tube, which you can carry with you. When you want to read the file, contact any sequencing provider (e.g. NY Genome Center) and send the sample.


How long before I can literally have a thumb drive?

Caddy666

Dina here. Storing data on DNA would more likely replace server farms, at least in the short term. If you store data in the cloud for example, it would be in DNA in freezers and you may not necessarily know that this is the case when you access it.


Hi Yaniv,

How does the DNA interface with a regular, transistor-based CPU? How long does it take to access compared to a) a normal hard drive b) an SSD?

Thank you for doing this AMA!

Korla_Plankton

Yaniv is here. Thanks for this great question. Currently, we read the DNA using a regular sequencer (Illumina platform) that consists of a giant microscope that converts optical signals from the DNA into TIFF images, which are then run through fast image processing to extract the nucleotides. Our DNA Fountain software converts the nucleotides back to binary.

So the current I/O is much more cumbersome than a fancy USB stick. My colleagues at Urbana-Champaign developed a DNA storage approach that can be read directly from a USB-based sequencer. However, it currently works only for very small files. You can read more here (no paywall): http://www.biorxiv.org/content/early/2016/10/05/079442


What about the degradation of DNA? How do you stop it? How long can the data safely stay on there before it corrupts or is lost?

Woollywoo

Our colleagues from ETH Zurich did a test and found that the half-life of DNA after a chemical treatment can be 4000 years at room temperature, much better than my CDs!
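As a rough illustration of what a 4000-year half-life means in practice, the sketch below computes the fraction of molecules still intact after a given time. It assumes simple exponential decay, which is an assumption made here for the arithmetic; the answer only quotes the half-life figure. The function name is made up for the example.

```python
HALF_LIFE_YEARS = 4000  # figure quoted above for chemically treated DNA at room temperature


def fraction_intact(years: float, half_life: float = HALF_LIFE_YEARS) -> float:
    """Fraction of molecules surviving after `years`, assuming exponential decay."""
    if years < 0 or half_life <= 0:
        raise ValueError("years must be >= 0 and half_life must be > 0")
    return 0.5 ** (years / half_life)


if __name__ == "__main__":
    for years in (100, 1000, 4000, 8000):
        print(f"{years:>5} years: {fraction_intact(years):.1%} intact")
```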


What OS did you write on/to it?

If it was GNU/linux, any specific distro or just the linux kernel?

What would the read (and if possible) write speeds be?

Do you see it as a viable backup storage medium?

munsking

Yaniv is here. We wrote KolibriOS to DNA: https://www.wikiwand.com/en/KolibriOS This system is graphical and was totally functional after decoding the data. I was even able to play Minesweeper with the DNA-derived OS.

You could store Linux, but it would need much more DNA synthesis, which would make the project more expensive.

DNA might be a viable option if we can further reduce the costs.


When people "contribute" their personal DNA data what, if any, protections do they have against their own genes being either patented or copyrighted by a third party entity (such as a corporation)?

Will people in the future be subject to "copyright" or "trademark" infringement for natural reproduction if their genome contains trademarked, patented, or copyrighted genetic codes?

CicerosGhost

The US Supreme Court decided in June 2013 that genes cannot be patented! Also, the Supreme Court postulated that DNA is information, and to the best of my knowledge you cannot copyright information.

It is important to keep in mind that there are probably over five million people who took a DTC test in the last decade. I have not heard of anyone with a copyrighted or trademarked genome. So I don't think this is a real risk.


When people "contribute" their personal DNA data what, if any, protections do they have against their own genes being either patented or copyrighted by a third party entity (such as a corporation)?

Will people in the future be subject to "copyright" or "trademark" infringement for natural reproduction if their genome contains trademarked, patented, or copyrighted genetic codes?

CicerosGhost

Dina here. It is highly unlikely that genes will be patented. A recent example is the controversy over breast cancer associated (BRCA) genes. Naturally occurring DNA sequences cannot be patented but synthetic DNA could be.


What's the next step? How do you see this evolving as a technology?

ze_snail

Dina here. We showed that we can nearly reach the theoretical storage capacity using our method, with a density of 215 petabytes per gram of DNA (1 petabyte = 1 million gigabytes). So the bottleneck to really putting DNA storage into practice is the cost of synthesizing the DNA.
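To put the density figure in perspective, here is a small back-of-the-envelope calculation in Python. It uses decimal units, as in the answer, and assumes the reported density scales linearly to any amount of data, which is an idealization made for this example. The function name is hypothetical.

```python
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_PETABYTE = 10**15  # 1 petabyte = 1 million gigabytes
DENSITY_PB_PER_GRAM = 215    # density reported in the answer


def grams_of_dna(num_bytes: float) -> float:
    """Mass of DNA needed for `num_bytes`, assuming the density scales linearly."""
    return num_bytes / (DENSITY_PB_PER_GRAM * BYTES_PER_PETABYTE)


if __name__ == "__main__":
    one_terabyte = 1000 * BYTES_PER_GIGABYTE
    print(f"1 TB -> {grams_of_dna(one_terabyte):.2e} g")
    print(f"1 PB -> {grams_of_dna(BYTES_PER_PETABYTE):.2e} g")
```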


What's the next step? How do you see this evolving as a technology?

ze_snail

Yaniv is here. Cost cost cost. We need to lower the synthesis costs by orders of magnitude to compete with hard drives.


What was your read and write rate? What room for improvement is there in these?

Robo-Connery

Yaniv is here. In terms of reading, we were able to perfectly decode the file from a density of 215 petabytes per gram, which is 100x better than previous studies with a similar file size.

For writing, we were able to organize the data in a nearly perfect way (i.e. close to the Shannon capacity), about 60% better than previous studies with a similar file size.

Also, we reported that we can create a virtually unlimited number of copies of the file without sacrificing the accuracy of the data.


What was your read and write rate? What room for improvement is there in these?

Robo-Connery

Dina here. It's much faster and cheaper to read DNA than to write it. The turnaround for 72,000 unique oligos, each 200 nucleotides long, was 2 weeks. The sequencing and transfer of the raw data was completed overnight. So, reducing synthesis costs would go a long way in making DNA storage feasible.


Thanks so much for doing this AMA, as many people are interested in this new concept. I do have a few questions.

  1. How far away (if at all) is this from the consumer market (public)?
  2. What kind of equipment was used?
  3. How did you verify the data was intact/read it back from the DNA?
  4. What kind of DNA was used?
  5. How much DNA "space" did you take up with the operating system, video, virus, and gift card?
  6. How much DNA "space" does 1 bit take?

Thanks again for the AMA and I can't wait to read through all of your responses.

TrainerBoberts

Dina here.

  1. The bottleneck right now is largely cost, particularly the cost of synthesizing the DNA on which the data is encoded, but consumer use could become feasible in a decade or so.
  2. The sequencing was done on the standard Illumina MiSeq platform.
  3. As part of the decoding process, going from DNA back to the original files, we can detect erroneous sequences, and we simply need to collect enough correct sequences until we can infer the original input data.
  4. We used synthetic DNA. You can send a synthesis company a file with sequences and they send the DNA back in a few days to a few weeks.
  5. We encoded a total of ~2 MB (megabytes).
  6. The information capacity is ~1.8 bits per nucleotide (theoretically 2, since there are 4 bases, but there are practical limits to the capacity).
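The capacity figures in answer 6 can be checked with a few lines of Python. The theoretical limit is log2(4) = 2 bits per nucleotide because each position holds one of four bases. The helper below gives only a lower bound on the number of nucleotides for a payload: it ignores any per-molecule overhead such as addressing or redundancy, which is a simplification made for this example, and its name is hypothetical.

```python
import math

BASES = "ACGT"
THEORETICAL_BITS_PER_NT = math.log2(len(BASES))  # 4 symbols -> 2 bits each
PRACTICAL_BITS_PER_NT = 1.8  # approximate figure quoted in the answer


def min_nucleotides(payload_bytes: int, bits_per_nt: float) -> int:
    """Lower bound on nucleotides needed, ignoring any per-oligo overhead."""
    return math.ceil(payload_bytes * 8 / bits_per_nt)


if __name__ == "__main__":
    payload = 2_000_000  # roughly the ~2 MB mentioned in answer 5
    print("theoretical bits/nt:", THEORETICAL_BITS_PER_NT)
    print("nucleotides at 2.0 bits/nt:", min_nucleotides(payload, THEORETICAL_BITS_PER_NT))
    print("nucleotides at 1.8 bits/nt:", min_nucleotides(payload, PRACTICAL_BITS_PER_NT))
```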


Is this technology expected to be write only once, read forever? Like a backup technology? Or can it add, remove and modify data?

Gone2theDogs

Dina here. We envision long term storage on DNA. Each time the data is accessed, it needs to be sequenced. To modify or add data would require synthesizing new DNA.


I've thought about doing various things with my DNA, such as the Ancestry.com thing where they tell you what makes up "you". The reason I haven't gone through with it is that the privacy policies tend to be lacking in answers that I find critical. What kind of privacy policies do you intend to have with DNA.Land/MyHeritage, and how do you intend to uphold it? For example, I'm sure you'll be keeping data on everyone who submits information.. will you anonymize it?

Post-answer edit: Yep, sounds about like everyone else's idea of "privacy" - no real answer. I'm sure you'll have plenty of clients. Unfortunately, I won't be one of them.

Mafiya_chlenom_K

Yaniv is here. Very good questions from you and t00 (below).

In short, all DNA data that MyHeritage (MH) collects is stored on secure servers in the US (similar to other DTC companies). The privacy and autonomy of users are highly important. This is the reason why we have a detailed policy on the DNA page, and you can also choose whether or not to opt in to research.

As for t00's question, I am not a legal expert, so I cannot answer it well. But please keep in mind that, generally speaking, the format of our data is not compatible with traditional forensic analysis. Law enforcement agencies (either US or non-US) use the CODIS set, which is not represented on any of the DTC arrays. This limitation already creates a technical barrier and reduces the utility of the data stored in DTC servers for law enforcement activities.


How fast is it to transfer data to DNA and back again, how fast do you think it feasibly can be?

Laikitu

Yaniv is here.

Synthesis and shipment are currently the slowest part. They took two weeks to complete. However, we envision that this can be further optimized, as the current supply chain mainly serves applications that are largely indifferent to turnaround time (e.g. regular experiments with synthetic DNA).


Where do you get the DNA to use for data storage?

Bicuspids

Yaniv is here. The DNA is synthesized in a purely chemical reaction called "Synthesis by the phosphoramidite method". See: https://www.wikiwand.com/en/Oligonucleotide_synthesis

The DNA is not derived from any organism; the process is just a sophisticated biochemical method to generate chains of DNA nucleotides (the building blocks of DNA molecules). Some companies use devices that look like ink jet printers.


Where do you get the DNA to use for data storage?

Bicuspids

Dina here. The DNA is entirely synthetic. After we encoded the data and converted the 0s and 1s to A, T, C, G, we sent a list of these 200-base-long strings to a company. They 'wrote' the DNA and sent back a single tube in ~2 weeks.


What are some cool DNA projects you guys are planning on doing?

MrPankow

Yaniv is here. We have many ideas but the most important one is to work with other researchers to reduce the costs of DNA synthesis. Thanks for asking!


Does exposure to strong magnetic fields wipe the data?

Outlierist

Yaniv is here. Nice question. DNA is not affected by magnetic fields. The only way to wipe the data is to break the molecules or to mutate the nucleotides (but we also have a strong error correcting code that can take care of that).


What sorts of operational lifetimes could we expect from organic based storage and what sort of engineering limitations would need to be put in place to increase the viability of this as a storage medium (ie temperature limitations, read/write speeds, etc)?

Partyatmyplace13

Dina here. DNA is incredibly robust and can be stored in a cold, dry place for hundreds of thousands of years. In terms of reading and writing, sequencing (ie. reading) costs continue to drop but writing the DNA is still quite expensive.


What would be the viable operating temperatures of a storage system based on DNA? For regular DDR2,3,4 RAM the maximum safe operating temperature seems to be around 80-85C

mmsood99

ETH Zurich found that you can keep DNA storage at 60C for a week and still get the data back. Also, as part of the reading reaction, we heat the DNA to 98C for about 30 sec in ten brief cycles (a PCR reaction). We can still read the DNA after that.


How much time does it take to convert the data back to binary? What are the write/read speeds as seen from a conventional CPU?

Thanks for doing the AMA.

nkr3

Yaniv is here.

Sequencing takes about 24 hours. Then, there are a few pre-processing steps to organize the sequencing data.

The actual conversion of sequencing data to binary took 9 min (to decode 2.1 Mbyte) using my not highly optimized Python script. I imagine that a 100x speedup can be achieved using C/C++ and much better software engineering.
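For a sense of scale, the numbers above work out as follows. This is plain arithmetic on the figures quoted in the answer; it takes 2.1 Mbyte as decimal megabytes (an assumption), and the 100x factor is the hoped-for speedup, not a measurement.

```python
FILE_BYTES = 2.1e6       # 2.1 Mbyte, read as decimal megabytes (assumption)
DECODE_SECONDS = 9 * 60  # 9 minutes with the Python script
SPEEDUP = 100            # hoped-for factor from a C/C++ rewrite, not measured

throughput_bytes_per_s = FILE_BYTES / DECODE_SECONDS
projected_seconds = DECODE_SECONDS / SPEEDUP

if __name__ == "__main__":
    print(f"current decode throughput: {throughput_bytes_per_s:.0f} bytes/s")
    print(f"projected decode time at {SPEEDUP}x: {projected_seconds:.1f} s")
```

Either way, the software step is small next to the roughly 24 hours of sequencing.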

The software is here if you want to play with it: https://github.com/TeamErlich/dna-fountain


Does the retrieval destroy the dna?

altered-state

Yaniv is here. Excellent question. Retrieval does destroy a small aliquot of the DNA sample. We were concerned about this issue and tested a molecular approach (based on PCR) to copy the data and copy the copy and copy the copy of the copy and copy the ... We were able to accurately get back the data despite extensive copying, which addresses this issue.


How complex is the fabrication process to create your DNA 'hard drives' so others can create their own versions? Who do you see as the first users of this tool outside of the laboratory?

Inform2015

Yaniv is here. The setup is fairly complicated, but luckily you do not need to think about it. Twist Bioscience and other companies (e.g. CustomArray) offer DNA synthesis as a service. You can simply purchase the DNA from them and not worry about setting up your own synthesis lab.


License

This article and its reviews are distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and redistribution in any medium, provided that the original author and source are credited.