Our project uses synthetic biology to address the growing need for a data storage medium that is more energy-efficient than current magnetic and optical options. We are currently pursuing this through two separate tracks:
Developing an enzymatic DNA synthesis platform that can elongate a single-stranded DNA (ssDNA) in a template-independent manner. The synthesized ssDNA strand will then be converted to a more stable, double-stranded DNA (dsDNA) and inserted into a plasmid for long-term data storage.
Developing a data encoding/decoding pipeline that allows binary files (used by computers) to be stored in a ternary format compatible with our DNA synthesis platform, retrieved, and converted back into binary.
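The binary-to-ternary round trip described above can be sketched as follows. This is a minimal illustration, not the project's pipeline: it assumes a fixed-width mapping of six base-3 digits (trits) per byte, and the names `bytes_to_trits`, `trits_to_bytes`, and `TRITS_PER_BYTE` are hypothetical. How trits map onto the synthesis platform's output is not covered here.

```python
TRITS_PER_BYTE = 6  # assumption: 3**6 = 729 >= 256, so 6 trits cover one byte


def bytes_to_trits(data: bytes) -> list[int]:
    """Convert each byte to 6 base-3 digits, most significant digit first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            byte, remainder = divmod(byte, 3)
            digits.append(remainder)
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits: list[int]) -> bytes:
    """Invert bytes_to_trits; rejects input that is not a valid encoding."""
    if len(trits) % TRITS_PER_BYTE:
        raise ValueError("trit count must be a multiple of 6")
    out = bytearray()
    for start in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[start:start + TRITS_PER_BYTE]:
            value = value * 3 + trit
        if value > 255:
            raise ValueError("trit group does not encode a byte")
        out.append(value)
    return bytes(out)


message = b"iGEM"
assert trits_to_bytes(bytes_to_trits(message)) == message
```

A fixed width keeps decoding simple and local (one damaged group affects one byte), at the cost of some density compared with converting the whole file as a single base-3 number.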
Implement a barebones pipeline and use in silico testing to measure how much error can be tolerated in 100-nucleotide DNA sequences.
Refine the algorithms to tolerate up to 30% error in 100-nucleotide DNA sequences, verified with in silico testing.
How? Using a genetic algorithm.
Primer design fits this description!
Many encoding formats exist:
These encoding methods will ultimately be tested in silico...
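In silico testing of this kind needs a simulated error channel. The sketch below is one simple way to build one, assuming independent per-base errors and a configurable mix of substitutions, insertions, and deletions. The function name `mutate`, its parameters, and the error model are illustrative assumptions, not measured properties of any synthesis or sequencing process.

```python
import random

BASES = "ACGT"


def mutate(seq: str, rate: float, rng: random.Random,
           weights: tuple = (1, 1, 1)) -> str:
    """Apply substitution/insertion/deletion errors at a per-base rate.

    weights gives the relative frequency of (substitution, insertion,
    deletion); this mix is an assumption to be tuned against real data.
    """
    out = []
    for base in seq:
        if rng.random() >= rate:
            out.append(base)  # base survives unchanged
            continue
        kind = rng.choices(("sub", "ins", "del"), weights=weights)[0]
        if kind == "sub":
            out.append(rng.choice([b for b in BASES if b != base]))
        elif kind == "ins":
            out.append(rng.choice(BASES))  # spurious base before the real one
            out.append(base)
        # "del": the base is dropped, so nothing is appended
    return "".join(out)


rng = random.Random(0)  # fixed seed so runs are repeatable
original = "".join(rng.choice(BASES) for _ in range(100))
noisy = mutate(original, 0.30, rng)
print(len(original), len(noisy))
```

Setting `weights=(0, 0, 1)` gives a deletion-only channel, which matches the first-iteration question of how many deletion errors can be corrected.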
Some limitations...
Blocking data
To encode the UBC iGEM sponsorship package...
For the first iteration, we want to see what percentage of deletion errors we can correct with minimal error correction. In later DBTL cycles, we will explore ways of reducing or preventing deletion errors, including:
HEDGES error-correcting codes: Why?
Redundancy is enforced in the encoding strategy.
Press, W. H., Hawkins, J. A., Jones, S. K., Schaub, J. M., & Finkelstein, I. J. (2020). HEDGES error-correcting code for DNA storage corrects indels and allows sequence constraints. Proceedings of the National Academy of Sciences, 117(31), 18489–18496. https://doi.org/10.1073/pnas.2004821117
HEDGES uses a hash function to encode redundancy and generate the base to be synthesized.
What's a hash function?
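A hash function deterministically maps an input of any size to a fixed-size value that looks unrelated to the input, so a small change in the input gives a very different output. The toy sketch below shows how a hash of the strand ID, bit index, and a few previous bits can be combined with the current bit to choose a base. It is loosely modelled on the HEDGES description and is not the published algorithm: SHA-256 stands in for the paper's hash, one bit is stored per nucleotide, and `context_hash`, `encode_bits`, and the history length of 8 are hypothetical.

```python
import hashlib

BASES = "ACGT"


def context_hash(strand_id: int, index: int, prev_bits: list) -> int:
    """Hash the decoding context down to a value in 0..3 (SHA-256 is an assumption)."""
    text = f"{strand_id}:{index}:{''.join(map(str, prev_bits))}"
    return hashlib.sha256(text.encode()).digest()[0] % 4


def encode_bits(bits: list, strand_id: int, history: int = 8) -> str:
    """Emit one base per bit; each base depends on the bit and its context."""
    out = []
    for i, bit in enumerate(bits):
        prev = bits[max(0, i - history):i]
        k = context_hash(strand_id, i, prev)
        out.append(BASES[(k + bit) % 4])
    return "".join(out)


bits = [0, 1, 1, 0, 1, 0, 0, 1]
print(encode_bits(bits, strand_id=3))
# Flipping the first bit changes the context of the following positions too,
# so later bases are re-drawn from the hash rather than staying aligned.
print(encode_bits([1] + bits[1:], strand_id=3))
```

Because every base depends on earlier bits, a wrong guess during decoding makes later predictions disagree with the read, which is what lets a decoder notice and discard bad hypotheses.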
We will also use an established checksum algorithm to generate a checksum that signals whether error correction is needed.
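A minimal sketch of the checksum step, assuming CRC-32 from Python's standard `zlib` module as the established algorithm (the choice of checksum is not fixed here). The helper names `add_checksum` and `needs_correction` are hypothetical.

```python
import zlib

CHECKSUM_BYTES = 4  # CRC-32 produces a 32-bit value


def add_checksum(payload: bytes) -> bytes:
    """Append the CRC-32 of the payload as 4 big-endian bytes."""
    return payload + zlib.crc32(payload).to_bytes(CHECKSUM_BYTES, "big")


def needs_correction(block: bytes) -> bool:
    """Return True when the stored checksum does not match the payload."""
    if len(block) < CHECKSUM_BYTES:
        raise ValueError("block is too short to contain a checksum")
    payload, stored = block[:-CHECKSUM_BYTES], block[-CHECKSUM_BYTES:]
    return zlib.crc32(payload).to_bytes(CHECKSUM_BYTES, "big") != stored


block = add_checksum(b"UBC iGEM")
assert not needs_correction(block)

# Flip one bit in the payload; the mismatch signals that correction is needed.
corrupted = bytes([block[0] ^ 1]) + block[1:]
assert needs_correction(corrupted)
```

A checksum only detects that something is wrong; locating and repairing the error is left to the error-correcting code.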
"Hashing each bit value withits strand ID, bit index, and a few previous bits “poisons” bad decoding hypotheses, allowing for correction of indels."
"In summary, the algorithm encodes information as a stream of nucleotides such that any single decoding error in either nucleotide identity or nucleotide position will “poison” the downstream predictions. Thus, on decoding, there will be onlyone good-scoring chain of guesses—the correct one."
Press, W. H., Hawkins, J. A., Jones, S. K., Schaub, J. M., & Finkelstein, I. J. (2020). HEDGES error-correcting code for DNA storage corrects indels and allows sequence constraints. Proceedings of the National Academy of Sciences, 117(31), 18489–18496. https://doi.org/10.1073/pnas.2004821117
We will most likely use one or a combination of the following:
Why implement de novo assembly?
greedy graph search
How does this work?
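A minimal sketch of one possible reading, assuming the graph search means a best-first search over decoding hypotheses in the spirit of HEDGES decoding. The search always expands the hypothesis with the lowest accumulated penalty, where each hypothesis guesses the next bit and whether the read shows a match, substitution, deletion, or insertion. Everything here is illustrative: the toy hash-based encoder is included so the block runs on its own, SHA-256 stands in for the paper's hash, and the names, penalty of 1, and history length are hypothetical.

```python
import hashlib
import heapq

BASES = "ACGT"
HISTORY = 8  # previous bits mixed into the hash (assumed value)


def predicted_base(strand_id, index, prev_bits, bit):
    """Base the toy encoder emits for `bit` in the given context."""
    text = f"{strand_id}:{index}:{prev_bits}"
    k = hashlib.sha256(text.encode()).digest()[0] % 4
    return BASES[(k + bit) % 4]


def encode(bits, strand_id):
    return "".join(
        predicted_base(strand_id, i, tuple(bits[max(0, i - HISTORY):i]), b)
        for i, b in enumerate(bits)
    )


def decode(read, n_bits, strand_id, penalty=1):
    """Best-first search; returns the lowest-penalty bit sequence found."""
    # Heap entries: (penalty so far, -bits decoded, read position, bits).
    heap = [(0, 0, 0, ())]
    while heap:
        cost, neg_i, pos, bits = heapq.heappop(heap)
        i = -neg_i
        if i == n_bits and pos == len(read):
            return list(bits)
        if pos < len(read):
            # Insertion hypothesis: read[pos] is spurious, skip it.
            heapq.heappush(heap, (cost + penalty, neg_i, pos + 1, bits))
        if i == n_bits:
            continue
        prev = bits[max(0, i - HISTORY):]
        for bit in (0, 1):
            base = predicted_base(strand_id, i, prev, bit)
            nxt = bits + (bit,)
            if pos < len(read):
                # Match costs nothing; a mismatch is scored as a substitution.
                step = 0 if read[pos] == base else penalty
                heapq.heappush(heap, (cost + step, neg_i - 1, pos + 1, nxt))
            # Deletion hypothesis: the base for this bit is missing from the read.
            heapq.heappush(heap, (cost + penalty, neg_i - 1, pos, nxt))
    return None


bits = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0]
read = encode(bits, strand_id=5)
assert decode(read, len(bits), strand_id=5) == bits  # error-free read
damaged = read[:4] + read[5:]  # one deleted base
print(decode(damaged, len(bits), strand_id=5))
```

With an error-free read, only the correct chain of guesses has zero penalty, so it is found first. With errors, a wrong hypothesis can occasionally match by chance, so the result for the damaged read is a best guess rather than a guarantee, and the search cost grows quickly with the number of errors.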
Press, W. H., Hawkins, J. A., Jones, S. K., Schaub, J. M., & Finkelstein, I. J. (2020). HEDGES error-correcting code for DNA storage corrects indels and allows sequence constraints. Proceedings of the National Academy of Sciences, 117(31), 18489–18496. https://doi.org/10.1073/pnas.2004821117
Why?
Implement the DNA Data Storage Alliance specifications and run in silico tests on DNA sequences thousands of nucleotides long.
Test our software on sequences synthesized by wet lab, and refine the algorithms using both in silico testing and wet lab data.
"Unlike traditional storage media such as tape, HDD, and SSD, DNA does not have a fixed physical structure, a built-in controller, or a way to address different regions of the media linearly, and thus needs a mechanism to start reading or “booting up” a DNA archive that does not rely on such a structure. The SNIA DNA Archive Rosetta Stone (DARS) working group, one of four working groups in the DNA Data Storage Alliance aimed at defining standards for DNA data storage systems, has developed two specifications to enable archive readers to find the sequence to begin booting up the data."
https://www.snia.org/news_events/newsroom/dna-data-storage-alliance-releases-its-first-specifications
Correcting Insertions and Deletions in Short DNA Sequences
Store shapes as mathematical equations rather than individual pixels. Can be compressed further with traditional mechanisms or the aforementioned text compression mechanisms.
Designed to store redundant information, enabling extreme error correction. Can be efficiently stored in many formats, including SVG or PNG.