Analysis and review of the possibility of using the generative model as a compression technique in DNA data storage: review and future research agenda

(1) * Muhammad Rafi Muttaqin (IPB University, Indonesia)
(2) Yeni Herdiyeni (IPB University, Indonesia)
(3) Agus Buono (IPB University, Indonesia)
(4) Karlisa Priandana (IPB University, Indonesia)
(5) Iskandar Zulkarnaen Siregar (IPB University, Indonesia)
*corresponding author

Abstract


The volume of data worldwide keeps growing, and existing storage technology faces severe challenges. Data volume is expected to reach 175 ZB by 2025. DNA data storage is a promising alternative for storing information, particularly digital data. One stage of storing information in DNA is synthesis. Because synthesis is very costly, compression techniques for digital data need to be integrated to minimize the cost incurred. One family of models used in compression is the generative model. This paper examines whether compression based on generative models can be integrated into DNA data storage methods. To this end, we conducted a systematic literature review (SLR), using the PRISMA method to select papers. We drew papers from four leading databases and several additional databases. Out of 2440 papers, we selected 34 primary papers for detailed analysis. The SLR presents and categorizes its findings by research question: which machine learning methods are applied in DNA storage, which compression techniques are used for DNA storage, what role deep learning plays in the compression process for DNA storage, how generative models are associated with deep learning, how generative models are applied in the compression process, and how a latent space can be formed. The study highlights open problems that remain to be solved and identifies directions for future research.

Keywords


DNA Data Storage; Generative Model; Compression; Deep Learning; Latent Space

   

DOI

https://doi.org/10.26555/ijain.v9i3.1063
      


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571  (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
   andri.pranolo.id@ieee.org (publication issues)
