Implementation of hyyrö’s bit-vector algorithm using advanced vector extensions 2

(1) * Kyle Matthew Chan Chua Mail (Computer Technology Department, College of Computer Studies, De La Salle University, Philippines)
(2) Janz Aeinstein Fauni Villamayor Mail (Computer Technology Department, College of Computer Studies, De La Salle University, Philippines)
(3) Lorenzo Campos Bautista Mail (Computer Technology Department, College of Computer Studies, De La Salle University, Philippines)
(4) Roger Luis Uy Mail (Computer Technology Department, College of Computer Studies, De La Salle University, Philippines)
*corresponding author

Abstract


The Advanced Vector Extensions 2 (AVX2) instruction set architecture was introduced by Intel’s Haswell microarchitecture that features improved processing power, wider vector registers, and a rich instruction set. This study presents an implementation of the Hyyrö’s bit-vector algorithm for pairwise Deoxyribonucleic Acid (DNA) sequence alignment that takes advantage of Single-Instruction-Multiple-Data (SIMD) computing capabilities of AVX2 on modern processors. It investigated the effects of the length of the query and reference sequences to the I/O load time, computation time, and memory consumption. The result reveals that the experiment has achieved an I/O load time of ϴ(n), computation time of ϴ(n*⌈m/64⌉), and memory consumption of ϴ(n). The implementation computed more extended time complexity than the expected ϴ(n) due to instructional and architectural limitations. Nonetheless, it was par with other experiments, in terms of computation time complexity and memory consumption.

Keywords


DNA sequence alignment; Biometrics; Bit-vector algorithm; SIMD computing capabilities; Modern processors

   

DOI

https://doi.org/10.26555/ijain.v5i3.362
      

Article metrics

Abstract views : 344 | PDF views : 62

   

Cite

   

Full Text

Download

References


[1] S. P. Adey, “GPU Accelerated Pattern Matching Algorithm for DNA Sequences to Detect Cancer using CUDA Dissertation,” Coll. Eng. Pune, 2013, available at : Google Scholar.

[2] A. Mahram, “FPGA acceleration of sequence analysis tools in bioinformatics,” Bost. Univ. Coll. Eng., 2013, available at : Google Scholar.

[3] R. Bhukya and D. Somayajulu, “2-Jump DNA Search Multiple Pattern Matching Algorithm,” Int. J. Comput. Sci. Issues, vol. 8, no. 3, pp. 320–329, 2011, available at : Google Scholar.

[4] I. Murnaghan, “The Importance of DNA,” Explore DNA, 2019. [Online]. Available: http://www.exploredna.co.uk/the-importance-dna.html.

[5] X. Chang, F. A. Escobar, C. Valderrama, and V. Robert, “Exploring Sequence Alignment Algorithms on FPGA-based Heterogeneous Architectures.,” in IWBBIO, 2014, pp. 330–341, available at : Google Scholar.

[6] S. Memeti and S. Pllana, “Accelerating DNA Sequence Analysis Using Intel(R) Xeon Phi(TM),” in 2015 IEEE Trustcom/BigDataSE/ISPA, 2015, pp. 222–227, doi: 10.1109/Trustcom.2015.636.

[7] H. Hyyrö, “A bit-vector algorithm for computing Levenshtein and Damerau edit distances,” Nord. J. Comput., vol. 10, no. 1, pp. 29–39, 2003, available at : Google Scholar.

[8] G. Myers, “A fast bit-vector algorithm for approximate string matching based on dynamic programming,” J. ACM, vol. 46, no. 3, pp. 395–415, May 1999, doi: 10.1145/316542.316550.

[9] “Genome,” NCBI. [Online]. Available: https://www.ncbi.nlm.nih.gov/genome. [Accessed: 12-Feb-2019].

[10] L. Langner and M. S. D. Weese, “Parallelization of Myers fast bit-vector algorithm using GPGPU,” Diploma/Thesis, Freie Universität, Berlin, 2011, available at: Google Scholar .

[11] K. Benkrid, A. Akoglu, C. Ling, Y. Song, Y. Liu, and X. Tian, “High Performance Biological Pairwise Sequence Alignment: FPGA versus GPU versus Cell BE versus GPP,” Int. J. Reconfigurable Comput., vol. 2012, pp. 1–15, 2012, doi: 10.1155/2012/752910.

[12] W. Muła, N. Kurz, and D. Lemire, “Faster Population Counts Using AVX2 Instructions,” Comput. J., vol. 61, no. 1, pp. 111–120, Jan. 2018, doi: https://doi.org/10.1093/comjnl/bxx046.

[13] “Basics of Single Instruction Multiple Data (SIMD),” Code Project, 2011. [Online]. Available: https://www.codeproject.com/Articles/146414/Basics-of-Single-Instruction-Multiple-Data-SIMD. [Accessed: 12-Feb-2019].

[14] C. Lomont, “Introduction to intel advanced vector extensions,” Intel White Pap., pp. 1–21, 2011, available at : Google Scholar.

[15] S. A.Shehab, A. Keshk, and H. Mahgoub, “Fast Dynamic Algorithm for Sequence Alignment Based On Bioinformatics,” Int. J. Comput. Appl., vol. 37, no. 7, pp. 54–61, Jan. 2012, doi: 10.5120/4624-6636, available at : http://research.ijcaonline.org/volume37/number7/pxc3876636.pdf.

[16] P. Pandiselvam, T. Marimuthu, and R. Lawrance, “A Comparative Study on String Matching Algorithm of Biological Sequences,” CoRR, vol. abs/1401.7416, 2014, available at : http://arxiv.org/abs/1401.7416.

[17] J. Tarhio and E. Ukkonen, “Approximate Boyer–Moore String Matching,” SIAM J. Comput., vol. 22, no. 2, pp. 243–260, Apr. 1993, doi: 10.1137/0222018.

[18] M. Gou, “Algorithms for String matching.” July, 2014, available at: Google Scholar.

[19] S. M. Vidanagamachchi, S. D. Dewasurendra, R. G. Ragel, and M. Niranjan, “Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification?,” Int. J. Res. Comput. Sci., vol. 2, no. 6, pp. 33–37, Nov. 2012, doi: 10.7815/ijorcs.26.2012.053.

[20] J. Zhu, J. S. Liu, and C. E. Lawrence, “Bayesian adaptive sequence alignment algorithms,” Bioinformatics, vol. 14, no. 1, pp. 25–39, Feb. 1998, doi: 10.1093/bioinformatics/14.1.25.

[21] B.-J. M. Webb, “BALSA: Bayesian algorithm for local sequence alignment,” Nucleic Acids Res., vol. 30, no. 5, pp. 1268–1277, Mar. 2002, doi: 10.1093/nar/30.5.1268.

[22] V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals,” in Soviet physics doklady, 1966, vol. 10, no. 8, pp. 707–710, available at : Google Scholar.

[23] Y. Nataliani and T. Wellem, “Implementation of Bit-Vector Algorithm for Approximate String Matching on Rhodopsin Protein Sequence,” Int. J. Comput. Appl., vol. 72, no. 14, pp. 34–38, Jun. 2013, doi: 10.5120/12565-9214.

[24] K. Fredriksson, “Row-wise Tiling for the Myers’ Bit-Parallel Approximate String Matching Algorithm,” 2003, pp. 66–79, doi: 10.1007/978-3-540-39984-1_6.

[25] S. Faro and M. O. Külekci, “Fast Packed String Matching for Short Patterns,” 2013, pp. 113–121, doi: 10.1137/1.9781611972931.10.

[26] E. F. D. O. Sandes, A. Boukerche, and A. C. M. A. De Melo, “Parallel Optimal Pairwise Biological Sequence Comparison,” ACM Comput. Surv., vol. 48, no. 4, pp. 1–36, Mar. 2016, doi: 10.1145/2893488.

[27] J. Hoffmann, D. Zeckzer, and M. Bogdan, “Using FPGAs to Accelerate Myers Bit-Vector Algorithm,” 2016, pp. 535–541, doi: 10.1007/978-3-319-32703-7_104.

[28] H. Hyyrö and G. Navarro, “Faster Bit-Parallel Approximate String Matching,” 2002, pp. 203–224, doi: 10.1007/3-540-45452-7_18.

[29] D. N. S. S. Liyanage, G. V. M. P. A. Fernando, D. D. M. M. Arachchi, R. D. D. T. Karunathilaka, and A. S. Perera, “Utilizing Intel Advanced Vector Extensions for Monte Carlo Simulation based Value at Risk Computation,” Procedia Comput. Sci., vol. 108, pp. 626–634, 2017, doi: 10.1016/j.procs.2017.05.156.

[30] “Intel® 64 and IA-32 Architectures Software Developer Manuals,” Intel Software Developer Zone, 2019. [Online]. Available: https://software.intel.com/en-us/articles/intel-sdm.

[31] P. Gepner, “Using AVX2 Instruction Set to Increase Performance of High Performance Computing Code,” Comput. Informatics, vol. 36, no. 5, pp. 1001–1018, 2017, doi: 10.4149/cai_2017_5_1001.

[32] D. Kusswurm, Modern X86 Assembly Language Programming, 2014, doi: 10.1007/978-1-4842-0064-3.

[33] “Intel® 64 and IA-32 Architectures Optimization Reference Manual,” Intel Corporation, 2019. [Online]. Available: https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimizatio
n-manual.pdf
. [Accessed: 12-Oct-2019].




Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571  (print) | 2548-3161 (online)
Organized by Informatics Department - Universitas Ahmad Dahlan ,  UTM Big Data Centre - Universiti Teknologi Malaysia, and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W : http://ijain.org
E : info@ijain.org, andri.pranolo@tif.uad.ac.id (paper handling issues)
     ijain@uad.ac.id, andri.pranolo.id@ieee.org (publication issues)

View IJAIN Stats

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0