Machine learning and deep learning in biotechnology: applications, impacts and challenges

Edian F. Franco; Rommel J. Ramos

doi:10.22206/cac.2019.v2i2.pp7-26

Issue	Vol. 2 No. 2 (2019): Ciencia, Ambiente y Clima
DOI	10.22206/cac.2019.v2i2.pp7-26
Published	Dec. 14, 2019

Federal University of Pará, Belém, Pará, Brazil.

edianfranco@ufpa.br

Federal University of Pará, Belém, Pará, Brazil.

rommelramos@ufpa.br

Abstract

Bioinformatics has changed how experiments and research are designed and carried out across the biological sciences. Biotechnology is no exception: bioinformatics has directly affected areas such as drug discovery and development, crop improvement, bioremediation, studies of environmental diversity, and molecular pathology, among others. This is due largely to the development of high-throughput sequencing technologies, or next-generation sequencing (NGS), which have generated large volumes of data that must be processed and analyzed to yield new knowledge and discoveries. As a result, two areas of bioinformatics and computer science, machine learning and deep learning, have been applied to the analysis of these data. Machine learning applies techniques that allow computers to learn, while deep learning builds artificial neural network models that attempt to imitate the functioning of the human brain, allowing them to learn from data and to improve that learning through experience. Both areas are essential for identifying, analyzing, interpreting, and extracting knowledge from the large volume of biological data (big biological data). In this work we review these two areas, machine learning and deep learning, with a focus on their impact and applications in biotechnology.


Keywords: machine learning; deep learning; biotechnology; bioinformatics; biological data


License

This work is licensed under a Creative Commons Attribution 4.0 International license.


How to cite

Aprendizaje de máquina y aprendizaje profundo en biotecnología: aplicaciones, impactos y desafíos. (2019). Ciencia, Ambiente y Clima, 2(2), 7-26. https://doi.org/10.22206/cac.2019.v2i2.pp7-26
