In what could turn out to be one of the most promising research results in recent times, scientists at Linköping University have developed an AI-based algorithm that can process human gene expression patterns and find disease-related genes. In a study published in Nature Communications, the researchers report that the algorithm was able to distinguish between different gene expression patterns in a large pool of data. They hope the findings can eventually be applied in precision medicine and individualized treatment.

AI is already part of everyday life in many forms, from games and software to social networking and robotics, but scientists at Linköping University have taken it a step further this time. The research involved creating artificial neural networks trained on experimental data. The aim was to investigate whether deep learning could be used to discover biological networks.

The AI model was trained on an enormous database containing the expression patterns of 20,000 genes from a large number of people, with the goal of finding patterns in gene expression. The data were given to the algorithm unsorted: the artificial neural network received no information about which gene expression patterns came from people with diseases and which came from healthy people.
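The journal reference at the end of this article describes the model as a deep autoencoder: a network that learns to compress each unlabeled sample into a smaller internal code and then reconstruct it, so no healthy/disease labels are needed during training. The following is only a minimal sketch of that idea, not the study's code. It assumes a single hidden layer instead of the authors' deeper architecture, small synthetic data instead of the 20,000-gene database, and plain gradient descent; the function names `train_autoencoder` and `encode` are hypothetical.

```python
import numpy as np


def train_autoencoder(X, n_hidden=8, lr=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer autoencoder on the unlabeled rows of X.

    X has shape (n_samples, n_genes). No healthy/disease labels are used:
    the only training signal is how well each sample is reconstructed.
    (Hypothetical simplification; the study used a deeper network.)
    """
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    # Encoder (genes -> compressed code) and decoder (code -> genes) weights.
    W1 = rng.normal(0.0, 0.1, (n_genes, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_genes))
    b2 = np.zeros(n_genes)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)  # compressed representation (hidden layer)
        X_hat = H @ W2 + b2       # linear reconstruction of the input
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean-squared reconstruction error.
        g_out = 2.0 * err / err.size
        g_W2 = H.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_pre = (g_out @ W2.T) * (1.0 - H ** 2)  # derivative of tanh
        g_W1 = X.T @ g_pre
        g_b1 = g_pre.sum(axis=0)
        # Plain gradient-descent update.
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses


def encode(X, params):
    """Return the hidden-layer code for each sample (the part one would inspect)."""
    W1, b1, _, _ = params
    return np.tanh(X @ W1 + b1)


# Usage on synthetic data (an assumption, not the study's dataset):
# 200 "samples" x 50 "genes" driven by 3 hidden factors plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 50))
X += 0.1 * rng.normal(size=X.shape)
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each "gene"
params, losses = train_autoencoder(X)
codes = encode(X, params)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, codes shape {codes.shape}")
```

Examining the learned weights `W1` or the per-sample `codes` is, loosely, the kind of analysis the researchers describe when they ask what each hidden layer represents; their real model has several layers and was trained on real expression data.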

“We have for the first time used deep learning to find disease-related genes. This is a very powerful method in the analysis of huge amounts of biological information, or ‘big data’,” says Sanjiv Dwivedi, a postdoc in the Department of Physics, Chemistry, and Biology (IFM) at Linköping University.

One of the basic challenges in machine learning is that it is not possible to see directly how the AI processes information; all that is visible is the input and the results produced. After training the artificial neural network, the scientists therefore asked whether they could understand exactly how the algorithm works.

“When we analyzed our neural network, it turned out that the first hidden layer represented, to a large extent, interactions between various proteins. Deeper in the model, in contrast, on the third level, we found groups of different cell types. It’s extremely interesting that this type of biologically relevant grouping is automatically produced, given that our network has started from unclassified gene expression data,” says Mika Gustafsson, senior lecturer at IFM and leader of the study.

The scientists then investigated whether their model of gene expression could be used to distinguish between normal patterns and those associated with disease. They confirmed that the model finds relevant patterns that conform to biological mechanisms in the body. Because the model was trained on unsorted data, the artificial neural network may also have found entirely new patterns. The researchers now plan to investigate whether such previously unknown patterns are biologically relevant.

“We believe that the key to progress in the field is to understand the neural network. This can teach us new things about biological contexts, such as diseases in which many factors interact. And we believe that our method gives models that are easier to generalize and that can be used for many different types of biological information,” says Mika Gustafsson.

The study has received financial support from the Swedish Foundation for Strategic Research (SSF) and the Swedish Research Council.

Journal Reference:

  1. Sanjiv K. Dwivedi, Andreas Tjärnberg, Jesper Tegnér, Mika Gustafsson. Deriving disease modules from the compressed transcriptional space embedded in a deep autoencoder. Nature Communications, 2020; 11 (1) DOI: 10.1038/s41467-020-14666-6