AI for predicting disease
Predicting a phenotype, such as the expression of a disease (e.g. a cancer type), from biomarkers such as the genome is beyond the human capability to interpret: the prediction has to account for all the underlying mechanisms, pathways, and interactions, as well as personal histories and demographics.
To make genomic medicine a reality, machine learning algorithms need to interpret the genome of the cell and its relation to disease, linking genetic variations to their effects so that potential treatments can be explored more quickly, cheaply, and accurately than is possible through laboratory experiments alone.
The enormous complexity of the relationship between a full genotype and its phenotype can only be understood using machine learning and Deep Learning, which will play a critical role as biology moves toward high-throughput experiments.
Implementing comparative genomics
Comparative genomics strengthens the association between genetic variants and disease risk. By comparing large genome and phenotype data sets from both healthy individuals and cancer patients, Deep Learning algorithms model the genotype-to-phenotype relationship, together with cell variables, in order to produce a disease risk model.
The resulting comparative model establishes the statistical significance of a potentially causal variant for a particular disease by comparing the group of affected individuals with a control group of non-affected individuals.
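As a rough illustration of the case-versus-control comparison, the sketch below runs a Pearson chi-square test on a 2x2 table of allele counts for a single variant. It is a minimal example under stated assumptions, not the method of any particular tool: the function name `allelic_association` and all counts are made up for illustration, and a real study would also correct for multiple testing (genome-wide association studies conventionally use a significance threshold of about 5e-8).

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi-square statistic, p-value, odds ratio).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if variant and disease were independent.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


# Hypothetical counts: alternate vs. reference alleles in affected and control groups.
chi2, p_value, odds_ratio = allelic_association(
    case_alt=300, case_ref=1700, ctrl_alt=200, ctrl_ref=1800
)
print(f"chi2={chi2:.2f}  p={p_value:.2e}  odds ratio={odds_ratio:.2f}")
```

A small p-value here only says the variant is associated with the disease in this sample; on its own it does not show that the variant is causal.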
Deciphering genomics via Big Data on the Blockchain
One of the main difficulties with genome-wide association studies is establishing statistical significance for predicting risk, because they output correlations, not causal relationships.
Causal variants often go unfound because of undetected differences between subpopulation groups in factors such as location, demographics, ancestry, health data, lifestyle and more. The statistical problem is made worse by the fact that some variants have weak effects, while those that have strong effects are rare and occur in only a small percentage of the population group.
The obvious solution is to enlarge the breadth and depth of the data sets available for analysis, as Deep Learning thrives on large volumes of data. Expanding the data universe from several thousand patients to hundreds of thousands can expose enough data for Deep Learning to find causal variants. Additionally, enriching the data with personal profile data, health data from wearables, lifestyle habits and demographics can make predictions even more accurate.
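To make the subpopulation problem concrete, the sketch below uses made-up counts for two hypothetical subpopulations. Within each group the variant has no effect (odds ratio of 1), yet pooling the groups produces a strong spurious association, because the groups differ in both allele frequency and disease prevalence. The Mantel-Haenszel estimate, which combines the strata instead of pooling them, recovers the null result. The function names and all numbers are illustrative assumptions.

```python
def odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Odds ratio of a 2x2 allele-count table."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel odds ratio across strata of (case_alt, case_ref, ctrl_alt, ctrl_ref)."""
    numerator = 0.0
    denominator = 0.0
    for case_alt, case_ref, ctrl_alt, ctrl_ref in strata:
        n = case_alt + case_ref + ctrl_alt + ctrl_ref
        numerator += case_alt * ctrl_ref / n
        denominator += case_ref * ctrl_alt / n
    return numerator / denominator


# Hypothetical subpopulations (case_alt, case_ref, ctrl_alt, ctrl_ref).
# Group A: variant common, disease common. Group B: variant rare, disease rare.
strata = [
    (400, 400, 100, 100),
    (20, 180, 80, 720),
]

per_stratum = [odds_ratio(*s) for s in strata]

# Naive analysis: ignore subpopulation membership and add the tables together.
pooled_counts = [sum(column) for column in zip(*strata)]
pooled = odds_ratio(*pooled_counts)

adjusted = mantel_haenszel_odds_ratio(strata)

print("per-stratum odds ratios:", per_stratum)
print("pooled odds ratio:", round(pooled, 2))
print("Mantel-Haenszel odds ratio:", round(adjusted, 2))
```

This is why undetected structure in the sample matters: if group membership is not recorded, only the misleading pooled figure is available.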
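A back-of-the-envelope calculation shows why weak or rare effects call for much larger cohorts. The sketch below uses the standard sample-size approximation for comparing two proportions, applied to allele frequencies in cases and controls. It is a simplification under stated assumptions: the function name is hypothetical, the default threshold of 5e-8 is the conventional genome-wide significance level, 80% power is an arbitrary choice, and real power analyses take more factors into account.

```python
import math
from statistics import NormalDist


def alleles_needed_per_group(control_freq, odds_ratio, alpha=5e-8, power=0.8):
    """Approximate number of alleles per group needed to detect a frequency difference.

    Uses the two-proportion z-test approximation. Each individual contributes
    two alleles, so divide by two for a rough head count.
    """
    # Allele frequency in cases implied by the odds ratio.
    case_freq = odds_ratio * control_freq / (1 - control_freq + odds_ratio * control_freq)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)
    variance = case_freq * (1 - case_freq) + control_freq * (1 - control_freq)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (case_freq - control_freq) ** 2)


# Hypothetical scenarios: (allele frequency in controls, odds ratio).
scenarios = {
    "common variant, strong effect": (0.20, 1.5),
    "common variant, weak effect": (0.20, 1.1),
    "rare variant, strong effect": (0.01, 1.5),
}
needed = {name: alleles_needed_per_group(*args) for name, args in scenarios.items()}
for name, count in needed.items():
    print(f"{name}: about {count} alleles per group")
```

Under these assumptions, both the weak-effect and the rare-variant scenarios need far more samples than the common, strong-effect case, which is the motivation for moving from thousands of patients to hundreds of thousands.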
Deciphering the genomic instructions of the cell and the impact of biological mechanisms requires an exponential growth in data, something that the Blockchain is ideally suited to achieve.
Developing a genomics AI community
AI developer communities around speech recognition, natural language processing and object recognition have propelled AI into our daily lives. In a similar way, communities around genomics and computational biology could develop and play a key role in discovering pathways to disease if a shared infrastructure and tools were provided.
Such communities need access to data and to applications that research experts can query for relationships between cause and effect, for example whether a genetic variation increases or decreases with another biomarker, or vice versa. This iterative way of interrogating the data lets researchers compare their findings against machine learning models and arrive at data-driven interpretations.
To develop such communities, a new venture, Block23, aims to provide researchers with data access and querying tools for testing ‘what-if’ assumptions. Results can be rendered on graphical interfaces (e.g. heat maps) using linear models, decision trees, and random forests, enabling researchers to drill down into the data easily and visualize how attributes (variations) are ranked by priority.
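One simple way to rank attributes by priority is to score each one by information gain, the split criterion that decision trees are commonly built on. The sketch below is a simplified stand-in for a full decision tree or random forest, not a description of Block23's tooling: the attribute names, the toy records, and the function names are all made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting the samples on one attribute."""
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    total = len(labels)
    remainder = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - remainder


def rank_attributes(samples, labels):
    """Return (attribute, information gain) pairs, most informative first."""
    gains = {
        name: information_gain([sample[name] for sample in samples], labels)
        for name in samples[0]
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Toy records: 1 = attribute present, 0 = absent. Labels: 1 = affected, 0 = control.
samples = [
    {"variant_A": 1, "variant_B": 1, "smoker": 1},
    {"variant_A": 1, "variant_B": 0, "smoker": 1},
    {"variant_A": 1, "variant_B": 1, "smoker": 1},
    {"variant_A": 1, "variant_B": 0, "smoker": 0},
    {"variant_A": 0, "variant_B": 1, "smoker": 1},
    {"variant_A": 0, "variant_B": 0, "smoker": 0},
    {"variant_A": 0, "variant_B": 1, "smoker": 0},
    {"variant_A": 0, "variant_B": 0, "smoker": 0},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

ranking = rank_attributes(samples, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f} bits")
```

A ranked list like this could feed a heat map or a drill-down view, and a researcher could re-run it on a filtered subset of records to test a ‘what-if’ assumption.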
The end goal is to improve machine learning accuracy by having expert researchers contribute, evolving interpretive models and testing assumptions in live trials.