Brave New Diagnostics: The Convergence of Genomic Sequencing and Machine Learning

Photo courtesy

Author: Senija Selimovic-Hamza
Edited by: Jun Hon Pang

In 1932 the world held its breath for a moment when Aldous Huxley published Brave New World, his dystopian vision of the year 2540. This future featured genetic engineering, novel reproductive technologies and the manipulation of minds (1). Although most schools teach that James Watson and Francis Crick discovered the DNA double helix some twenty years after Huxley’s prophecy, the science of the molecule of life dates back to the second half of the 19th century, when the Swiss chemist Johann Friedrich Miescher began experiments to determine the composition of leukocytes (2). Ever since, our fascination with genetic material has grown, resulting not only in bewitching works of literature but also in novel technologies for disease diagnostics and prevention. Today we can detect known and unknown pathogens and diagnose inherited diseases. Predicting an individual’s future diseases has become increasingly feasible, reliable and affordable. Even in popular culture, disease prediction and prevention are being brought to public attention by celebrities such as Angelina Jolie, who underwent a double mastectomy after learning that she carried a gene mutation that increased her risk of breast cancer (3). Clearly, as technology evolves, so too does medicine: artificial intelligence (AI) is booming, the first robot has been granted citizenship, chatbots have been launched in healthcare, and the list goes on. And all of this is only the beginning of a brave new world.

In the last 50 years we have witnessed a dramatic shift from sequencing short oligonucleotides to sequencing whole genomes comprising millions or even billions of bases. Having recognized the value of such genomic data sets, leading healthcare and technology players have pursued a common goal: faster, cheaper and more user-friendly sequencing machines. The term “next-generation sequencing” is gradually being replaced by “third-generation sequencing”, although discussions are still ongoing about what these new machines will bring and what will define them. One prominent third-generation candidate is nanopore sequencing, a technique that measures changes in electrical current as molecules pass through nanoscale pores (4). Oxford Nanopore Technologies was among the first companies to offer nanopore platforms, such as the GridION and the MinION. This advance carries tremendous revolutionary potential, because these sequencers already outperform many of their next-generation predecessors in size, price and ease of use. The MinION is highly portable: it is about the size of a smartphone and needs only a connection to a standard laptop. This makes rapid, field-based genomic surveillance possible during virus outbreaks. In outbreaks such as Ebola or Zika, a portable sequencer can help break chains of infection faster than ever before (5). It also enables the analysis of samples containing multiple pathogens, and even real-time “de novo assembly” of novel species: merging overlapping short DNA sequences into the full genome of a previously unknown species. Although the developers are still working to improve sequence accuracy without sacrificing size and speed, this little machine is a genuinely promising invention. Another notable third-generation sequencer is the single-molecule real-time (SMRT) platform from Pacific Biosciences.
In this platform, DNA analysis is performed on a chip with microfabricated nanostructures known as zero-mode waveguides (ZMWs). Besides being novel, this platform can quickly produce long reads of up to 10 kb (kilobases), which is useful for de novo genome assembly. Expectations are high that such sequencers will become more accurate in the near future and that, as the price of whole-genome sequencing falls, they will become standard equipment in research institutes and diagnostic laboratories.
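To make the idea of “de novo assembly” more concrete, here is a minimal sketch of one classic textbook approach, greedy overlap merging: repeatedly find the two sequences whose end and beginning overlap the most, and glue them together. This is an illustration only, not the method used by Oxford Nanopore, Pacific Biosciences or any production assembler. It assumes error-free reads from a single DNA strand, and the function names `overlap` and `greedy_assemble`, the `min_len` threshold and the example reads are all hypothetical.

```python
# Toy greedy overlap assembler: a simplified illustration of de novo assembly.
# Assumes error-free reads from one strand; real assemblers must also handle
# sequencing errors, repeats and reverse complements.


def overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored (0 is returned).
    """
    start = 0
    while True:
        # Look for the first `min_len` bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # The rest of a from this position must be a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the longest overlap."""
    # Drop duplicate reads and reads fully contained in another read.
    contigs = list(dict.fromkeys(reads))
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            # No overlaps left: the remaining contigs are the assembly.
            return contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


if __name__ == "__main__":
    # Three made-up overlapping reads from a hypothetical 13-base sequence.
    print(greedy_assemble(["ATGGCGT", "GCGTACG", "TACGGAT"]))  # ['ATGGCGTACGGAT']
```

The greedy strategy is easy to follow but can go wrong on repetitive genomes, which is one reason why long reads, such as those from nanopore and SMRT sequencers, make assembly considerably easier.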

Why do we need these developments at all, when diagnostic units worldwide can already detect a huge range of diseases? While industry has surprised us with “Industry 4.0” and launched tools and methods for process optimization, simulation and outcome prediction, diagnostics now needs to work on the same topics. The bitter truth is that it takes years, even decades, to go from discovering a new microbe by next-generation sequencing to proving that the microbe causes a disease. Unless the pathogen is as monstrous as Ebola, there is a good chance that causality will never be fully explored, because the situation is not urgent enough to concentrate research efforts on it. In fact, many viruses will never fulfill Robert Koch’s famous postulates for proving causal relationships between microbes and diseases, simply because the postulates were originally formulated for bacteria; nevertheless, they are still considered the gold standard for both bacteria and viruses today. At the same time, we still have (too) many unresolved diseases. Furthermore, many patients undergo several diagnostic procedures before the cause of a known disease is found. It is a vicious cycle, and an ineffective one. Hence, implementing new, faster and more reliable methods for pathogen detection should shorten the time between discovery and treatment (6). Tools such as the MinION allow research to take place directly in the field and in healthcare units, thereby building a bridge between the clinic and science. In addition, data gained from such analyses allow pharmaceutical companies to react swiftly to new problems.

Nonetheless, the advancement of current genomic sequencing technologies is not without its challenges. It takes a team of bioinformaticians to produce meaningful results from raw sequences. Start-ups and new companies are springing up like mushrooms, some offering home-sequencing kits for online ancestry analysis, others providing impactful solutions to complex problems such as early cancer or stroke diagnostics for clinics. SOPHiA Genetics, in particular, is probably one of the fastest-growing companies in early cancer diagnostics based on genetic data, and its network of collaborating hospitals keeps expanding. SOPHiA uses thousands of genomic profiles from real patients worldwide to develop a cost-effective system of personalized diagnosis. It currently covers a wide range of medical areas, including oncology, metabolism, paediatrics, cardiology and hereditary cancers, and this panel is very likely to expand soon. Clinicians can submit and analyze sequences with its software without needing a whole department of IT specialists, and because SOPHiA is built on data-driven analysis, its precision grows with the number of submitted genomes. In other words, AI and data-driven deep-learning methods seem to offer a route to early disease diagnostics using next-generation or third-generation sequencing, without the extra effort and cost of establishing bioinformatics units.

The potential of AI in diagnostics is not confined to genome analysis. In 2017 Babylon Health raised millions of dollars to build the first “AI doctor” in the form of a smartphone chatbot. Other companies followed the trend, and the first trials have already started in London (7). As trivial as it may seem to chat with an AI doctor before consulting a real medical professional, such machine-learning platforms may promise a brighter future for diagnostics and healthcare by reducing treatment costs by as much as 50%.

Diagnostic fields where AI may become relevant in the near future include radiology, pathology and dermatology, where diagnoses are derived mainly from images: the perfect starting point for developing deep-learning algorithms. Such algorithms are trained on thousands or even millions of images, from which they learn quantifiable features that improve the detection of diagnostic findings. In 2017 the US National Cancer Institute provided lung scans to the Data Science Bowl, and thousands of deep-learning algorithms were developed under the theme of “turning machine intelligence against lung cancer”. The main aim was to greatly improve the precision of lung lesion screening and to address the currently high false positive rate in lung cancer diagnostics (8). AI solutions for medical imaging, applied not only to CT scans but also to X-rays, have already shown improved outcomes for lung cancer patients in Chinese hospitals. Major companies such as IBM and Philips have started integrating predictive analytics software into their machines. A study from the University of Adelaide has shown that algorithms could even predict death with an uncertainty of only 5% (9). Such tools have also been tested in dermatology, where their accuracy in distinguishing skin cancer from normal moles has been promising. The fear of new technologies replacing (human) professionals is legitimate, but it is up to us what we make of these inventions. We might finally have the chance to combine human and artificial intelligence. Doctors might save time by relying on AI support for repetitive work. They might use AI to double-check diagnoses. AI could also be an immense support in training young medical professionals. The sky is the limit.
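Why does a high false positive rate matter so much in screening? When a disease is rare among the people being screened, even a modest false positive rate means that most flagged scans turn out to be false alarms. The sketch below computes the standard screening metrics from confusion-matrix counts. The class name `ScreeningResult` and all the counts are hypothetical, chosen only for illustration; they are not results from the Data Science Bowl or any real study.

```python
from dataclasses import dataclass


@dataclass
class ScreeningResult:
    """Confusion-matrix counts for a binary screening test (cancer vs. no cancer)."""
    tp: int  # cancers correctly flagged
    fp: int  # healthy scans wrongly flagged (false alarms)
    tn: int  # healthy scans correctly cleared
    fn: int  # cancers missed

    def false_positive_rate(self) -> float:
        # Share of healthy scans that are wrongly flagged: FP / (FP + TN)
        return self.fp / (self.fp + self.tn)

    def precision(self) -> float:
        # Share of flagged scans that really are cancer: TP / (TP + FP)
        return self.tp / (self.tp + self.fp)

    def sensitivity(self) -> float:
        # Share of cancers that are detected: TP / (TP + FN)
        return self.tp / (self.tp + self.fn)


if __name__ == "__main__":
    # Hypothetical counts for 1,000 scans, 60 of which contain cancer.
    baseline = ScreeningResult(tp=40, fp=160, tn=780, fn=20)
    improved = ScreeningResult(tp=40, fp=60, tn=880, fn=20)
    for name, r in [("baseline", baseline), ("improved", improved)]:
        print(f"{name}: FPR={r.false_positive_rate():.3f}, "
              f"precision={r.precision():.2f}, sensitivity={r.sensitivity():.2f}")
```

In this made-up example, the baseline flags 200 scans but only 40 are cancer (precision 0.2), so four out of five patients would be called back unnecessarily. Cutting false alarms from 160 to 60 doubles the precision to 0.4 while detecting exactly the same cancers, which is the kind of gain that screening algorithms aim for.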

On the other hand, critics are growing louder, primarily over data security: sensitive genetic or medical data may fall into the wrong hands. We have already lived through a time when ordinary human genes were patented by industry and genetic information was used to make money. Now we face fears of a similar scenario: insurers could try to save money by rejecting customers who are at risk of developing fatal or chronic diseases. In fact, we might soon store our genome much like the private key of a cryptocurrency wallet, ensuring that insurers still pay for our treatments without learning our dirty little secrets. EncrypGen, for instance, is a genomic blockchain network designed to protect your genomic data through a DNA token that can be bought and sold easily and safely, just like any other cryptocurrency. By tying access to your genome to a private key, it lets you control who can see your information, and it is certainly an eye-opener as to what we should anticipate in the next few years.

Whether we are about to face our darkest fears or see our brightest hopes become reality, it is up to us to be well prepared and to put ‘(data) safety first’ before we embrace this brave new world of diagnostics. If we ever arrive at that future, where an AI doctor has sent us to a hospital… Just imagine sitting there, waiting for your sequencing results. Instead of a human skull replica, the golden skull of a robot suddenly stares at you from your human doctor’s desk. I personally would at least want to feel that we are safe and that we did the right thing. What about you?