A Survey of Applications of ML in Healthcare
by Ana Cismaru
The purpose of this blog post is to present an overview of applications of machine learning in healthcare, focusing on medical imaging, wearables, and molecular biology. In future posts, I will dive deeper into examples of simple ML experiments and state-of-the-art papers from each of these topics.
Introduction
There’s so much complex data constantly streaming from our bodies that there is no way it could all be processed manually.

Figure 1: An example of a DNA sequence. Could you ever understand that?
Luckily, machine learning is perfect for this type of problem. The crux of ML consists of processing large, complex datasets and making meaning out of them (whether through classification, prediction, generation, etc.). By applying machine learning to healthcare, we can better analyze these large datasets that are our bodies.
The power of understanding our bodies’ data could lead to a paradigm shift in how we think about medicine. For one, we could shift from curative medicine practices to a more preventative model. Wouldn’t it be great if, instead of being diagnosed with Alzheimer’s at the age of 60, a doctor using an ML model could warn you at the age of 30 that you are prone to developing this disease? This would give you enough time to adjust your lifestyle to reduce your risk of developing Alzheimer’s.
More automation in the space could also mean shorter wait times for medical procedures and prioritization of patients according to their conditions. Rather than just sorting through X-rays in chronological order, an ML model could notify doctors of the most severe conditions and allow them to address those first. It would be inefficient to have a doctor spend an extra 10 minutes talking to a patient with no condition while ignoring someone who was just diagnosed with cancer. A startup called Nines is working on precisely this kind of machine-learning-powered tool to improve hospital workflows.
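To make the prioritization idea concrete, here is a minimal sketch of a severity-ordered worklist. It assumes a model has already assigned each scan a severity score between 0 and 1; the scan ids, the scores, and the `triage_order` function are all made up for illustration and are not taken from Nines or any real product.

```python
import heapq

def triage_order(scans):
    """Return scan ids ordered from most to least severe.

    `scans` is a list of (scan_id, severity) pairs in arrival order, where
    severity is a hypothetical model score in [0, 1]. Ties keep arrival order.
    """
    heap = []
    for arrival_index, (scan_id, severity) in enumerate(scans):
        # heapq is a min-heap, so negate severity to pop the highest score first.
        # arrival_index breaks ties in favor of the scan that arrived earlier.
        heapq.heappush(heap, (-severity, arrival_index, scan_id))
    ordered = []
    while heap:
        _, _, scan_id = heapq.heappop(heap)
        ordered.append(scan_id)
    return ordered

# Made-up worklist in chronological order.
worklist = [("xray-001", 0.05), ("xray-002", 0.92), ("xray-003", 0.40), ("xray-004", 0.92)]
print(triage_order(worklist))
```

In a real hospital workflow, the score would only suggest a reading order; a radiologist would still review every scan.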
Machine learning is also extremely helpful in a medical research setting. By using AI solutions, lengthy and tedious research experiments could be converted to computational models. These models can run 24/7 and do not necessarily rely on an active researcher to conduct them. In turn, this allows researchers to develop more varied experiments more rapidly.
So clearly machine learning would have massive implications in the medical space. In the upcoming paragraphs, I’ll do a survey of some applications of ML in healthcare including medical imaging, activity trackers, and molecular cell biology. In future posts, I’ll zoom in on each category and present more detailed explanations.
Preface: ML is not a replacement for doctors

Figure 2: Healthcare workers are an essential part of our society <3
One important thing I want to highlight before diving in deeper: ML models are not a replacement for healthcare workers. They simply are tools to aid doctors and researchers with their daily work.
Even though there have been tremendous breakthroughs in the realm of ML for Healthcare, there are still many underlying issues related to ML in general. For one, almost all ML algorithms can succumb to overfitting, biased datasets, or lack of generalizability. This weakness is amplified even further in a high stakes domain like the healthcare industry which has far stricter standards than typical applications.
As a result, conclusions about patients or experiments should never be made without human expert approval. If you’re interested in reading more about the ethical challenges of implementing ML for healthcare, check out this article.
Medical Imaging
Medical imaging scans such as X-rays, MRIs, etc, are commonly used to visualize human organs. With the advent of computer vision, researchers have started applying machine learning techniques, such as image classification, object detection, and segmentation, to these scans. At their best, these reliable and automatic methods for medical imaging evaluation can provide objective quantification and allow for automatic ranking of the severity of cases.

Figure 3: An example of object detection + classification (left) and segmentation (right) from X-rays
Some successful applications of computer vision techniques for medical imaging include detecting skin cancer, diagnosing Parkinson’s disease, and translating one type of scan to another (i.e., going from a CT scan to a PET scan). Other notable examples include the segmentation model U-Net, which has become a standard baseline for medical segmentation problems, and the retinal disease progression model from Google’s DeepMind.
Finally, I’d love to highlight the research I am working on at Henry Lab of UCSF. The goal of my project is to create an automatic spinal cord gray matter segmentation pipeline to track the progression of patients with Multiple Sclerosis. By segmenting the gray matter in the spinal cord, doctors can clearly see the defined outline of the gray matter and determine if it has changed shape over time.

Figure 4: A slice of a spinal cord MRI, and its resulting segmentation of the gray matter
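As a rough illustration of how a segmentation can be turned into a number to track over time, the sketch below compares the segmented area between two visits. It assumes the masks are binary, come from the same slice location, and share the same resolution; the tiny masks and the function names are hypothetical and this is not the actual pipeline described above.

```python
def mask_area(mask):
    """Count foreground pixels in a binary mask given as a list of rows of 0/1."""
    return sum(sum(row) for row in mask)

def relative_area_change(baseline_mask, followup_mask):
    """Fractional change in segmented area between two visits (negative = shrinkage)."""
    baseline = mask_area(baseline_mask)
    if baseline == 0:
        raise ValueError("baseline mask has no foreground pixels")
    return (mask_area(followup_mask) - baseline) / baseline

# Toy gray matter masks for two visits (1 = gray matter, 0 = background).
visit_1 = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
visit_2 = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
change = relative_area_change(visit_1, visit_2)
print(f"{change:.0%}")  # -25%
```

A real pipeline would convert pixel counts to physical units using the scan's voxel spacing before comparing visits.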
Of course, not every idea works out in the real world, which demonstrates the challenge of applying ML to a clinical setting. In 2018, Google announced that they had created an ML model that can detect diabetic retinopathy (an eye disease caused by diabetes) from images of retinas. The results were fantastic on paper, but when put into practice they failed pretty miserably. The researchers did not realize that, in real-world settings, retina images don’t always have perfect lighting (more than a fifth of images were rejected due to poor lighting) and processing data requires a strong internet connection (something that not all the test clinics had). In the grand scheme of things, this example illustrates how creating ML solutions for healthcare involves much more than just algorithm design.

Figure 5: the difference between a normal retina and a retina from a diabetic person
If medical imaging seems interesting to you, keep an eye out for my medical imaging blog post. In the meantime, here are some useful resources to check out if you want to dive further in the world of medical imaging:
Familiarize yourself with the different modalities of medical imaging (MRI, X-ray, CT, PET…), their unique characteristics, and their respective Python libraries (OpenCV, NiBabel, pydicom…)
Learn about standard Computer Vision models for image classification (here’s a good overview) - a lot of medical imaging models use standard CV models as their foundation.
Read the U-Net paper if you want to get into image segmentation
Wearables
You’ve probably noticed that Apple Watches, Fitbits, and other wearables are becoming more and more prevalent. A lot of people love tracking their activity data, but these wearables can be made even more useful with the incorporation of machine learning. One of the features of activity trackers is that they are constantly collecting data about your body (an Apple Watch collects your heart rate every 5 seconds when exercising). ML loves this. As they learn about your unique body characteristics, wearables can become personalized to you and could learn to warn you if your health metrics seem off.

Figure 6: Wearable data predicting Lyme disease; Snyder was able to receive treatment before even testing positive
One notable example of the power of wearables is how a Stanford Professor (Dr. Snyder) was able to detect that he had contracted Lyme disease during a vacation thanks to his biosensors. Consequently, he was able to receive treatment before his symptoms became severe. With the power of ML + constant data monitoring, there could truly be a shift from curative to preventative healthcare practices.
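One simple way to picture a wearable "warning you if your metrics seem off" is to compare each new reading against your own recent baseline. The sketch below flags readings that sit more than a few standard deviations away from the trailing window. The window size, the threshold, and the heart rate values are assumptions chosen for illustration, and this is not how Dr. Snyder's analysis or any specific device works.

```python
from statistics import mean, stdev

def flag_anomalies(readings, baseline_size=7, threshold=3.0):
    """Return indices of readings that deviate strongly from a trailing personal baseline."""
    flagged = []
    for i in range(baseline_size, len(readings)):
        baseline = readings[i - baseline_size:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            continue  # a perfectly flat baseline gives no scale to compare against
        z = (readings[i] - mu) / sigma
        if abs(z) > threshold:
            flagged.append(i)
    return flagged

# Made-up daily resting heart rates (bpm); the last two days are elevated.
resting_hr = [58, 59, 57, 58, 60, 59, 58, 59, 58, 72, 74]
print(flag_anomalies(resting_hr))
```

Note that only the first elevated day is flagged here: once the 72 bpm reading enters the trailing window, it inflates the baseline's spread, so the following day no longer clears the threshold. Real systems typically need a more robust baseline for exactly this reason.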
An interesting question that arises from emerging developments in wearable technology is how this data should be protected with regard to privacy. For one, the data that wearables collect doesn’t fall under the Health Insurance Portability and Accountability Act (HIPAA). That means that wearable companies could sell your health data to third parties. One example of wearable data privacy gone wrong was an incident from 2011 where Fitbit forgot to make their new users’ sexual activity data private by default. This resulted in people being able to search for others’ sexual activity - a super big yikes if youknowwhatimsayin.
But anyways, back to the fun ML content.
Generally, the data that machine learning techniques analyze from wearables doesn’t come with “labels” or ground truth answers. This is partly due to the fact that health metrics differ from person to person and from day to day, and partly due to the abundance of unlabeled wearable data and the scarcity of labeled medical conditions. Thus, due to the nature of the data, ML researchers creating models often resort to unsupervised (no labels) or, more commonly, semi-supervised (some labels) learning techniques.
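To give a feel for what "some labels" can look like in practice, here is a minimal sketch of self-training, one simple semi-supervised technique: fit a model on the few labeled points, pseudo-label the unlabeled points it is confident about, and refit. The one-dimensional nearest-centroid model, the `margin` parameter, and the resting heart rate values are all assumptions made for illustration; published wearable models are far more sophisticated.

```python
def centroids_of(labeled):
    """Mean feature value per class for a list of (value, label) pairs."""
    result = {}
    for cls in (0, 1):
        values = [v for v, y in labeled if y == cls]
        result[cls] = sum(values) / len(values)
    return result

def self_train(labeled, unlabeled, rounds=5, margin=5.0):
    """Nearest-centroid self-training on 1-D features.

    Returns the final class centroids and the points left unlabeled.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        centroids = centroids_of(labeled)
        confident, remaining = [], []
        for v in pool:
            d0, d1 = abs(v - centroids[0]), abs(v - centroids[1])
            # Only pseudo-label points that are clearly closer to one centroid.
            if abs(d0 - d1) >= margin:
                confident.append((v, 0 if d0 < d1 else 1))
            else:
                remaining.append(v)
        if not confident:
            break
        labeled.extend(confident)
        pool = remaining
    return centroids_of(labeled), pool

# Two labeled users (0 = no condition, 1 = condition) and eight unlabeled ones.
labeled = [(55.0, 0), (85.0, 1)]
unlabeled = [52.0, 57.0, 60.0, 69.0, 71.0, 82.0, 88.0, 90.0]
centroids, still_unlabeled = self_train(labeled, unlabeled)
print(centroids, still_unlabeled)
```

The ambiguous readings near the middle (69 and 71) are deliberately left unlabeled rather than guessed, which is the main safeguard this approach offers against reinforcing its own mistakes.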

Figure 7: Example of multivariate time series wearable data
DeepHeart, a paper from UCSF, demonstrates the effectiveness of using semi-supervised learning and LSTMs for predicting cardiovascular risk. Using data from Apple Watches (heart rate, step count, etc.), the model is able to reliably detect diabetes, sleep apnea, hypertension, and high cholesterol - all of which are common conditions that often go undiagnosed. Thanks to this and similar technology, hopefully more people with underlying conditions will be able to get diagnosed and treated.
To learn more about machine learning for wearables, check these out:
Familiarize yourself with manipulating multivariate time series - the type of data most wearables produce
Learn more about semi-supervised learning techniques
If you have a Fitbit, try this tutorial to manipulate your own health data
If you have an iPhone or Apple Watch, learn how to extract your health data with HealthKit
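As a small starting point for the time series item above, a common first step is to cut a long multivariate recording into fixed-length windows that a model can consume. The sketch below is a minimal version; the channel layout (heart rate, step count) and the values are made up.

```python
def sliding_windows(series, window, step):
    """Split a multivariate time series into fixed-length, possibly overlapping windows.

    `series` is a list of time steps, each a tuple of channel values.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        series[start:start + window]
        for start in range(0, len(series) - window + 1, step)
    ]

# One tuple per minute: (heart_rate_bpm, step_count). Values are illustrative.
minutes = [(62, 0), (64, 10), (90, 110), (95, 120), (88, 100), (70, 5)]
windows = sliding_windows(minutes, window=4, step=2)
print(len(windows))  # 2
print(windows[1])
```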
Molecular Cell Biology
Our genomes, proteomes, cells, etc. make up who we are at our core, and yet they are incomprehensible to us without the use of technology. Despite these molecular features being so key to the understanding of our bodies, there are still many things we do not understand about them. Both molecular and computational biologists have made tremendous progress in improving our understanding of molecular cell biology. One notable example of this is next-generation sequencing (NGS) technology, which has allowed researchers to sequence (i.e., read) the entire human genome in less than a day. Yet on top of these innovations, certain experimental assays could be elevated through the use of machine learning.
We have seen the benefits of applying computational techniques to biology as demonstrated through the development of DeepVariant, AlphaFold 2, and COVID-19 treatments.
DeepVariant is a model developed by Google and Verily Life Sciences that reduces the number of errors in genome sequencing from error-prone next-generation sequencing machines. This is super relevant to most applications of ML in biology, as even the smallest mistake in a genetic sequence could have a detrimental health impact.
Google/DeepMind’s most recent contribution to the field of computational biology is AlphaFold 2. Their model succeeded in predicting protein structures with extreme accuracy. This in turn has massive implications for the field of drug discovery and could reduce the need for tedious and expensive scientific experiments. Rather than using cryo-electron microscopy, researchers could use the AlphaFold model to determine the structure of proteins, reducing experimentation time significantly.

Figure 8: How a protein is folded - a pretty complicated process, right?
In the same vein, researchers at MIT are also trying to reduce COVID-19 drug development time by finding out how to repurpose already-existing drugs rather than creating new ones. The motivation behind this is that the latter takes significantly more time. What’s cool about this paper is that their model accounts both for changes in gene expression caused by the disease and for changes caused by aging. This is critical for diseases, such as COVID-19, which affect the elderly differently than younger people. With this knowledge, the researchers were able to identify the genes that were impacted by both COVID-19 and aging. They then searched for existing drugs that targeted those genes using an autoencoder.
Despite these awesome innovations, applying ML to molecular biology does face some challenges. For one, the curse of dimensionality is very prevalent in this field. The human genome is over 3 billion base pairs long, so genomic data is extremely high dimensional. To create robust ML systems, the model must either be fed very large amounts of data or the data’s dimensions must be reduced. Unfortunately, it is difficult to collect large amounts of genomic data (even using next-generation sequencing methods) and researchers can’t necessarily reduce dimensions of the data without losing its integrity (since there are so many factors at play: DNA, RNA, and other functional parts of the genome).
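To illustrate the trade-off between dimensionality and integrity, consider one common way of shrinking a DNA sequence into a fixed-size feature vector: counting k-mers (substrings of length k). A sequence of length L encoded base by base has on the order of L features, while a k-mer count vector always has 4^k entries regardless of L. The price is that the position of each k-mer is thrown away. The sketch below is a minimal version with a made-up sequence; it is not the representation used by any of the models mentioned above.

```python
from itertools import product

def kmer_counts(sequence, k=2):
    """Map a DNA string to a fixed-length vector of k-mer counts (4**k entries)."""
    alphabet = "ACGT"
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # skip windows containing N or other ambiguous symbols
            counts[kmer] += 1
    return [counts[kmer] for kmer in kmers]

# Toy sequence; "N" marks a base the sequencer could not call.
vector = kmer_counts("ACGTACGTNACG", k=2)
print(len(vector), sum(vector))  # 16 9
```

Two sequences with the same k-mers in a different order would map to the same vector, which is exactly the kind of information loss the paragraph above warns about.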
As always, if this sounds interesting to you, take a look at these resources:
Familiarize yourself with NGS data formats and their significance
Check out these datasets to find examples to experiment with
Read up on sequencing assays and related analyses such as DNA-Seq, RNA-Seq, Hi-C, GWAS, SNP genotyping, etc.
Strengthen your general knowledge of molecular biology
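As a first taste of NGS data formats, here is a minimal sketch of reading FASTQ, a widely used text format for sequencing reads. It assumes the common layout of four lines per record (header, sequence, separator, quality string) and Phred+33 quality encoding, where each quality character's ASCII code minus 33 gives the base's quality score. The reads below are made up, and real projects would normally use an established library rather than a hand-rolled parser.

```python
import io

def read_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a FASTQ text stream.

    Assumes 4 lines per record and Phred+33 quality encoding.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], sequence, [ord(ch) - 33 for ch in quality]

# Two made-up reads held in memory instead of a file on disk.
example = io.StringIO("@read1\nACGT\n+\nIII#\n@read2\nGG\n+\n5I\n")
records = list(read_fastq(example))
for read_id, seq, quals in records:
    print(read_id, seq, quals)
```

Low quality scores (like the 2 at the end of the first read) mark bases the sequencer was unsure about, which is the kind of error signal that variant-calling tools have to deal with.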
To Recap
Our bodies are constantly producing enormous and largely incomprehensible amounts of data.
Machine learning is great for understanding loads of data, so applying it to healthcare presents a massive opportunity.
ML is not a replacement for healthcare workers, but rather a tool to help them do their jobs even better.
Sometimes models in the lab do not perform the same in the real world.
Computer Vision techniques can be applied to medical imaging to facilitate the analysis of various scans.
Wearables are constantly collecting an abundance of data which makes them suitable for ML solutions.
Molecular biology data is generally very large and high dimensional, and creating ML models for it may reduce experimentation time in the lab.
The impact of a successful application of ML in healthcare is high: it can speed up processes, increase objectivity, allow for more preventative practices, etc.
If this article got you excited about ML for healthcare, please do check out all the resources linked above and keep an eye out for my future posts :)


