Salmon Run: BMI 702 Review Part IV -- Biomedical Imaging

Here is Part IV of my ongoing review of the Biomedical Artificial Intelligence (BMI 702) course, part of Harvard's Foundation of Biomedical Informatics 2023 Spring session, taught by Prof Marinka Zitnik and her team. If you want to check out my previous reviews in this series, they are listed below.

This review covers Module 5 of the course (weeks 10 and 11) and is devoted to the use of Computer Vision techniques to address Biomedical Imaging use cases. There are 9 papers and 2 book chapters, 6 in the first week and 5 in the second. I have some interest in Computer Vision models, having built an Image Classifier by fine-tuning a ResNet pre-trained on ImageNet to predict the type of medical image (radiography, pathology, etc) in medical text, and more recently, fine-tuning an OpenAI CLIP model on medical image and caption pairs to provide text-to-image and image-to-image search capabilities. However, all of these papers have a distinctly medical flavor, i.e. they directly address the needs of doctors, radiologists and pathologists in their day-to-day work, using data that is typically only found in hospital settings. While a large number of these papers deal with supervised learning, some use semi-supervised or weakly-supervised strategies, which require some adaptation of already available data, which in turn requires you to know about the existence of said data to come up with the idea. But I thought they were very interesting in a "broaden my horizons" kind of way.

Module 5 Week 1

Dermatologist-level classification of skin cancer with deep neural networks (Esteva et al, 2017)

This is one of many landmark events where a neural network achieves superhuman performance at a particular task, in this case classifying a variety of skin cancers from smartphone photos of lesions. It is also covered in the What-Why-How video for this week. The paper itself is paywalled, and Google Scholar only finds presentation slides by the primary author for a GPU Tech 2017 conference. The paper describes an experiment where a GoogLeNet Inception v3 CNN, pre-trained on ImageNet data, was further fine-tuned on 129,450 clinical images of skin lesions spanning 2,032 different diseases. The diseases were organized into a hierarchy via a taxonomy. Classifiers were constructed to predict one of 3 disease classes (first level nodes of the taxonomy: benign, malignant and non-neoplastic) and one of 9 disease classes (second level nodes), and their outputs were compared to those of human experts on a sample of the dataset. In both cases, the trained classifier outperformed the humans. Later experiments with a larger number of disease classes and biopsy-proven labels performed even better, with an AUC of 0.96 for the sensitivity-specificity curve. The performance of the CNN at identifying melanoma (from photos and dermoscopy images) and carcinoma was then compared with the predictions of 21 board-certified dermatologists and was found to beat their performance on average. Finally, to inspect the representations learned by the classifier, the last hidden layer of the CNN was reduced to two dimensions using t-SNE and found to cluster well across four disease categories, as well as for individual diseases within each category. In addition to the good results obtained, the paper is important in that it demonstrates an approach to detect skin cancer cheaply and effectively compared to previous approaches (dermoscopy and biopsy), thereby potentially saving many people from death and suffering.
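To make the taxonomy idea concrete, here is a minimal sketch of one simple way to get first-level predictions from a fine-grained classifier: sum the probabilities of the fine-grained diseases under each parent node. The toy taxonomy, the disease list and the function names are my own assumptions for illustration, and I am not claiming this is exactly how the paper computes its coarse predictions.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: fine-grained disease -> first-level class.
# The disease names and groupings are illustrative only.
TAXONOMY = {
    "melanoma": "malignant",
    "basal_cell_carcinoma": "malignant",
    "nevus": "benign",
    "seborrheic_keratosis": "benign",
    "psoriasis": "non-neoplastic",
}


def roll_up(fine_probs, taxonomy):
    """Sum fine-grained class probabilities into their parent classes."""
    coarse = defaultdict(float)
    for disease, prob in fine_probs.items():
        coarse[taxonomy[disease]] += prob
    return dict(coarse)


def predict_coarse(fine_probs, taxonomy):
    """Return the parent class with the highest summed probability."""
    coarse = roll_up(fine_probs, taxonomy)
    return max(coarse, key=coarse.get)


if __name__ == "__main__":
    # Made-up softmax output for a single image.
    probs = {
        "melanoma": 0.30,
        "basal_cell_carcinoma": 0.25,
        "nevus": 0.35,
        "seborrheic_keratosis": 0.05,
        "psoriasis": 0.05,
    }
    print(roll_up(probs, TAXONOMY))
    print(predict_coarse(probs, TAXONOMY))
```

Note that in the made-up example the single most likely fine-grained class is benign, but the summed probability mass favors the malignant parent, which is the kind of case where rolling up through a taxonomy changes the answer.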

Toward robust mammography based models for breast cancer risk (Yala et al, 2021)

This paper describes the Mirai model to predict the risk of breast cancer at multiple timepoints (1-5 years), using mammogram images (4 standard views) and, optionally, additional non-image risk factors such as age and hormonal factors. If the additional risk factors are not provided, Mirai predicts them from the aggregated vector representation of the mammograms. The risk factors (predicted or actual) are then used along with the mammogram vector to predict the risk of breast cancer. Mirai used data collected by Massachusetts General Hospital (MGH), representing approximately 26k exams, splitting it 80/10/10 for training, validation and testing. The resulting model was tested against established risk models such as Tyrer-Cuzick v8 (TCv8) and other SOTA image-based neural models with and without additional risk factors. The latter models were also trained on the MGH data. Mirai was found to outperform them, using the C-index (a measure of concordance between label and prediction) and AUC at 1-5 year intervals as evaluation metrics. The model was then evaluated against 19k and 13k exams from the Karolinska Institute (Sweden) and CGMH (Taiwan) respectively and had comparable performance on both. It was also tested on ethnic subgroups and was found to perform comparably across all groups. It also outperformed the industry standard risk models at identifying high risk cohorts. The paper concludes by saying that Mirai could be used to provide more sensitive screening and achieve earlier detection for patients who will develop breast cancer, while reducing unnecessary screening and over-treatment for the rest.
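Since the C-index may be less familiar than AUC, here is a minimal sketch of how it can be computed. I am using Harrell's formulation for right-censored data, which may not be the exact variant used in the paper; the function name and toy inputs are mine.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data.

    times:  follow-up time for each patient
    events: 1 if cancer was observed at that time, 0 if censored
    risks:  model risk score (higher means higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A pair is only comparable when the earlier time is an event.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier event got the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [1, 2, 3, 4]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.7, 0.4, 0.1]
    print(concordance_index(times, events, risks))
```

A value of 1.0 means that every patient who developed cancer earlier was assigned a higher risk than patients followed for longer, 0.5 is what random scores would give, and the pairwise loop is quadratic, so this is for illustration rather than for datasets of tens of thousands of exams.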

Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning (Tiu et al, 2022)

This paper describes training CheXzero, a multi-modal CLIP model that learns an embedding from 377k chest X-rays and their corresponding raw radiology reports from the MIMIC-CXR dataset, which is then used to predict pathologies (indications of different diseases) of the lung for unseen chest X-rays. This is done by generating positive and negative prompts for each pathology of interest. The model uses the positive and negative scores to compute the probability of the presence of the pathology in the chest X-ray. The performance of CheXzero is comparable to that of a team of 3 board-certified radiologists across 10 different pathologies. CheXzero also outperforms previous label-efficient methods, all of which require a small fraction of the dataset to be manually labeled to enable pathology classification. CheXzero can also perform auxiliary tasks, such as patient gender detection, that it was not explicitly trained for. The CheXzero model (trained on MIMIC-CXR) also performed well on other chest X-ray datasets such as PadChest, showing that the self-supervised approach can generalize well.
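The positive/negative prompt trick can be sketched as follows, assuming we already have an image embedding and the text embeddings for the two prompts (in practice these would come from the CLIP image and text encoders). The function names, the temperature parameter and the toy vectors are my own assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def pathology_probability(image_emb, pos_text_emb, neg_text_emb, temperature=1.0):
    """Softmax over the cosine similarities to the positive and negative prompts."""
    img = normalize(image_emb)
    pos_logit = dot(img, normalize(pos_text_emb)) / temperature
    neg_logit = dot(img, normalize(neg_text_emb)) / temperature
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(pos_logit, neg_logit)
    pos_exp = math.exp(pos_logit - m)
    neg_exp = math.exp(neg_logit - m)
    return pos_exp / (pos_exp + neg_exp)


if __name__ == "__main__":
    # Toy 2-d embeddings standing in for real encoder outputs.
    image = [0.9, 0.1]
    positive_prompt = [1.0, 0.0]  # e.g. the embedding of a "pathology present" prompt
    negative_prompt = [0.0, 1.0]  # e.g. the embedding of a "pathology absent" prompt
    print(pathology_probability(image, positive_prompt, negative_prompt))
```

The appeal of this setup is that adding a new pathology only requires writing a new pair of prompts, not collecting new labels.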

International Evaluation of an AI System for Breast Cancer Screening (McKinney et al, 2020)

The paper describes a Deep Learning pipeline which is fed mammogram X-rays taken from 4 standard views and which predicts whether the patient will get breast cancer in 2-3 years. Two datasets were used, a larger one from the UK consisting of mammograms from 25k women used for training the model, and a smaller test set from the US covering 3k women. The authors claim that the system (for which no code is shared nor any technical information provided) achieves better performance at breast cancer detection than a team of 6 human radiologists. The model was found to generalize across datasets, since it was trained on UK data and evaluated on US data. When the system was used for screening out initial mammograms for manual verification by a human radiologist (a double-reading scenario), it achieved an 88% increase in throughput. Thus such a system could be useful for providing automated immediate feedback for breast cancer screening, as well as a first step in the double-reading scenario, as an assistive tool for human radiologists.

The new era of quantitative cell imaging – challenges and opportunities (Bagheri et al, 2021)

The paper compares the evolving popularity of optical microscopy with the enormous success of genomics a few years earlier, and argues that quantitative optical microscopy has the potential to make similar contributions to the biomedical community. While the origins of optical microscopy are rooted in the 19th century, recent breakthroughs in this technology (notably high resolution and high throughput light microscopy, but others as well), along with advances in deep learning that facilitate analysis of images at greater scale, indicate a significant convergence of approaches that positions optical microscopy as a viable candidate for biomedical data science. The idea is that rather than have optical microscopy contribute a small volume of highly curated images to a research project, it would be treated as a computational science where a large quantity of standardized images is generated over time, which could then provide insights based on statistical analysis and machine learning. The article then goes on to describe the challenges that the field must overcome, namely standardization of techniques to enable reproducibility within and across different labs, and the storage of and FAIR (findable, accessible, interoperable and reusable) access to potentially terabytes of generated image data. It also describes several initiatives that are happening within the biomedical community to address these challenges.

Data-analysis strategies for image-based cell profiling (Caicedo et al, 2017)

This paper highlights strategies and methods to do high throughput quantification of phenotypic differences in cell populations. It can be seen as an extension to the previous paper that outlined the challenges and opportunities in this field. It proposes a workflow composed of the following steps: image analysis, image quality control, preprocessing extracted features, dimensionality reduction, single-cell data aggregation, measuring profile similarity, assay quality assessment and downstream analysis. Image Analysis transforms a population of digital cell images into a matrix of measurements, where each image corresponds to a row in the matrix. This stage often includes illumination correction, segmentation and feature extraction. The Quality Control step consists of computing quality metrics at both the field of view and cell levels. The Preprocessing step consists of removing outlier features or cells, or imputing values for features based on the rest of the population. A notable operation in this stage is plate-level effect correction, which involves addressing edge effects and gradient artifacts across different plates of assays. Feature transformation and normalization are also done in this step, such that the features have an approximately normal distribution. The next step is Dimensionality Reduction, where the aim is to retain or consolidate features that provide the most value in answering the biological question being studied. The Single Cell Data Aggregation step consists of using various statistical measures (mean, median, Kolmogorov-Smirnov (KS)) on the feature distribution to create an “average” cell. Clustering or Classification techniques are used to identify sub-populations of cells. The next step is Measuring Profile Similarity, which measures and reveals similarities across the different profiles identified. At this point we are ready for the Assay Quality Assessment step, where we evaluate the quality of the morphological profiling done during the previous steps. The final step is Downstream Analysis, where the morphological patterns found are interpreted and validated. The paper is extraordinarily detailed and contains many techniques that are suitable not only for image-based cell profiling, but for feature engineering in general. Data used for illustrating the workflow comes from the BBBC021 (Broad Bio-image Benchmark Collection) image collection of 39.6k image files of 113 small molecules, and the authors provide example code in the github repo cytomining/cytominer.
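To give a feel for the middle of this workflow, here is a minimal sketch of three of the steps (normalization against controls, single-cell aggregation and profile similarity) using only the Python standard library. This is not the cytominer API; the function names and the choice of z-scoring against control cells, median aggregation and cosine similarity are my own assumptions, picked from the options the paper discusses.

```python
import math
from statistics import mean, median, pstdev


def normalize_to_controls(cells, control_cells):
    """Z-score each feature using the mean and std of the control cells."""
    columns = list(zip(*control_cells))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) for c in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(cell, means, stds)]
        for cell in cells
    ]


def aggregate_profile(cells):
    """Collapse per-cell feature vectors into one median profile."""
    return [median(column) for column in zip(*cells)]


def cosine_similarity(u, v):
    """Similarity between two aggregated profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Made-up measurements: each row is one cell, each column one feature.
    controls = [[1.0, 10.0], [3.0, 14.0]]
    treated_a = [[4.0, 12.0], [5.0, 13.0], [6.0, 16.0]]
    treated_b = [[5.0, 13.0], [6.0, 15.0], [7.0, 15.0]]
    profile_a = aggregate_profile(normalize_to_controls(treated_a, controls))
    profile_b = aggregate_profile(normalize_to_controls(treated_b, controls))
    print(profile_a, profile_b, cosine_similarity(profile_a, profile_b))
```

The median is used here because it is less sensitive to outlier cells than the mean, but the paper covers several alternatives for each of these choices.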

Module 5 Week 2

Chapter 10 of Artificial Intelligence in Medical Imaging (Imaging Biomarkers and Imaging Biobanks) (Alberich-Bayarri et al, 2019)

The chapter discusses challenges to the adoption of image analytics into clinical routine. Although efforts are under way to standardize production of imaging biomarkers, they still have a long way to go. In addition, imaging biomarkers have to show efficacy in measuring treatment response, which in turn should be confirmed via medical theory, through correlation with disease hallmarks. This allows imaging biomarkers to serve as surrogate indicators of relevant clinical outcomes. Finally, acquiring imaging biomarkers needs to be cost efficient. The chapter covers the general methodology for development, validation and implementation of imaging biomarkers. In order to be effective, such data would then need to be stored in imaging biobanks, either population or disease focused, so that they can be effectively shared within the community and thus provide maximum value.

Deep Learning-based Computational Pathology Predicts for Cancers of Unknown Primary (Lu et al, 2020)

This paper addresses the problem of predicting the primary site for Cancers of Unknown Primary (CUP), i.e. cancers whose site of origin cannot be determined easily. Treating the cancer with generic therapies without determining the source results in low survival. It is possible to find the primary site using extensive diagnostic work-up spanning pathology, radiology, endoscopy, genomics, etc, but such diagnostic procedures are not possible for patients in low resource settings. The paper describes the Tumor Origin Assessment via Deep Learning (TOAD) system that predicts whether the cancer is primary or metastasized, and the primary site, based on histopathology whole-slide images (WSIs). TOAD was trained on 17.5k WSIs and achieved impressive results for top-3 and top-5 accuracy on the test set, and generalizes well, with comparable results on WSIs from a different hospital. TOAD uses a CNN architecture which is trained jointly to predict both whether the cancer is primary or metastasized, and the primary site of the cancer (14 classes). For explainability, TOAD can generate attention heatmaps to indicate which parts of the slides are indicative of the predicted cancer. TOAD was also tested against WSIs for which the labels were not known initially but were found later, during autopsy. The high accuracies of the top-3 and top-5 predictions mean that physicians can narrow the scope of their diagnostic tests and treatments, thus resulting in more efficient use of medical resources. This paper is also covered in the What-Why-How video for the week.
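Top-k accuracy is what makes the "narrow the scope" argument work, so here is a minimal sketch of how it is computed. The function name and the toy predictions are mine, and the site names are only placeholders.

```python
def top_k_accuracy(prob_rows, true_labels, k):
    """Fraction of cases whose true label is among the k highest-scoring classes.

    prob_rows:   one dict per case, mapping candidate primary site -> score
    true_labels: the known primary site for each case
    """
    hits = 0
    for probs, label in zip(prob_rows, true_labels):
        ranked = sorted(probs, key=probs.get, reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


if __name__ == "__main__":
    # Made-up model outputs for two slides.
    predictions = [
        {"lung": 0.5, "breast": 0.3, "colon": 0.2},
        {"lung": 0.1, "breast": 0.3, "colon": 0.6},
    ]
    labels = ["breast", "lung"]
    for k in (1, 2, 3):
        print(k, top_k_accuracy(predictions, labels, k))
```

A model can have modest top-1 accuracy and still be clinically useful if the true site is almost always in its top 3, since that turns an open-ended work-up into a short list of candidates to confirm.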

Chapter 13 from Artificial Intelligence in Medical Imaging (Cardiovascular Diseases) (Verjans et al, 2019)

This chapter covers the use and applicability of various medical imaging techniques to diagnose and treat Cardiovascular diseases, across specialty areas such as Echocardiography, Computed Tomography (CT), Magnetic Resonance Imaging (MRI) and Nuclear Imaging (PET). It also discusses predictive applications that can combine information from multiple sources, including imaging. While the impact of AI in Cardiovascular imaging has so far been mainly in image interpretation and prognosis, it has the potential to impact the entire imaging pipeline: choosing a test per the guidelines, patient scheduling, image acquisition, reconstruction, interpretation and prognosis. Deep Learning techniques have been applied in the MRI space to reconstruct accelerated MR images as an alternative to compressed sensing, and research efforts show reconstruction of high quality CT images from noisy low-radiation images. Deep Learning techniques have also been applied during image post-processing, such as automatically computing ejection fractions or cardiac volumes from CTs. In the near future, we expect that ML applications will generate diagnostics from images. In terms of prognosis, DL/ML approaches using medical imaging are expected to increase the quality of healthcare by detecting problems faster and more cheaply. There is also scope for combining insights from medical imaging with other sources of information, such as genetic or social factors, to make better medical decisions. The chapter continues with a discussion of specific practical uses of AI in different cardiovascular imaging scenarios in each of the specialty areas listed above. The chapter also discusses the Vendor Neutral AI Platform (VNAP) to help with rapid adoption of AI-based solutions in Medical Imaging.
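As a small illustration of the post-processing step, the ejection fraction is just the fraction of the end-diastolic volume that is ejected in each beat. The hard part, which the Deep Learning models handle, is segmenting the ventricle to get the volumes; the sketch below assumes the two volumes have already been measured, and the function name is mine.

```python
def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent from end-diastolic and end-systolic volumes (mL)."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("end-systolic volume must be between 0 and the end-diastolic volume")
    stroke_volume = edv_ml - esv_ml
    return 100.0 * stroke_volume / edv_ml


if __name__ == "__main__":
    # Made-up volumes, as if produced by an upstream segmentation model.
    print(ejection_fraction(120.0, 50.0))
```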

Artificial Intelligence in Digital Pathology – new tools for diagnosis and precision oncology (Bera et al, 2019)

The paper describes how the digitizing of whole-slide images (WSI) of tissue has led to the rise of AI / ML tools in digital pathology that can assist pathologists and oncologists in providing better and more timely treatment. The rise of Deep Learning and computation power over the last two decades has given rise to many different applications in these areas. For pathologists, the primary applications are the identification of dominant morphological patterns that are indicative of certain diseases, and for oncologists, it is the identification of biomarkers that are indicative of a type of cancer and the stage it is in. These are both complex tasks with high variability, so they usually take years of specialization to do effectively. AI-based approaches are robust and reproducible, and achieve a similar level of accuracy as human experts. When used in tandem with human experts, they can significantly cut down the expert’s workload and make them more efficient, or serve as a confirmation (like a second opinion). These AI applications have been used in diagnostic applications such as differentiating between WSIs of malignant vs benign breast cancer tissue, and prognostic applications such as the ability to detect tumor infiltrating lymphocytes, which are indicative of 13 different cancers, or the ability to predict recurrence of lung cancer by the arrangement of cells in WSIs. They have also been used in Drug discovery and development, by identifying patients who are more likely to respond to certain treatments using WSIs of their nuclear or peri-nuclear features. DL architectures typically used in these applications are the CNN, FCN (sparse features, e.g. detecting cancerous regions in histopathology images), RNN (to predict risk of disease recurrence over time), and GAN (segmenting out specific features from histopathology images, conversion of one form of tissue staining to another, etc). Challenges to clinical adoption of these techniques include regulatory roadblocks, quality and availability of training data, the interpretability of these AI models, and the need to validate these models sufficiently before use.

Data-efficient and weakly supervised computational pathology on whole-slide images (Lu et al, 2021)

The paper describes an attention mechanism called Clustering-constrained Attention Multiple Instance Learning (CLAM) which is used to identify regions of interest (ROI) in whole slide images (WSI). WSIs are plentiful but are labeled with slide-level labels, which are not as effective for classification tasks as manually labeled ROIs. CLAM allows an attention mechanism to be applied across all pixels and is very effective at finding ROIs which can then be extracted and used for various tasks, and has proven to be more effective than treating all pixels in the slide as having the same label. CLAM has been applied to the tasks of detecting renal cell carcinoma, non-small-cell lung cancer and lymph node metastasis, and has been shown to maintain high performance as the number of training labels is systematically decreased. CLAM can also produce interpretable heatmaps that allow the pathologist to visualize the regions of tissue that contributed to a positive prediction. CLAM can also be used to compute slide-level feature representations that are more predictive than raw pixel values. CLAM has been tested with independent test cohorts and found to generalize across data-specific variations, including smartphone microscopy images. Weakly supervised approaches such as CLAM are important because they leverage abundant weak WSI labels to provide labeled ROIs of slide subregions, which in turn can produce more accurate predictive models for computational pathology.
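The core of attention-based multiple instance learning can be sketched in a few lines: score each region of the slide, softmax the scores into attention weights, and use the weights both to build a slide-level vector and as the heatmap. This is only the pooling idea and not CLAM itself, which as far as I understand adds learned gated attention layers and a clustering objective on top; the function names, the fixed scoring vector and the toy features are my own assumptions.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(region_features, scoring_vector):
    """Return (slide_vector, attention) for one slide.

    region_features: one feature vector per slide region
    scoring_vector:  stand-in for learned attention parameters
    """
    scores = [
        sum(w * x for w, x in zip(scoring_vector, feat))
        for feat in region_features
    ]
    attention = softmax(scores)
    dim = len(region_features[0])
    slide_vector = [
        sum(a * feat[d] for a, feat in zip(attention, region_features))
        for d in range(dim)
    ]
    return slide_vector, attention


if __name__ == "__main__":
    # Three made-up regions; the second one has a strong "tumor-like" feature.
    regions = [[0.0, 0.0], [10.0, 0.0], [0.0, 1.0]]
    slide_vector, attention = attention_pool(regions, [1.0, 0.0])
    print(slide_vector)
    print(attention)
```

Only the slide-level label is needed to train such a model end to end, and the attention weights come for free as a per-region relevance map, which is where the interpretable heatmaps come from.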

That's all I have for today. I hope you found this useful. In my next review, I will review the paper readings for Module 6 (Therapeutic Science).