Saturday, June 24, 2023

BMI 702 Review Part IV -- Biomedical Imaging

Here is Part IV of my ongoing review of the Biomedical Artificial Intelligence (BMI 702) course, part of Harvard's Foundation of Biomedical Informatics 2023 Spring session, taught by Prof Marinka Zitnik and her team. If you want to check out my previous reviews in this series, they are listed below.

This review covers Module 5 of the course (weeks 10 and 11) and is devoted to the use of Computer Vision techniques to address Biomedical Imaging use cases. There are 9 papers and 2 book chapters, 6 in the first week and 5 in the second. I have some interest in Computer Vision models, having built an image classifier by fine-tuning a ResNet pre-trained on ImageNet to predict the type of medical image (radiography, pathology, etc.) found in medical text, and more recently, fine-tuning an OpenAI CLIP model on medical image and caption pairs to provide text-to-image and image-to-image search capabilities. However, all of these papers have a distinctly medical flavor, i.e. they directly address the needs of doctors, radiologists and pathologists in their day-to-day work, using data that is typically found only in hospital settings. While a large number of these papers deal with supervised learning, some use semi-supervised or weakly-supervised strategies, which require some adaptation of already available data, which in turn requires knowing that such data exists in order to come up with the idea. But I thought they were very interesting in a "broaden my horizons" kind of way.

Module 5 Week 1

Dermatologist-level classification of skin cancer with deep neural networks (Esteva et al, 2017)

This is one of many landmark events where a neural network achieves superhuman performance at a particular task – in this case, classifying a variety of skin cancers from smartphone photos of lesions. It is also covered in the What-Why-How video for this week. The paper itself is paywalled, and Google Scholar only finds presentation slides by the primary author for a GPU Tech 2017 conference. The paper describes an experiment where a GoogleNet Inception V3 CNN, pre-trained on ImageNet data, was further fine-tuned on 129,450 clinical images of skin lesions spanning 2,032 different diseases. The diseases were organized into a hierarchy via a disease taxonomy. Classifiers were constructed to predict one of 3 disease classes (first level nodes of the taxonomy – benign, malignant and non-neoplastic) and one of 9 disease classes (second level nodes), and their outputs compared to those of human experts on a sample of the dataset. In both cases, the trained classifier outperformed the humans. Later experiments with a larger number of disease classes and biopsy-proven labels performed even better; the AUC for the sensitivity-specificity curve was 0.96. The performance of the CNN at predicting Melanoma (with photos and dermoscopy) and Carcinoma was then compared with the predictions of 21 board-certified dermatologists and was found to beat their performance on average. Finally, to test the classifier encodings, the last hidden layer of the CNN was reduced to two dimensions using t-SNE and found to cluster well across four disease categories, as well as for individual diseases within each category. In addition to the good results obtained, the paper is important in that it demonstrates an approach to detect skin cancer cheaply and effectively compared to previous approaches (dermoscopy and biopsy), thereby potentially saving many people from death and suffering.
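
To make the recipe concrete, here is a minimal sketch (my own, not the authors' code) of the transfer-learning and t-SNE steps described above, using PyTorch and scikit-learn; the class count, image tensors and dataset are hypothetical placeholders.

```python
# Fine-tune an ImageNet-pretrained Inception v3 on lesion labels, then project
# the last hidden layer to 2D with t-SNE -- a sketch of the Esteva et al. recipe.
import torch
import torch.nn as nn
from torchvision import models
from sklearn.manifold import TSNE

NUM_CLASSES = 9  # e.g. the second-level nodes of the disease taxonomy

# 1. Load the ImageNet-pretrained backbone and swap in a new classifier head.
model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
model.aux_logits = False          # drop the auxiliary head for simplicity
model.AuxLogits = None
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One fine-tuning step on a batch of 299x299 lesion images."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def embed(images):
    """Return penultimate-layer activations for t-SNE visualization."""
    model.eval()
    head, model.fc = model.fc, nn.Identity()   # temporarily expose features
    feats = model(images)
    model.fc = head
    return feats

# 2. After training, cluster the learned representation in 2D:
# features = embed(validation_images)              # (N, 2048) tensor
# coords = TSNE(n_components=2).fit_transform(features.numpy())
```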

Toward robust mammography-based models for breast cancer risk (Yala et al, 2021)

This paper describes the Mirai model, which predicts the risk of breast cancer at multiple timepoints (1-5 years) using mammogram images (4 standard views) and, optionally, additional non-image risk factors such as age and hormonal factors. If the additional risk factors are not provided, Mirai predicts them from the aggregated vector representation of the mammograms. The risk factors (predicted or actual) are then combined with the mammogram vector to predict the risk of breast cancer. Mirai used data collected by Massachusetts General Hospital (MGH), representing approximately 26k exams, split 80/10/10 for training, validation and testing. The resulting model was tested against established risk models such as Tyrer-Cuzick v8 (TCv8) and other SOTA image-based neural models with and without additional risk factors. The latter models were also trained on the MGH data. Mirai was found to outperform them using the C-index (a measure of concordance between label and prediction) and AUC at 1-5 year intervals as evaluation metrics. The model was then evaluated against 19k and 13k exams from the Karolinska Institute (Sweden) and CGMH (Taiwan) respectively and had comparable performance on both. It was also tested on ethnic subgroups and was found to perform comparably well across all groups. It also outperformed the industry standard risk models at identifying high-risk cohorts. The paper concludes by saying that Mirai could be used to provide more sensitive screening and achieve earlier detection for patients who will develop breast cancer, while reducing unnecessary screening and over-treatment for the rest.
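
As a concrete illustration of the evaluation metrics mentioned above, here is a hedged sketch of computing a C-index and a fixed-horizon AUC for a risk model; the arrays are made-up examples, not Mirai outputs, and censoring is handled only crudely for brevity.

```python
# Concordance index (C-index) and 3-year AUC for a hypothetical risk model.
import numpy as np
from lifelines.utils import concordance_index
from sklearn.metrics import roc_auc_score

# Hypothetical test-set arrays.
followup_years = np.array([1.2, 4.5, 2.0, 5.0, 3.3])       # time to cancer or censoring
developed_cancer = np.array([1, 0, 1, 0, 1])                # 1 = cancer observed
predicted_risk = np.array([0.80, 0.10, 0.55, 0.20, 0.70])   # model's 5-year risk score

# C-index: higher predicted risk should correspond to earlier events,
# so the risk score is negated before being passed as a "survival" score.
cindex = concordance_index(followup_years, -predicted_risk, developed_cancer)

# AUC at a fixed horizon (3 years): did cancer occur within 3 years?
# (This ignores censoring before the horizon, which a real analysis would not.)
label_3yr = (developed_cancer == 1) & (followup_years <= 3.0)
auc_3yr = roc_auc_score(label_3yr, predicted_risk)

print(f"C-index={cindex:.3f}  3-year AUC={auc_3yr:.3f}")
```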

Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning (Tiu et al, 2022)

This paper describes the training of a multi-modal CLIP-style model, CheXzero, that learns an embedding from 377k chest X-rays and their corresponding raw radiology reports from the MIMIC-CXR dataset, which is then used to predict pathologies (indications of different diseases) of the lung for unseen chest X-rays. This is done by generating positive and negative prompts for each pathology of interest. The model uses the positive and negative scores to compute the probability of the presence of the pathology in the chest X-ray. The performance of CheXzero is comparable to that of a team of 3 board-certified radiologists across 10 different pathologies. CheXzero also outperforms previous label-efficient methods, all of which require a small fraction of the dataset to be manually labeled to enable pathology classification. CheXzero can also perform auxiliary tasks such as patient gender detection that it was not explicitly trained for. The trained CheXzero model (trained on MIMIC-CXR) also performed well on other chest X-ray datasets such as PadChest, showing that the self-supervised approach can generalize well.
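
The positive/negative prompting idea is easy to sketch with a generic CLIP checkpoint. This is my own illustration, not the released CheXzero model or its prompt templates; the image path and pathology list are placeholders.

```python
# Zero-shot pathology scoring: softmax over (positive, negative) prompt similarities.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

pathologies = ["atelectasis", "cardiomegaly", "pleural effusion"]

image = preprocess(Image.open("chest_xray.png")).unsqueeze(0).to(device)  # placeholder file

with torch.no_grad():
    img_emb = model.encode_image(image)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)

    for p in pathologies:
        prompts = clip.tokenize([f"{p}", f"no {p}"]).to(device)   # positive, negative
        txt_emb = model.encode_text(prompts)
        txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)

        # Cosine similarities, softmaxed over the two prompts, give P(pathology present).
        sims = (img_emb @ txt_emb.T).squeeze(0)
        prob_present = torch.softmax(sims, dim=0)[0].item()
        print(f"{p}: P(present) ~= {prob_present:.2f}")
```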

International Evaluation of an AI System for Breast Cancer Screening (McKinney et al, 2020)

The paper describes a Deep Learning pipeline that is fed mammograms taken from the 4 standard views and predicts whether the patient will develop breast cancer within 2-3 years. Two datasets were used, a larger one from the UK consisting of mammograms from 25k women used for training the model, and a smaller test set from the US covering 3k women. The authors claim that the system (for which no code or detailed technical information is shared) achieves better performance at breast cancer detection than a panel of 6 human radiologists. The model was found to generalize across datasets, since it was trained on UK data and evaluated on US data. When the system was used to pre-screen mammograms ahead of manual verification by a human radiologist (a simulated double-reading scenario), it reduced the second reader's workload by 88%. Thus such a system could be useful for providing automated immediate feedback for breast cancer screening, as well as serving as a first step in the double-reading scenario, as an assistive tool for human radiologists.
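
To illustrate where the workload-reduction number comes from, here is a toy simulation of the double-reading scenario as I understand it (my own sketch with made-up disagreement rates, not the paper's code): a human second reader is only needed for exams where the AI and the first human reader disagree.

```python
# Toy double-reading simulation: the AI stands in for the second reader,
# and a human second reader is only consulted on disagreements.
import numpy as np

rng = np.random.default_rng(0)
n_exams = 10_000

first_reader = rng.random(n_exams) < 0.05        # hypothetical recall decisions
ai_reader = first_reader.copy()
flip = rng.random(n_exams) < 0.02                # AI disagrees on ~2% of exams
ai_reader[flip] = ~ai_reader[flip]

needs_second_human = first_reader != ai_reader   # only disagreements are arbitrated
workload_reduction = 1.0 - needs_second_human.mean()
print(f"second-reader workload reduced by {workload_reduction:.0%}")
```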

The new era of quantitative cell imaging – challenges and opportunities (Bagheri et al, 2021)

The paper compares the evolving popularity of optical microscopy with the enormous success of genomics a few years earlier, and argues that quantitative optical microscopy has the potential to make similar contributions to the biomedical community. While the origins of optical microscopy are rooted in the 19th century, recent breakthroughs in this technology (notably high resolution and high throughput light microscopy, but others as well), along with advances in deep learning that allow images to be analyzed at much greater scale, indicate that there is significant convergence of approaches that positions optical microscopy as a viable candidate for biomedical data science. The idea is that rather than have optical microscopy contribute a small volume of highly curated images to a research project, it would be treated as a computational science in which a large quantity of standardized images is generated over time, which could then provide insights based on statistical analysis and machine learning. The article then goes on to describe the challenges that the field must overcome, namely standardization of techniques to enable reproducibility within and across different labs, and the storage of and FAIR (findable, accessible, interoperable and reusable) access to the potentially terabytes of image data generated. It also describes several initiatives that are happening within the biomedical community to address these challenges.

Data-analysis strategies for image-based cell profiling (Caicedo et al, 2017)

This paper highlights strategies and methods for high-throughput quantification of phenotypic differences in cell populations. It can be seen as an extension of the previous paper, which outlined the challenges and opportunities in this field. It proposes a workflow composed of the following steps – image analysis, image quality control, preprocessing extracted features, dimensionality reduction, single-cell data aggregation, measuring profile similarity, assay quality assessment and downstream analysis. Image Analysis transforms a population of digital cell images into a matrix of measurements, where each image corresponds to a row in the matrix. This stage often includes illumination correction, segmentation and feature extraction. The Quality Control step consists of computing metrics at both the field-of-view and cell levels to detect quality problems. The Preprocessing step consists of removing outlier features or cells, or imputing values for features based on the rest of the population. A notable operation in this stage is plate-level effect correction, which involves addressing edge effects and gradient artifacts across different plates of assays. We also do feature transformation and normalization in this step, such that the features have an approximately normal distribution. The next step is Dimensionality Reduction, where the aim is to retain or consolidate the features that provide the most value in answering the biological question being studied. The Single-Cell Data Aggregation step consists of using various statistical measures (mean, median, Kolmogorov-Smirnov (KS) statistic) on the feature distribution to create an "average" cell. Clustering or classification techniques are used to identify sub-populations of cells. The next step, Measuring Profile Similarity, measures and reveals similarities across the different profiles identified. At this point we are ready for the Assay Quality Assessment step, where we evaluate the quality of the morphological profiling done during the previous steps. The final step is Downstream Analysis, where the morphological patterns found are interpreted and validated. The paper is extraordinarily detailed and contains many techniques that are suitable not only for image-based cell profiling, but for feature engineering in general. Data used for illustrating the workflow comes from the BBBC021 (Broad Bioimage Benchmark Collection) image collection of 39.6k image files covering 113 small molecules, and the authors provide example code in the GitHub repo cytomining/cytominer.
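
A few of these workflow steps are easy to sketch in code. The following is a compressed illustration (assuming a hypothetical per-cell feature table; it is not the cytominer code) of per-plate normalization, aggregation of single cells into per-well profiles, and profile similarity:

```python
# Sketch of cell-profiling steps: per-plate normalization, median aggregation,
# and compound-to-compound profile similarity.
import numpy as np
import pandas as pd

# One row per segmented cell: metadata columns plus morphology features.
cells = pd.read_csv("per_cell_features.csv")      # placeholder path
meta_cols = ["plate", "well", "compound"]
feat_cols = [c for c in cells.columns if c not in meta_cols]

# 1. Normalize features plate by plate (z-score against the whole plate here;
#    the paper recommends normalizing against negative controls when available).
def zscore(group):
    return (group - group.mean()) / (group.std() + 1e-8)

cells[feat_cols] = cells.groupby("plate")[feat_cols].transform(zscore)

# 2. Aggregate single cells into one "average cell" profile per well (median).
profiles = cells.groupby(meta_cols)[feat_cols].median().reset_index()

# 3. Measure profile similarity, e.g. Pearson correlation between compounds.
compound_profiles = profiles.groupby("compound")[feat_cols].mean()
similarity = np.corrcoef(compound_profiles.values)
print(pd.DataFrame(similarity,
                   index=compound_profiles.index,
                   columns=compound_profiles.index).round(2))
```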

Module 5 Week 2

Chapter 10 of Artificial Intelligence in Medical Imaging (Imaging Biomarkers and Imaging Biobanks) (Alberich-Bayarri et al, 2019)

The chapter discusses challenges to the adoption of image analytics into clinical routine. Although efforts are under way to standardize the production of imaging biomarkers, they still have a long way to go. In addition, imaging biomarkers have to show efficacy as indicators of treatment response, which in turn should be confirmed via medical theory, through correlation with disease hallmarks. This allows imaging biomarkers to serve as surrogate indicators of relevant clinical outcomes. Finally, acquiring imaging biomarkers needs to be cost-efficient. The chapter covers the general methodology for development, validation and implementation of imaging biomarkers. In order to be effective, such data would then need to be stored in imaging biobanks, either population- or disease-focused, so that they can be effectively shared within the community and thus provide maximum value.

Deep Learning-based Computational Pathology Predicts Origins for Cancers of Unknown Primary (Lu et al, 2020)

This paper addresses the problem of predicting the primary site for Cancers of Unknown Primary (CUP), i.e. cancers whose site of origin cannot be determined easily. Addressing the cancer with generic therapies without determining the source results in low survival. It is possible to find the primary site using an extensive diagnostic work-up spanning pathology, radiology, endoscopy, genomics, etc, but such diagnostic procedures are not possible for patients in low-resource settings. The paper describes the Tumor Origin Assessment via Deep learning (TOAD) system, which predicts whether the cancer is primary or metastatic, and the primary site, based on whole-slide images (WSIs) of histopathology slides. TOAD was trained on 17.5k WSIs and achieved impressive top-3 and top-5 accuracy on the test set, and generalizes well with comparable results on WSIs from a different hospital. TOAD uses a CNN-based architecture which is trained jointly to predict both whether the cancer is primary or metastatic, and the primary site of the cancer (14 classes). For explainability, TOAD can generate attention heatmaps to indicate which parts of the slides are indicative of the predicted cancer. TOAD was also tested against WSIs for which the labels were not known initially but were found later, during autopsy. The high accuracy of the top-3 and top-5 predictions means that physicians can narrow the scope of their diagnostic tests and treatments, resulting in more efficient use of medical resources. This paper is also covered in the What-Why-How video for the week.
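
The following is a simplified sketch of the attention-based, multitask multiple-instance idea described above (my own illustration in PyTorch, not the released TOAD code; dimensions and class counts are illustrative):

```python
# Attention-pool a bag of patch embeddings from one WSI, then predict
# primary-vs-metastatic status and the origin site from the slide representation.
import torch
import torch.nn as nn

class MultiTaskAttentionMIL(nn.Module):
    def __init__(self, feat_dim=1024, hidden=256, n_sites=14):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.site_head = nn.Linear(feat_dim, n_sites)   # primary site (14 classes)
        self.met_head = nn.Linear(feat_dim, 2)          # primary vs metastatic

    def forward(self, patch_feats):                     # (num_patches, feat_dim)
        weights = torch.softmax(self.attn(patch_feats), dim=0)  # attention per patch
        slide_repr = (weights * patch_feats).sum(dim=0)         # weighted pooling
        return self.site_head(slide_repr), self.met_head(slide_repr), weights

model = MultiTaskAttentionMIL()
patches = torch.randn(5000, 1024)        # e.g. pretrained-CNN embeddings of WSI patches
site_logits, met_logits, attn = model(patches)
# 'attn' can be mapped back to patch coordinates to render the heatmaps mentioned
# above; top-3 / top-5 accuracy is computed from 'site_logits'.
```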

Chapter 13 from Artificial Intelligence in Medical Imaging (Cardiovascular Diseases) (Verjans et al, 2019)

This chapter covers the use and applicability of various medical imaging techniques to diagnose and treat cardiovascular diseases, across specialty areas such as Echocardiography, Computed Tomography (CT), Magnetic Resonance Imaging (MRI) and Nuclear Imaging (PET). It also discusses predictive applications that can combine information from multiple sources, including imaging. The impact of AI in cardiovascular imaging has so far been mainly in image interpretation and prognosis, but it has the potential to impact the entire imaging pipeline – choosing a test per the guidelines, patient scheduling, image acquisition, reconstruction, interpretation and prognosis. Deep Learning techniques have been applied in the MRI space to reconstruct accelerated MR acquisitions, an area previously dominated by compressed sensing, and research efforts show reconstruction of high-quality CT images from low-radiation noisy images. Deep Learning techniques have also been applied during image post-processing, such as automatically computing ejection fractions or cardiac volumes from CTs. In the near future, we can expect ML applications to generate diagnoses from images. In terms of prognosis, DL/ML approaches using medical imaging are expected to increase the quality of healthcare by detecting problems faster and more cheaply. There is also scope for combining insights from medical imaging with other sources of information, such as genetic or social factors, to make better medical decisions. The chapter continues with a discussion of specific practical uses of AI in different cardiovascular imaging scenarios in each of the specialty areas listed above. The chapter also discusses the Vendor Neutral AI Platform (VNAP) to help with rapid adoption of AI-based solutions in Medical Imaging.
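
As an example of the image post-processing step mentioned above: once a model has segmented the left ventricle at end-diastole and end-systole, the ejection fraction reduces to a simple formula (the volumes below are illustrative, not from the chapter):

```python
# Ejection fraction from segmented end-diastolic and end-systolic volumes.
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """EF (%) = (end-diastolic volume - end-systolic volume) / end-diastolic volume."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml

print(ejection_fraction(edv_ml=120.0, esv_ml=50.0))  # ~58%, a normal EF
```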

Artificial Intelligence in Digital Pathology – new tools for diagnosis and precision oncology (Bera et al, 2019)

The paper describes how the digitization of tissue slides into whole-slide images (WSIs) has led to the rise of AI/ML tools in digital pathology that can assist pathologists and oncologists in providing better and more timely treatment. Advances in Deep Learning and computational power over the last two decades have given rise to many different applications in these areas. For pathologists, the primary application is the identification of dominant morphological patterns that are indicative of certain diseases, and for oncologists, it is the identification of biomarkers that are indicative of a type of cancer and the stage it is in. These are both complex tasks with high variability, so it usually takes years of specialization to do them effectively. AI-based approaches are robust and reproducible, and achieve a similar level of accuracy as human experts. When used in tandem with a human expert, they can significantly cut down the expert's workload and make them more efficient, or serve as confirmation (like a second opinion). These AI applications have been used in diagnostic settings such as differentiating between WSIs of malignant vs benign breast cancer tissue, and prognostic settings such as detecting tumor-infiltrating lymphocytes, which are indicative of 13 different cancers, or predicting recurrence of lung cancer from the arrangement of cells in WSIs. They have also been used in drug discovery and development, by identifying patients who are more likely to respond to certain treatments using nuclear or peri-nuclear features extracted from their WSIs. DL architectures typically used in these applications include CNNs, FCNs (sparse features, e.g. detecting cancerous regions in histopathology images), RNNs (to predict risk of disease recurrence over time), and GANs (to segment out specific features from histopathology images, convert one form of tissue staining to another, etc.). Challenges to clinical adoption of these techniques include regulatory roadblocks, the quality and availability of training data, the interpretability of these AI models, and the need to validate these models sufficiently before use.

Data-efficient and weakly supervised computational pathology on whole-slide images (Lu et al, 2021)

The paper describes an attention-based approach called Clustering-constrained Attention Multiple-Instance Learning (CLAM), which is used to identify regions of interest (ROIs) in whole-slide images (WSIs). WSIs are plentiful but come with slide-level labels, which are not as effective for classification tasks as manually labeled ROIs. CLAM applies an attention mechanism across all patches of the slide and is very effective at finding ROIs, which can then be extracted and used for various tasks; this has proven to be more effective than treating all patches in the slide as having the same label. CLAM has been applied to the tasks of detecting renal cell carcinoma, non-small-cell lung cancer and lymph node metastasis, and has been shown to achieve high performance with a systematically decreasing number of training labels. CLAM can also produce interpretable heatmaps that allow the pathologist to visualize the regions of tissue that contributed to a positive prediction. CLAM can also be used to compute slide-level feature representations that are more predictive than raw pixel values. CLAM has been tested on independent test cohorts and found to generalize across dataset-specific variations, including smartphone microscopy images. Weakly supervised approaches such as CLAM are important because they leverage abundant weak slide-level labels to provide labeled ROIs of slide subregions, which in turn can produce more accurate predictive models for computational pathology.
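
As a small illustration of how the interpretable heatmaps described above can be produced, here is a sketch (my own, not the CLAM code) that paints per-patch attention scores onto a downsampled slide canvas; the coordinates, patch size and scores are placeholders from an assumed tiling step.

```python
# Turn per-patch attention scores into a slide-level heatmap.
import numpy as np

def attention_heatmap(coords, scores, slide_shape, patch_size=256, downsample=32):
    """Paint rank-normalized attention scores onto a downsampled canvas."""
    h, w = slide_shape[0] // downsample, slide_shape[1] // downsample
    canvas = np.zeros((h, w), dtype=np.float32)
    # Rank-normalize so the colormap is not dominated by a few extreme patches.
    ranks = scores.argsort().argsort() / (len(scores) - 1)
    ps = patch_size // downsample
    for (x, y), s in zip(coords // downsample, ranks):
        canvas[y:y + ps, x:x + ps] = np.maximum(canvas[y:y + ps, x:x + ps], s)
    return canvas  # overlay on a slide thumbnail for visualization

coords = np.array([[0, 0], [256, 0], [512, 256]])   # top-left corners of patches
scores = np.array([0.1, 0.9, 0.5])                  # attention weights from the MIL model
heatmap = attention_heatmap(coords, scores, slide_shape=(1024, 1024))
```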

That's all I have for today. I hope you found this useful. In my next post, I will cover the paper readings for Module 6 (Therapeutic Science).

Friday, June 09, 2023

Future of Data Centric AI -- Trip Report

I attended the Future of Data-Centric AI 2023 this week, a free virtual conference organized by Snorkel AI. Snorkel.AI is a company built around the open-source Snorkel framework for programmatic data labeling. The project originally started at Stanford University's Hazy Research group, and many (all?) of the company's founders and some engineers are from the original research team. Snorkel.AI has been building and improving their flagship product, Snorkel Flow, an integrated tool for iterative data labeling and model building, so there were some presentations centered around that. In addition, it's 2023, the year of generative LLMs (or GoLLuMs or Foundation Models), so Snorkel's ability to interface with these Foundation Models (FMs) also featured prominently. Maybe it's a Stanford thing, but presenters seemed to prefer calling them FMs, so I will do the same, if only to distinguish them from BERT / BART style large language models (LLMs).

If you are unfamiliar with what Snorkel does, I recommend checking out Snorkel and the Dawn of Weakly Supervised Machine Learning (Ratner et al, 2017) for a high-level understanding. For those familiar with the original open-source Snorkel (and Snorkel METAL), Snorkel Flow is primarily a no-code web-based tool that supports the complete life-cycle of programmatic data labeling and model development. Because it is no-code, it is usable by domain experts who don't necessarily know how to program. While the suite of built-in no-code Labeling Function (LF) templates is quite extensive, it supports adding programmatic LFs as well if you need them. In addition, it provides conveniences such as cold-start LF recommendations, error analysis, and recipes for addressing various classes of error, supporting an iterative approach to model development that feels almost like a programmer's edit-compile-run cycle. Over the last few months, they have added LLMs as another source of weak supervision and as a possible source of LFs as well.
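
For readers who have not used the open-source library that Snorkel Flow builds on, here is a tiny, self-contained example of the basic workflow: write a couple of labeling functions, apply them to unlabeled text, and let the LabelModel combine their noisy votes. The data and rules are made up for illustration.

```python
# Minimal open-source Snorkel example: two labeling functions + a LabelModel.
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, NOT_SPAM, SPAM = -1, 0, 1

@labeling_function()
def lf_contains_free(x):
    # Heuristic: messages mentioning "free" tend to be spam.
    return SPAM if "free" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_short_message(x):
    # Heuristic: very short messages tend to be legitimate.
    return NOT_SPAM if len(x.text.split()) < 5 else ABSTAIN

df = pd.DataFrame({"text": ["Free money now!!!", "see you at lunch",
                            "free gift card winner", "meeting moved to 3pm"]})

applier = PandasLFApplier(lfs=[lf_contains_free, lf_short_message])
L_train = applier.apply(df)                      # (num_examples, num_LFs) label matrix

label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train, n_epochs=100, seed=0)   # learn to weigh the noisy LF votes
df["prob_spam"] = label_model.predict_proba(L_train)[:, SPAM]
print(df)
```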

The last bit is important, because I think it points to the pragmatism of the Snorkel team. The FM application ecosystem currently seems filled with pipelines that feature the FM front and center, i.e. they use the FM for everything it can possibly do. Given the high infrastructure costs of running FMs and their high latencies, these pipelines don't seem very practical. Most of us were taught to cache (or pre-cache) as much as possible, so the customer does not pay the price during serving, or they will soon cease to be customers. Matthew Honnibal, creator of spaCy, makes a similar, though probably better argued, point in his Against LLM Maximalism blog post, where he advocates for smaller, more reliable models for most tasks in the pipeline, and reserving the FM for tasks that truly need its capabilities. Snorkel Flow goes one step further by taking FMs out of the pipeline altogether -- instead using them to help generate good labels, thus benefiting from the FM's world knowledge while still retaining the flexibility, reliability and explainability of the generated models.

However, Snorkel.AI is addressing the needs of the FM market as well, through their soon-to-be-announced new tools -- Foundry and GenFlow -- which Alex Ratner (CEO and co-founder of Snorkel.AI) mentioned in his keynote addresses. They classify the usage of FMs into four stages -- pre-training (either from scratch or from trained weights, where it becomes more of a domain adaptation exercise), instruction tuning for behavior, fine-tuning for a particular task, and distillation of the model into a smaller, more easily deployable model. As the DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining (Xie et al, 2023) paper shows, the mix of data used to train or adapt an FM can have a significant impact on its quality, and Foundry and GenFlow are aimed at improving data and prompt quality for the first and second stages respectively, by ensuring optimal sampling, filtering and ranking.
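
To make the data-mixture idea concrete, here is a hedged sketch of sampling pre-training examples from several domains according to mixture weights rather than uniformly; DoReMi learns such weights with a small proxy model, whereas here they are simply hard-coded placeholders.

```python
# Weighted mixture sampling over hypothetical domain buckets.
import random

rng = random.Random(0)

domains = {
    "web":     ["web doc 1", "web doc 2", "web doc 3"],
    "code":    ["code doc 1", "code doc 2"],
    "medical": ["medical doc 1", "medical doc 2"],
}
mixture_weights = {"web": 0.5, "code": 0.3, "medical": 0.2}   # placeholder weights

def sample_batch(batch_size):
    """Sample a training batch by first picking a domain, then a document from it."""
    names = list(domains)
    weights = [mixture_weights[n] for n in names]
    picked = rng.choices(names, weights=weights, k=batch_size)
    return [rng.choice(domains[d]) for d in picked]

print(sample_batch(8))
```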

Over the course of the conference, presenters repeatedly talked about the importance of having high quality data to train models. Not surprising, since the conference has "Data-Centric AI" in its name, a term coined by Andrew Ng, who was among the first to emphasize this idea. However, the Snorkel team have really taken this idea to heart, and along with their customers, have developed some really cool applications, some of which they showcased in this conference. Apart from the keynotes and some panel discussions, presentations ran in two parallel tracks; I chose the ones that emphasized practice over theory and skipped a few, so the list below may be slightly biased. Videos of the talks will become available on the Snorkel YouTube channel in about a month; I will update the links once that happens (if I remember).

  • Bridging the Last Mile: Applying Foundation Models with Data-Centric AI (Alex Ratner) -- the basic idea is that FMs are analogous to generalists that (think they) know lots of things, but need to be trained to do well on specific tasks. Alex envisions data scientists of the future who are less machine learning experts and more domain and product experts. Alex's talks contain many interesting observations, too numerous to list here, and strike just the right mixture of academic and practical for lay people such as myself.
  • Fireside Chat: building Bloomberg GPT (Gideon Mann and Alex Ratner) -- interesting insights into the rationale for Bloomberg GPT and the work that went into building it.
  • Fireside Chat: Stable Diffusion and Generative AI (Emad Mostaque and Alex Ratner) -- lots of cool technical insights about FMs from Emad Mostaque, CEO of Stability.AI (Stable Diffusion).
  • A Practical Guide to Data-Centric AI -- A Conversational AI Use Case (Daniel Lieb and Samira Shaikh) -- practical tips for building an intent classifier for conversational chatbots. The similarity function for clustering conversations was adapted from the paper Modeling Semantic Containment and Exclusion in Natural Language Inference (MacCartney and Manning, 2008).
  • The Future is Neurosymbolic (Yoav Shoham) -- a somewhat philosophical discussion of why FMs can never do the kinds of things humans can do, and why, from the co-founder of AI21 Labs.
  • Generating Synthetic Tabular Data that is Differentially Private (Lipika Ramaswamy) -- a somewhat technical discussion arguing for differential privacy to generate synthetic datasets that could be used to train FMs and thereby address the problem of them memorizing sensitive training data.
  • DataComp: Significance of Data for Multimodal AI (Ludwig Schmidt) -- discusses DATACOMP, a benchmark which aims to improve an image-text dataset used to train multi-modal models such as CLIP, by keeping the model fixed and improving the dataset. By applying a simple quality filter on the original dataset, they were able to train a model that was smaller in size, took 7x less time to train, and outperformed a larger model (see the sketch after this list). More details in the DATACOMP: In search of the next generation of multimodal datasets (Gadre et al, 2023) paper.
  • New Introductions from Snorkel AI (Alex Ratner) -- second day keynote where Alex formally announced Snorkel Foundry and GenFlow, among other things, some of which were repeats from the previous day's keynote.
  • Transforming the Customer Experience with AI: Wayfair's Data Centric Way (Archana Sapkota and Vinny DeGenova) -- this was a really cool presentation, showing how they labeled their product images programmatically with Snorkel for design, pattern, shape and theme, and used that to fine-tune a CLIP model, which they now use in their search pipeline. More info about this work in this blog post.
  • Tackling advanced classification with Snorkel Flow (Angela Fox and Vincent Chen) -- the two big use cases where people leverage Snorkel are document classification and sequence labeling. Here they discuss several strategies for multi-label and single-label document classification.
  • Accelerating information extraction with data-centric iteration (John Smardijan and Vincent Chen) -- this presentation has a demo of Snorkel Flow being used to label documents with keywords for a specific use case (for which off-the-shelf NERs do not exist). The demo shows how one can rapidly reach a good score (precision and coverage) by iterating: create and apply an LF, train and evaluate a model on the labels it produces, do error analysis, address the issues found with another LF, and so on until the desired metrics are reached. They called this the Data-Model flywheel.
  • Applying Weak Supervision and Foundation Models for Computer Vision (Ravi Teja Mullapudi) -- talked about using Snorkel for image classification, including a really cool demo of Snorkel Periscope (an internal Labs tool) applied to satellite data to build classifiers that look for images of a particular type, using UMAP visualizations and cosine similarity distributions.
  • Leveraging Data-Centric AI for Document Intelligence and PDF Extraction (Ashwini Ramamoorthy) -- a talk about information extraction from PDF documents, similar to the one listed earlier, but as with that one, Ashwini shares a huge amount of practical information that I found very useful.
  • Leveraging Foundation Models and LLMs for Enterprise Grade NLP (Kristina Lipchin) -- a slightly high-level but very interesting take on FMs from a product manager's viewpoint; it echoes many of the same ideas about last-mile handling covered in earlier talks, but identifies Domain Adaptation and Distillation as the primary use cases for most organizations.
  • Lessons from a year with Snorkel Data-Centric with SMEs and Georgetown (James Dunham) -- this is a hugely informative talk about Georgetown University's experience with using Snorkel Flow for a year. Not only did their domain experts adapt to it readily and love the experience, both data scientists and domain experts benefited from it. Some major benefits noted are the ability to ramp up labeling efforts faster and with less risk, since it is easier to iterate on labels (adding/removing/merging classes, etc) as your understanding of the data grows, the ability to fail fast and without too much sunk cost, and overall lowering of project risk. If you are contemplating purchasing a Snorkel Flow subscription, this talk provides lots of useful information.
  • Fireside chat: Building RedPajama (Ce Zhang and Braden Hancock) -- RedPajama is an open source initiative to produce a clean-room reimplementation of the popular LLaMA FM from Meta. The focus is on carefully replicating the LLaMA dataset recipe, but using open source documents, and training base and instruction-tuned versions of the model on this data so that commercial adoption is not blocked. Ce is the head of Together Computer, the company behind RedPajama, and Braden and Ce discuss the work that has been done so far in this project.
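
As a rough sketch of the kind of quality filter the DataComp talk describes (see that bullet above), the following keeps only image-text pairs whose CLIP similarity clears a threshold; the checkpoint, file names and cutoff are placeholders of my own, not the DATACOMP filtering code.

```python
# CLIP-score filtering of image-text pairs: keep pairs whose similarity clears a cutoff.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

pairs = [("img_001.jpg", "a red armchair with wooden legs"),
         ("img_002.jpg", "asdf1234 buy now click here")]
THRESHOLD = 0.28   # hypothetical cosine-similarity cutoff

kept = []
with torch.no_grad():
    for path, caption in pairs:
        img = preprocess(Image.open(path)).unsqueeze(0).to(device)
        txt = clip.tokenize([caption]).to(device)
        img_emb = model.encode_image(img)
        txt_emb = model.encode_text(txt)
        sim = torch.cosine_similarity(img_emb, txt_emb).item()
        if sim >= THRESHOLD:
            kept.append((path, caption))

print(f"kept {len(kept)} of {len(pairs)} pairs")
```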

In many cases, it is not the lack of data, but the lack of labeled data that is the major hurdle to Machine Learning adoption within a company. Snorkel's support for weak supervision provides a practical path to generating labels programmatically. As someone who came to Machine Learning from Search, where featurization is basically TF-IDF (and more recently, a trained tokenizer feeding a neural model), I was initially not particularly skilled at spotting features in data. However, over time, as I started looking at data, initially for error analysis and later for feature extraction in cases where labels were not available a priori, the process has become easier, so hopefully my next experience with Snorkel will be smoother. Furthermore, Snorkel's focus on FMs also provides a path to harness this powerful new resource as an additional source of weak supervision.