Tuesday, May 07, 2024

KGC/HCLS 2024 Trip Report

I was at KGC (Knowledge Graph Conference) 2024, which is happening May 6-10 at Cornell Tech. I was presenting (virtually) at their Health Care and Life Sciences (HCLS) workshop, so my speaker's pass was only valid for today, for the HCLS portion of KGC. My trip report covers a few of the talks I attended. Attending virtually was a bit chaotic, as sessions sometimes ran over, so you might leave one session to attend another, only to find that it hadn't started yet. This is hard to foresee; we faced the same issue ourselves the first time we moved an internal conference from in-person to hybrid.

KGs in RAG (Tom Smoker, WhatWhyHow.AI)

I have been working with Large Language Models (LLMs) and Retrieval Augmented Generation (RAG) for almost a year now, and I went to this talk hoping for insights on how to use graphs as input to RAG systems. Somewhat predictably, the speaker spent some time covering the basics, which I personally did not find very fruitful. However, there were some nuggets of wisdom I got out of the talk. First, RAG pipelines can lower the risk of hallucinations by using LLMs for planning and reasoning, while not delegating to them for factual information. Second, an agent architecture can make more efficient use of smaller sub-graphs, which can often be generated dynamically in Closed World models.
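To make the first point concrete, here is a minimal Python sketch of what such a pipeline could look like: the LLM is used only to plan which part of a small, dynamically generated sub-graph to query and to phrase the final answer, while the facts themselves come from the graph. The `llm_complete` stub, the toy sub-graph, and the planning protocol are all my own illustrations, not the architecture the speaker presented.

```python
# Minimal sketch: LLM for planning and phrasing, KG for facts.
# llm_complete is a hypothetical stand-in for whatever LLM client you use.

def llm_complete(prompt: str) -> str:
    """Hypothetical LLM call; replace with your provider's client."""
    raise NotImplementedError

# A tiny closed-world sub-graph; in a real system this would be
# generated dynamically for the question at hand.
SUBGRAPH = {
    ("aspirin", "treats"): ["headache", "fever"],
    ("aspirin", "interacts_with"): ["warfarin"],
}

def answer(question: str) -> str:
    # 1. LLM is used only for planning: pick the entity and relation to look up.
    plan = llm_complete(
        f"Question: {question}\n"
        "Reply with '<entity>|<relation>', choosing relation from: treats, interacts_with."
    )
    entity, relation = [p.strip() for p in plan.split("|")]
    # 2. Facts come from the graph, not from the LLM's parametric memory.
    facts = SUBGRAPH.get((entity, relation), [])
    # 3. LLM is used again only to phrase the grounded answer.
    return llm_complete(
        f"Question: {question}\nFacts from the knowledge graph: {facts}\n"
        "Answer using only these facts."
    )
```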

A side discussion in the chat also yielded a paper reference: Getting from Generative AI to Trustworthy AI: what LLMs may learn from Cyc (Lenat and Marcus, 2023). The paper looks really interesting on an initial skim, and I plan to read it in more detail later.

Knowledge Graphs for Precision Oncology (Krishna Bulusu, AstraZeneca)

A nice overview of applications of Knowledge Graphs (KGs) to Drug Discovery (DD). DD attempts to apply KGs to solve three main problems: (1) finding the gene causing a disease, (2) matching drugs with diseases, and (3) modeling (drug, gene, disease) as a fundamental relationship in DD. The speaker pointed out that the big advantage of KGs is explainability. He also mentioned the use of graph clustering for node stratification.

Combining graph and vector representation for efficient information retrieval (Peio Popov, Ontotext)

This was a presentation from Ontotext, where they demonstrated new features built into their GraphDB database. This was of interest to me personally, since our KG is also built using GraphDB. Specifically, they have integrated LLM and vector search support into their products, so these can be invoked from a SPARQL query. This gives GraphDB users the power to combine these techniques in a single call rather than build multi-stage pipelines.
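As an illustration of why this is convenient, here is a sketch of how such a combined call might be issued from Python using SPARQLWrapper. The endpoint URL, index namespace, and similarity predicates below are placeholders of my own invention; the actual GraphDB syntax for its vector search integration will differ, so treat this only as the shape of the idea.

```python
# Illustrative only: the "sim:" predicates and endpoint are assumptions,
# not Ontotext's exact GraphDB syntax -- see the GraphDB docs for the real API.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://localhost:7200/repositories/my-kg"  # hypothetical repository

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sim:  <http://example.org/similarity#>   # placeholder namespace
SELECT ?drug ?label ?score WHERE {
  # vector search and graph patterns combined in a single SPARQL call
  ?hit sim:searchTerm "beta blockers for hypertension" ;
       sim:entity ?drug ;
       sim:score ?score .
  ?drug rdfs:label ?label ;
        a <http://example.org/Drug> .
}
ORDER BY DESC(?score) LIMIT 10
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["label"]["value"], row["score"]["value"])
```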

I also learned the distinction between semantic, full-text, and vector search, as being based on KGs, Lucene (or Lucene-like) indexes, and vector search platforms respectively; I had previously conflated the first and the third.

Knowledge Engineering in Clinical Decision Support: When a Graph Representational Model is not enough (Maulik Kamdar, Optum)

This was a presentation from my ex-colleague Maulik Kamdar. He talks about challenges in Clinical Decision Support (CDS) where a KG alone is insufficient, specifically the case where multiple third-party ontologies need to be aligned into a single KG. In this situation, similar concepts are combined into ValueSets, which are then composed with bare concepts or with each other to form Clinical Rules. Clinical Rules are further combined to form Clinical Calculators or Questionnaires, which are then combined to form Decision Trees and Flowcharts, which in turn are combined into Clinical Guidelines. I am probably biased given our common history, but I found this talk to be the most educational for me.
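To keep the layers straight in my own head, here is a rough sketch of that composition hierarchy as Python dataclasses. All class and field names are my own illustration of what I understood from the talk, not Optum's actual data model.

```python
# Rough sketch of the composition hierarchy described in the talk;
# names and fields are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Concept:
    code: str          # e.g. a code from one of the aligned third-party ontologies
    ontology: str

@dataclass
class ValueSet:        # similar concepts grouped across aligned ontologies
    name: str
    concepts: list[Concept] = field(default_factory=list)

@dataclass
class ClinicalRule:    # composed from value sets and/or bare concepts
    name: str
    value_sets: list[ValueSet] = field(default_factory=list)
    concepts: list[Concept] = field(default_factory=list)

@dataclass
class ClinicalCalculator:  # rules combined into calculators / questionnaires
    name: str
    rules: list[ClinicalRule] = field(default_factory=list)

@dataclass
class DecisionTree:        # calculators combined into decision trees / flowcharts
    name: str
    calculators: list[ClinicalCalculator] = field(default_factory=list)

@dataclass
class ClinicalGuideline:   # the top of the stack
    name: str
    trees: list[DecisionTree] = field(default_factory=list)
```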

Knowledge Graphs, Theorem Provers and Language Models (Vijay Saraswat and Nikolaos Vasiloglou)

The speakers discussed the role of self-discovery, In-Context Learning (ICL), symbiotic integration of KGs with search, and Graph RAG in reasoning engines powered by KGs and LLMs. They characterize an Agent as an LLM-based black box that is provided with pairs of input-output instances to learn some unknown function (similar to ML models). They describe ICL as learning through few-shot and many-shot examples. They also talk about using the output of a KG to fact-check and enhance LLMs, and using LLMs to generate assertions that can be used to create a KG. Their demo shows how an LLM can learn to generate a Datalog-like graph query language from text prompts using few-shot examples.
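Here is a small illustration of what such a few-shot (ICL) prompt could look like. The example questions, the Datalog-ish syntax, and the `llm_complete` callable are my own stand-ins, not the actual prompt used in the demo.

```python
# Illustrative few-shot (ICL) prompt for text-to-graph-query generation.
FEW_SHOT_PROMPT = """Translate the question into a Datalog-like graph query.

Q: Which drugs treat hypertension?
A: answer(Drug) :- treats(Drug, "hypertension").

Q: Which genes are associated with melanoma?
A: answer(Gene) :- associated_with(Gene, "melanoma").

Q: {question}
A:"""

def to_graph_query(question: str, llm_complete) -> str:
    """llm_complete is a hypothetical callable wrapping your LLM of choice."""
    return llm_complete(FEW_SHOT_PROMPT.format(question=question)).strip()
```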

The speakers referenced three papers in support of the techniques they described, which I have duly added to my reading list.

A Scalable and Robust Named Entity Recognition and Linking System for a Clinical Healthcare Knowledge Graph (Sujit Pal, Elsevier Health)

This was my talk. I had originally intended to attend in person but it seemed wasteful to fly across the country to deliver a 5-minute presentation. It did take a bit of planning to present remotely but I learned two useful life lessons.

  1. You can generate a presentation video from MS PowerPoint. Simply create your slides, record a slideshow while narrating your presentation, then export it as an MP4 and upload it to YouTube or another video service.
  2. You can print posters online and have them delivered to someone else.

Huge thanks to my colleague Tom Woodcock, who attended in person and was kind enough to carry and hang my poster at the conference for me, and who also agreed to present my slideshow for me (although I think that in the end he did not have to). Many thanks also to my ex-colleague Helena Deus (part of the HCLS organizing team), who walked me through to a workable solution and was instrumental in my talk being delivered successfully. Thanks also to Leah Walton from the HCLS organizing team, for supporting me in my attempt to present remotely.

Here is the YouTube video for my 5-minute presentation, in case you are interested. It's a bit high-level since I had only 5 minutes to cover everything, but there is a little more information in the poster below.

Graphs for good – Hypothesis generation for Rare Disease Treatment (Brian Martin, AbbVie)

This presentation revolved around a graph that connects diseases to drugs via disease variant, gene, pathway, and compound entities. The graph was used to find a cure for a rare disease using existing medications, and was later extended to find candidate cures for a group of the 20 most neglected diseases worldwide. The speaker verified that the results for Dengue fever correlate well with previously known information, thus supporting the veracity of the approach. The paper describing this work is Leveraging a Billion-Edge Knowledge Graph for Drug Re-purposing and Target Prioritization using Genomically-Informed Subgraphs (Martin et al, 2022).

Generating and Querying Graphs with LLM (Brian Martin, Subha Madhavan, Berenice Wulbrecht)

This was a panel discussion covering various strategies for generating and querying graphs using LLMs. There were entertaining (and somewhat predictable) comparisons of Property Graphs and RDF graphs to Ford and Ferrari automobiles, with LLMs transforming them into Teslas (with their self-driving technology). The panelists also talked about extracting assertions from a corpus of documents to create a KG customized for that corpus, and then using the KG to fact-check the output of the LLM for RAG queries against the corpus.

Overall, I think it was a great conference. I learned a lot and would love to go back and present here in the future, hopefully in person next time.