Saturday, April 29, 2023

Haystack US 2023: Trip Report

I attended the Haystack US 2023 Search Relevance conference last week. It was a great opportunity to share ideas and techniques around search and search relevance, to catch up with old friends and acquaintances, and to make new ones. I was there only for the two days of the actual conference, but there were events before and after as well. The full talk schedule can be found here. The conference ran in two tracks and took place at the Violet Crown movie theater in Charlottesville, VA. The mall it is in also has a bunch of nice eateries, so if you are a foodie like me, this may be a chance to expand your gastronomic domain as well. This was the US edition; for the last couple of years there have been two Haystack search relevance conferences per year, one in the US and one in Europe. In this post, I will describe very briefly the talks I attended, with links to the actual abstracts on the Haystack site. The Haystack team is working on releasing the slides and videos; you can find more information on the Relevancy Slack channel.

Day 1

Opening Keynote

The keynote, titled Relevance in an age of Generative Search, was delivered by Trey Grainger. Trey is the main author of AI Powered Search, along with co-authors Doug Turnbull and Max Irwin, a book that has become popular in the search community as the discipline moves to embrace vector search to provide more relevant results for search and recommendation. He talked about changes in the search industry in the context of his book, then discussed ChatGPT and some popular applications of generative AI, such as search summaries and document exploration.

Metarank

Learning to hybrid search: combining BM25, neural embeddings and customer behavior into an ultimate ranking ensemble was presented by Roman Grebennikov, the author of Metarank. He makes the point that lexical (BM25) search is good at some things and neural search is good at others, so combining the two (or more) searches into an ensemble can address the weaknesses of both systems and improve results. Metarank was used to evaluate this idea with various ensembles of techniques.
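One simple way to combine a lexical and a neural ranking into an ensemble (purely illustrative here, not necessarily what Metarank does internally) is reciprocal rank fusion, sketched below:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists of doc ids.

    Each ranking is a list of doc ids ordered best-first; the constant k
    dampens the contribution of lower-ranked documents.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # sort by fused score, best first
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d1", "d2", "d3"]     # lexical ranking
vector_hits = ["d3", "d1", "d4"]   # neural ranking
fused = rrf_fuse([bm25_hits, vector_hits])
# d1 appears high in both lists, so it wins the fused ranking
```

Documents that score well in either ranking survive, which is exactly the weakness-covering behavior the talk argues for.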

Querysets and Offline Evaluation

The Creating Representative Query Sets for Offline Evaluation talk by Karel Bergman deals with the question of how many queries to sample when evaluating an application offline so as to achieve a required confidence level. This step is important because it lets us predict the minimum query set size needed to be confident in our results.
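As a rough illustration of the kind of sizing involved (using the standard normal-approximation formula for estimating a proportion, which may differ from the talk's exact methodology):

```python
import math

def sample_size(z=1.96, margin=0.05, p=0.5):
    """Minimum number of queries needed to estimate a proportion-style
    metric (e.g. fraction of queries judged relevant) to within +/- margin
    at the confidence level implied by z (1.96 ~ 95%). p=0.5 is the most
    conservative assumption about the true proportion."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

n = sample_size()  # 95% confidence, 5% margin of error
```

With the default settings this gives the familiar "about 385 queries" figure; tightening the margin to 1% pushes the required sample close to 10,000.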

Relevant Search at Scale

This talk on Breaking Search Performance Limits with Domain-Specific Computing was delivered by Ohad Levy of Hyperspace, which manufactures an FPGA device that provides functionality similar to a (vector-enabled) ElasticSearch instance. He makes the point that in the tradeoff between performance, cost and relevance, one can usually have only two out of the three, and that lower latency implies better customer engagement and hence increased revenue. Their search solution offers an ElasticSearch-like JSON API as well as a more Pythonic object-oriented API through which users interact with the device.

EBSCO Case Study

The EBSCO case study Vector Search for Clinical Decisions, presented by Erica Lesyshyn and Max Irwin, has a lot of parallels with the search engine platform I work with (ClinicalKey). Like us, they are backed by an ontology that was initially developed using the Unified Medical Language System (UMLS), with additional structures built around it using other ontologies and internal domain knowledge. They also have a similar concept search platform on top of which they run various products. They partition their queries into 3 intents: simple, specific and complex. Simple queries contain 1 or 2 concepts and correspond to their head; specific queries are simple but qualified, so they can be handled with BM25-based tricks; complex queries are the longer ones. Their presentation described how they fixed poor search performance on their tail queries using vector search, encoding queries and documents with an off-the-shelf Large Language Model (LLM) and doing Approximate Nearest Neighbor (ANN) search using QDrant, a Rust-based vector search engine. To serve the model, Max built Mighty, a Rust-based inference server that packages their embedding model into ONNX and serves it over HTTP. Because Mighty compiles the service down to executable code, there are no (Python / Rust) dependencies, making it very fast and easy to deploy.
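The core retrieval step is just nearest-neighbor search over embeddings. A toy exact-search sketch is below; a production system like the one described would use an approximate index (e.g. HNSW inside QDrant) rather than a brute-force scan, and real embeddings from an LLM encoder rather than these hand-made vectors:

```python
import numpy as np

def cosine_top_k(query_vec, doc_matrix, k=3):
    """Exact nearest-neighbor search by cosine similarity over a
    matrix of document embeddings (one row per document)."""
    q = query_vec / np.linalg.norm(query_vec)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    sims = docs @ q                     # cosine similarity per document
    return np.argsort(-sims)[:k]        # indices of the best matches

# toy 4-dim "embeddings" standing in for LLM encoder output
docs = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.9, 0.1, 0.0, 0.0]])
top = cosine_top_k(np.array([1.0, 0.0, 0.0, 0.0]), docs, k=2)
```

The appeal for tail queries is that semantically close documents match even when they share no keywords with the query.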

Lightning Talks

There was a series of shorter talks in the Lightning Talks section. I took notes throughout the conference, including during these talks, but since they were short, it was hard to take adequate notes, so some of what follows is from memory. If you wish to correct them (or indeed, any part of my trip report) please drop me a comment.

Filtered Vector Search – vector search scores can be difficult to threshold, so the suggestion here is to use common-sense facets to build the appropriate thresholds. Another suggestion is to cache vector output for common / repeated queries so the model gets invoked only for new queries.
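The caching suggestion is easy to sketch; `embed_query` below is a hypothetical stand-in for the real embedding-model call:

```python
from functools import lru_cache

calls = 0  # counts actual "model" invocations

@lru_cache(maxsize=10_000)
def embed_query(query: str):
    """Stand-in for an embedding-model call; lru_cache ensures the model
    is only invoked once per distinct query string."""
    global calls
    calls += 1
    return tuple(float(ord(c)) for c in query)  # dummy vector

embed_query("heart attack")
embed_query("heart attack")  # served from cache; model not re-invoked
```

In practice you would normalize the query (case, whitespace) before caching so trivially different variants share an entry.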

Using search relevance with Observability – advocates for dashboards that extract aggregate metrics from queries to help with decision making around search relevance.

Doug Turnbull came up with the idea for a website nextsearchjob.com to help connect search / search-ML engineers with employers based on the jobs channel on Haystack Slack. I can see it becoming a good niche job recommendation system similar to how Andrej Karpathy's tool arxiv-sanity is for searching the Arxiv website.

Peter Dixon-Moses started the Flying Blind initiative around a shared Google spreadsheet that collects information from the community about good impact metrics, embarrassing moments that could be addressed systemically, etc.

The next lightning talk was a plug for JesterJ, a document ingestion tool, by its author Gus Heck. Gus points out that the advertised interfaces for document ingestion are usually for toy setups, and JesterJ provides a robust alternative for production-scale indexing.

Aruna Lakshmanan gave an awesome lightning talk with tons of in-depth advice around search signals. I thought it would have been even better as a full-size talk or workshop. Here is a list of the user signals she spoke about.

  • classify the query term (brand/category/keyword, search vs landing, top product/category, keywords)
  • facets (click order, facets missed)
  • search vs features (don't load features up front) -- what are the top features that are being clicked?
  • click metrics -- not clicked results?
  • zero results and recommendations (should be based on user signals)
  • time per session (longer)
  • drop rate
  • personalization, preference and trending

Explainable recommendation systems with vector search, by Uri Goren, suggests creating mini-embeddings of fixed length for each feature, concatenating them into an input matrix, densifying the matrix by some means (auto-encoder, matrix factorization), and then breaking it apart again into individual features. These features are now explainable, since we know what each one represents. These ideas have been implemented in Uri's recsplain system.
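A rough sketch of the concatenate / densify / split-apart pipeline is below. The feature names are made up, and truncated SVD stands in for whatever densification recsplain actually uses (auto-encoder or matrix factorization):

```python
import numpy as np

rng = np.random.default_rng(0)

# fixed-length mini-embeddings per feature, one row per item
feature_dims = {"brand": 4, "category": 4, "price_bucket": 4}
n_items = 50
blocks = {name: rng.normal(size=(n_items, d)) for name, d in feature_dims.items()}

# concatenate the per-feature blocks into one input matrix
X = np.concatenate(list(blocks.values()), axis=1)  # shape (50, 12)

# densify via truncated SVD (stand-in for the auto-encoder /
# matrix-factorization step), then reconstruct at reduced rank
U, s, Vt = np.linalg.svd(X, full_matrices=False)
rank = 6
X_dense = (U[:, :rank] * s[:rank]) @ Vt[:rank]

# break the densified matrix back into named feature blocks,
# so each slice remains attributable to a known feature
out, start = {}, 0
for name, d in feature_dims.items():
    out[name] = X_dense[:, start:start + d]
    start += d
```

Because the column layout is preserved through densification, each output slice still corresponds to a named feature, which is what makes the result explainable.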

Lucene 9 vector implementation, by the folks at KMW Technology – Lucene and Solr 9.x support ANN search for vectors, but the index needs to be in a single segment and is loaded into memory in its entirety, making it not very useful for large vector indexes. Large indexes can be supported but at higher cost.

Eric Pugh floated the idea of a rating party to build an e-commerce dataset of query-document pairs using the Quepid tool for search relevance tuning.

Day 2

AI Powered Search Panel

This was a panel discussion / AMA with the authors of AI Powered Search – Trey Grainger, Doug Turnbull and Max Irwin – who answered questions from the audience about the future of search, hybrid search, generative models, hype cycles, etc.

Citation Network

The Exploiting Citation Networks in Large Corpora to improve relevance on Broad Queries talk by Marc-Andre Morissette describes a technique to create synonyms using citation networks. Specifically, keywords in citing documents are treated as synonyms (or child / meronym terms) of the title of the cited document. This is useful in legal settings, where keywords in case law are often used colloquially to refer to specific legislation. The talk also outlines various statistical measures that tune the importance of such keywords.

Question Answering using Question Generation

I didn't technically attend this talk, since it was my own presentation, but I was in the room when it happened, so I figure that counts. The talk is about the work I did last year with fellow data scientist Sharvari Jadhav to build a FAQ-style query pipeline proof of concept: we use a T5 sequence-to-sequence model to generate questions from passages, store both the passages and the generated questions in the index, and match incoming questions against the stored questions during search – basically an implementation of the doc2query (and subsequently docT5query) papers. Here are my slides for those interested.
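The indexing-side idea can be sketched in a few lines. Here `generate_questions` is a hypothetical stand-in for the T5 model, and matching is reduced to simple token overlap rather than a real BM25 scorer:

```python
def generate_questions(passage: str) -> list[str]:
    """Hypothetical stand-in for a T5 doc2query model that emits
    questions the passage could answer."""
    return [f"what is {passage.split()[0].lower()}?"]

def build_index(passages):
    """Index each passage under its own text plus its generated questions."""
    index = []
    for pid, passage in enumerate(passages):
        searchable = passage + " " + " ".join(generate_questions(passage))
        index.append((pid, set(searchable.lower().split())))
    return index

def search(index, question, k=1):
    """Score by token overlap between the question and the expanded document."""
    q_tokens = set(question.lower().split())
    scored = sorted(index, key=lambda e: len(q_tokens & e[1]), reverse=True)
    return [pid for pid, _ in scored[:k]]

idx = build_index(["Aspirin reduces fever.", "Insulin regulates blood sugar."])
hits = search(idx, "what is aspirin?")
```

The point of the expansion is that an incoming question can match a stored question even when it shares little vocabulary with the passage itself.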

Ref2Vec

Presented by Erika Cardenas as part of Women of Search, the talk Women of Search present building Recommendation Systems with Vector Search discusses a concept called Ref2Vec for doing product recommendations. This is currently a work in progress at Weaviate; it represents a series of user interactions by the centroid of their embeddings, which is then used to recommend other products the user might like.
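The centroid idea is simple enough to sketch (the vectors here are toy 2-d stand-ins for product embeddings, and the brute-force ranking stands in for Weaviate's vector search):

```python
import numpy as np

def user_vector(interaction_embeddings):
    """Represent a user as the centroid of the embeddings of the items
    they interacted with (the core idea behind Ref2Vec)."""
    return np.mean(interaction_embeddings, axis=0)

def recommend(user_vec, catalog, k=2):
    """Rank catalog items by cosine similarity to the user centroid."""
    cat = catalog / np.linalg.norm(catalog, axis=1, keepdims=True)
    u = user_vec / np.linalg.norm(user_vec)
    return np.argsort(-(cat @ u))[:k]

clicked = np.array([[1.0, 0.0], [0.8, 0.2]])            # items the user touched
catalog = np.array([[1.0, 0.1], [0.0, 1.0], [0.9, 0.0]])
recs = recommend(user_vector(clicked), catalog)
```

A production version would exclude already-seen items and probably weight recent interactions more heavily than old ones.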

Knowledge Graphs

The Populating and leveraging semantic knowledge graphs to supercharge search talk by Chris Morley covers a lot of ground around Knowledge Graphs and Semantic Search. I will revisit the presentation once his slides and video are out, but I believe the main point was that he treats his tail queries as sequences of Knowledge Graph entities to increase relevance.

ChatGPT dangers

The Stop Hallucinations and Half-Truths in Generative Search presentation by Colin Harman contains solid advice based on his experience building GPT-3 based products over the last year. The talk provides a framework for building generative AI based systems that are useful, helpful and relatively harmless. However, he stresses that it is not possible to guarantee 100% that such systems won't go off the rails, and advises working around these limitations to the extent possible.

And that's my trip report. There were situations where I really wanted to attend two simultaneous presentations, which I will try to address once the slides and videos are out. Hope you found it useful. If you work in search and search relevance and haven't signed up for the Relevancy Slack channel, I urge you to consider doing so -- there are a bunch of very knowledgeable and helpful people there. And maybe we will see each other at the next Haystack!
