Sunday, October 12, 2025

Book Review: Time Series Forecasting using Foundation Models

As someone who primarily works in NLP and Search in the Health Domain, I don't have much use for Time Series. However, while exploring the Financial domain out of personal interest, I have been curious about Time Series for some time. Recently I attended the OpenHPI course Time Series Analysis taught by Mario Tormo Romero (even did the quizzes and the certificate of completion!). I was familiar with traditional techniques such as ARIMA (and its derivatives), but the course also covered Neural Network based techniques using CNN and RNN architectures, newer architectures such as N-BEATS, and Transformer based models such as Autoformer, Informer and TFT. Overall, I loved the course and learned a lot from it. If I had one complaint, it would be the lack of practical code examples and/or exercises, but I suppose it is not that hard to Google (or now ChatGPT) that stuff on my own.

As I get older, I find I learn faster by using what I already know to create analogies for what I am learning, rather than starting from scratch. So it seemed to me that there is some similarity between predicting the next word in a sentence and predicting where a stock price is headed next week given its previous history. Thus methods useful in NLP, including the relatively cutting-edge methods around Transformers and Generative AI, could, at least in principle, be applicable to Time Series forecasting. Of course, NLP involves discrete entities, i.e. words in a vocabulary, while Time Series involve continuous values, so there are bound to be differences as well.

So when I came across Marco Peixeiro's Time Series Forecasting using Foundation Models I was actually quite intrigued (sorry if I sound Victorian, but that's the closest word I can think of to indicate the mixture of vindication and curiosity I felt when I saw the title). Being a relative outsider to the world of Time Series forecasting, I felt vindicated that there is a research community actually looking at this connection, and was also curious to see where they had taken it. So I read the book, and here is what I learned.

High level feedback -- overall, this book fulfils the promise it makes in its title, and then some. It covers 7 different Foundation Models (loosely speaking, some of these are more methodological frameworks than models), spanning encoder-only, encoder-decoder and decoder-only (and even a couple of Mixture of Experts) architectures. In each of the model specific chapters, it provides code examples for using the model in zero-shot mode and for fine-tuning where applicable. For models that produce point estimates, it demonstrates cross-validation based methods to produce a forecast distribution, as well as code for anomaly detection where applicable. Over the course of these seven chapters, it compares and contrasts the models with each other, so by the end of the book the reader has a good grasp of what each model can or cannot do, and where each might shine. There is also a capstone project with a different dataset which serves to cement the reader's understanding of these various models. I think the material is not only comprehensive, but also prepares you to intelligently follow advances in the field of Time Series forecasting using Foundation Models, which is important given that it is still a relatively nascent and fast-growing field.

Detailed per chapter feedback -- the book is organized in three parts (four if you include the Capstone Project, which is really one large exercise). Part 1 is mostly background, Part 2 covers 5 models specifically developed for Time Series forecasting, and Part 3 covers 2 models where the Time Series task is converted to a language task and an LLM is used to handle it.

Part 1

  • Chapter 1: Understanding Foundation Models -- covers the Transformer architecture, with detailed coverage of its building blocks. Of note is the coverage of positional embeddings, which becomes even more crucial in the context of Time Series (an otherwise meaningless stream of numbers rather than a semi-meaningful stream of words). It also covers why (and why not) one would want to use Foundation Models for Time Series forecasting.
  • Chapter 2: Building Foundation Models -- covers the N-BEATS model architecture. N-BEATS was also one of the models covered towards the end of the OpenHPI course, so this represents a sort of progression towards the use of FMs for Time Series forecasting. In addition, it covers different evaluation metrics used in this area, and the effect of forecasting horizons on performance.

Part 2

  • Chapter 3: Forecasting with TimeGPT -- covers the TimeGPT model, an encoder-decoder model that can predict future values in a univariate Time Series with exogenous variables. Code examples illustrate how to use this model for zero-shot forecasting and fine-tuning, as well as for cross-validation over different forecasting horizons and for anomaly detection.
  • Chapter 4: Zero Shot Probabilistic Forecasting with Lag-LLaMA -- this is an open-source model built on top of the decoder-only LLaMA model from Meta. It supports univariate Time Series only, and is trained using lagged values of many different Time Series to create features. Lag-LLaMA provides probabilistic forecasts rather than point predictions. Code examples similar to the previous chapter are also provided.
  • Chapter 5: Learning the language of time with Chronos -- this chapter covers Chronos, a framework that allows using T5 and GPT-2 like language models with Time Series data. It describes various techniques such as mean scaling, mixup (convex combinations of multiple Time Series) and KernelSynth for data augmentation. The framework yields probabilistic forecasts as well, and the median is usually used for point predictions if needed. As in previous chapters, code examples for zero-shot forecasting and fine-tuning, as well as cross-validation and anomaly detection, are provided (see the zero-shot sketch after this list).
  • Chapter 6: Moirai, a Universal Forecasting Transformer -- Moirai is an encoder-only model that provides probabilistic forecasts and supports exogenous features out of the box. It uses a technique called patching to combine multiple consecutive inputs into a single element, similar to how one might use n-grams in NLP, which allows it to capture local semantic meaning and support longer context lengths. The output is sent through a linear projection layer. Moirai comes in two flavors: the one described here, and Moirai-MoE, a mixture-of-experts version based on a decoder-only Transformer model.
  • Chapter 7: Deterministic Forecasting with TimesFM -- TimesFM produces deterministic point predictions rather than a probabilistic forecast. It cannot be used for anomaly detection directly since we cannot construct confidence intervals from its output. One innovation with TimesFM is the use of residual blocks. The output is in the form of patches which go through a linear layer to produce the final prediction. Exogenous variables are supported through the use of an additional regression model. Unlike the other chapters, this one does not cover fine-tuning since that requires JAX and was considered out of scope for the book (but maybe it's a good reason to learn JAX?).
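
To make the zero-shot workflow concrete, here is a minimal sketch of probabilistic forecasting with Chronos, based on the publicly documented chronos-forecasting package rather than the book's own code; the checkpoint name, prediction length and quantile levels are my choices for illustration.

```python
# Minimal zero-shot forecast with Chronos (assumes `pip install chronos-forecasting`).
# Not the book's code -- a sketch based on the package's public documentation.
import numpy as np
import torch
from chronos import ChronosPipeline

# Load a pretrained Chronos checkpoint (T5-style language model backbone)
pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",
    device_map="cpu",
    torch_dtype=torch.float32,
)

# Some univariate history, e.g. 200 daily observations
history = np.sin(np.arange(200) / 10) + np.random.normal(0, 0.1, 200)

# Sample 20 future trajectories for the next 24 steps (zero-shot, no fine-tuning)
forecast = pipeline.predict(
    context=torch.tensor(history, dtype=torch.float32),
    prediction_length=24,
    num_samples=20,
)  # shape: (1, num_samples, prediction_length)

# Summarize the sample paths into a median point forecast and an 80% interval
low, median, high = np.quantile(forecast[0].numpy(), [0.1, 0.5, 0.9], axis=0)
print(median[:5], low[:5], high[:5])
```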

Part 3

  • Chapter 8: Forecasting as a Language task -- this chapter covers PromptCast, a technique that turns the Time Series forecasting task into a language task. The LLMs used here are Flan-T5 and LLaMA 3.2 3B-Instruct. Essentially it consists of creating prompts that specify an input sequence, optionally describing the task, and asking the LLM to provide the next value (an illustrative prompt sketch follows this list). The chapter illustrates zero-shot, few-shot and chain of thought prompting. The approach is likened to the Pudding mit Gabel festival, where people use forks to eat pudding.
  • Chapter 9: Reprogram an LLM for forecasting -- this chapter covers TimeLLM, another framework that reframes a Time Series forecasting task as a language task. It splits the input series into patches and reprograms them against the LLM's vocabulary embeddings; the reprogrammed patches, along with a prompt, are fed to the (frozen) LLM, and a linear layer produces the prediction from the learned embeddings. Training involves updating the weights of the patch reprogramming and linear layers. While it produces point predictions, it can be used for anomaly detection by using cross-validation to generate forecasts across multiple time horizons.
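
To illustrate the PromptCast idea of verbalizing a series, here is a toy prompt builder; the template wording and the ask_llm helper are hypothetical stand-ins, not the book's exact setup.

```python
# PromptCast-style prompt: the numeric history is verbalized into natural
# language and an LLM is asked for the next value. The template wording and
# the `ask_llm` helper are hypothetical, not taken from the book.
def build_prompt(values, horizon=1):
    history = ", ".join(f"{v:.1f}" for v in values)
    return (
        f"The daily visitor counts for the last {len(values)} days were: {history}. "
        f"Based on this history, what will the visitor count be over the next {horizon} day(s)? "
        "Answer with numbers only."
    )

def ask_llm(prompt: str) -> str:
    # Placeholder for whatever LLM client you use (Flan-T5, a LLaMA Instruct
    # model behind an API, etc.) -- swap in your own call here.
    raise NotImplementedError

prompt = build_prompt([120.0, 132.0, 128.0, 141.0, 150.0], horizon=1)
print(prompt)
# answer = ask_llm(prompt)  # parse the returned text back into a float
```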

Part 4

  • Chapter 10: Capstone Project -- forecasting daily visits to a blog -- the chapter provides the dataset and asks the reader to build models that predict future daily visits. The provided solution starts with a SARIMA baseline, then applies the different models discussed in the book to produce progressively better predictions.

So there you have it. As I mentioned earlier, I found this book quite useful, not only for its coverage of various models and how they are used for Time Series forecasting, but also as a primer for following research progress in this field. Hopefully you found this review helpful, and I hope the book serves you as well as it has served me.

Saturday, September 20, 2025

Book Review: Statistics every Programmer Needs

I recently read Statistics every Programmer Needs by Gary Sutton. I am probably a good target audience for the book since I used to be a software developer who transitioned into data science some 10 years ago, then into machine learning with neural networks and transformers, and more recently into Generative AI with Large Language Models. During this time, I have read numerous books on statistics in an effort to pick up what I didn't know (being largely self-taught, there is plenty I didn't and still don't know). I think this book stands out not only as a thorough and practical introduction to statistics, but also for its coverage of areas one would normally consider peripheral to statistics but still useful in practical data science scenarios, such as Linear Programming, PERT/CPM, etc.

The book takes a very hands-on approach to each area, starting with business problems often faced by programmers, and outlines how statistical techniques (pertinent to that area) can be used to address these problems. It starts with foundational concepts but goes on to cover advanced concepts across statistics, machine learning, optimization, and project management. The book is organized into the following 14 chapters.

The Foundation (Chapter 1) begins by laying a solid groundwork. Readers are introduced to core statistical concepts, both descriptive (mean, mode, median) and inferential (confidence intervals, p-values), ensuring they grasp the basics before progressing. The inclusion of regression, optimization, simulation, and machine learning in the foundational chapter sets the tone for the book’s broad scope.

Probability and Counting Principles (Chapter 2) covers continuous and discrete variables and how they differ, permutations, combinations, and key probability functions (PDF, PMF, CDF). What was interesting for me is how permutations and combinations are described using basic probability concepts.
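
As a quick illustration of those counting formulas, the Python standard library already has the pieces; this is just my own sanity check, not code from the book.

```python
# Permutations and combinations via the standard library.
from math import comb, factorial, perm

n, k = 10, 3
# Permutations: ordered selections of k items out of n -> n! / (n-k)!
assert perm(n, k) == factorial(n) // factorial(n - k) == 720
# Combinations: unordered selections -> n! / (k! * (n-k)!)
assert comb(n, k) == perm(n, k) // factorial(k) == 120
print(perm(n, k), comb(n, k))
```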

Probability Distributions (Chapter 3) covers the essential probability distributions—Gaussian, Binomial, Uniform, and Poisson. This chapter also covers conditional probability and Bayes’ rule with various practical applications.
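
Here is a tiny worked example of Bayes' rule of the kind such a chapter typically applies; the numbers are invented for illustration.

```python
# Bayes' rule with made-up numbers: probability that a flagged transaction
# is actually fraudulent, given the flagging system's characteristics.
p_fraud = 0.01              # prior P(fraud)
p_flag_given_fraud = 0.95   # sensitivity P(flag | fraud)
p_flag_given_ok = 0.05      # false positive rate P(flag | not fraud)

p_flag = p_flag_given_fraud * p_fraud + p_flag_given_ok * (1 - p_fraud)
p_fraud_given_flag = p_flag_given_fraud * p_fraud / p_flag
print(f"P(fraud | flagged) = {p_fraud_given_flag:.3f}")  # ~0.161
```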

Chapters 4 & 5 cover Linear and Logistic Regression respectively. Bonus material here (which I didn't expect to see) includes discussions around data normalization, residual analysis and multi-collinearity. Model evaluation is covered in depth for both varieties of models, as well as the popular metrics used to evaluate them.
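
A short sketch of the kinds of diagnostics those chapters discuss -- normalization, residual analysis and a multicollinearity check via VIF -- on synthetic data; the library choices (scikit-learn, statsmodels) are mine, not necessarily the book's.

```python
# Fit a linear regression, inspect residuals, and check multicollinearity.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(42)
X = pd.DataFrame({
    "ad_spend": rng.normal(100, 20, 500),
    "visits": rng.normal(1000, 100, 500),
})
X["visits"] += 2 * X["ad_spend"]                       # induce some collinearity
y = 3 * X["ad_spend"] + 0.5 * X["visits"] + rng.normal(0, 10, 500)

X_scaled = StandardScaler().fit_transform(X)           # data normalization
model = LinearRegression().fit(X_scaled, y)
residuals = y - model.predict(X_scaled)
print(f"Residual mean {residuals.mean():.3f}, std {residuals.std():.3f}")

# Variance Inflation Factor per feature (values >> 10 suggest multicollinearity)
vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
print(dict(zip(X.columns, np.round(vif, 1))))
```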

Chapter 6 covers Decision Trees and Random Forests, the next major category of traditional ML models. The book has a solid introduction to decision trees and random forests, including how to interpret feature importance and use GINI impurity measures. I had hoped for some coverage of Gradient Boosted Trees since we were already discussing trees, but maybe that will come in the next edition.
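
For the feature importance discussion, a minimal scikit-learn example (my own, on a toy dataset) looks like this:

```python
# Random forest with impurity-based (Gini) feature importances.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion="gini" is the default impurity measure used for splits
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("Test accuracy:", round(forest.score(X_test, y_test), 3))

# Largest importances first
importances = sorted(zip(X.columns, forest.feature_importances_),
                     key=lambda kv: kv[1], reverse=True)
for name, score in importances[:5]:
    print(f"{name:25s} {score:.3f}")
```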

Time Series Analysis (Chapter 7) is tackled with impressive depth; usually I would expect this subject to need its own book. However, the author does a good job of providing a useful introduction to Time Series -- covering forecasting, ARIMA models, exponential smoothing, stationarity testing (including the Augmented Dickey-Fuller test), trends, and seasonality. The chapter's coverage of ACF/PACF plots and different exponential smoothing models (SES, DES, Holt-Winters) is thorough, making it a valuable reference for people working with temporal data and autoregressive models.
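
As a small illustration of the stationarity and autocorrelation checks mentioned above, here is a statsmodels sketch on a synthetic random walk (my example, not the book's):

```python
# ADF stationarity test plus ACF/PACF coefficients on a synthetic series.
import numpy as np
from statsmodels.tsa.stattools import acf, adfuller, pacf

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=500))   # random walk: non-stationary

_, p_value, *_ = adfuller(series)
print(f"ADF p-value (levels): {p_value:.3f}")          # typically > 0.05
_, p_value, *_ = adfuller(np.diff(series))
print(f"ADF p-value (differenced): {p_value:.3f}")     # typically < 0.05

# First few autocorrelation / partial autocorrelation coefficients,
# the same quantities an ACF/PACF plot would show
print("ACF :", np.round(acf(series, nlags=5), 2))
print("PACF:", np.round(pacf(series, nlags=5), 2))
```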

Chapter 8 covers Optimization using Linear Programming, an area I would expect to see covered in a book on Operations Research rather than Statistics. But the coverage is practical and complete, focusing on modeling business problems as optimization problems and solving them using the Linear Programming routines provided by scipy.optimize.
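
Here is a minimal product-mix example of the kind such a chapter models, solved with scipy.optimize.linprog; the coefficients are invented for illustration.

```python
# Maximize profit 40*x1 + 30*x2 subject to resource constraints,
# expressed as a minimization for linprog.
from scipy.optimize import linprog

c = [-40, -30]                       # negate to turn maximization into minimization
A_ub = [[2, 1],                      # machine hours: 2*x1 + 1*x2 <= 100
        [1, 1]]                      # labor hours:   1*x1 + 1*x2 <= 80
b_ub = [100, 80]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)], method="highs")
print("Optimal units:", res.x, "Max profit:", -res.fun)   # (20, 60) -> 2600
```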

Chapter 9 covers Simulation using Monte Carlo techniques. As before, not something I would have expected in a Statistics book, but definitely a useful tool to have in one's Data Science toolbox. As with the other chapters, multiple business scenarios are described and modeled with probability distributions, and Monte Carlo simulations are performed on them to elicit useful insights.
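
In that spirit, a small Monte Carlo sketch (mine, with invented distributions) that estimates the chance a project blows its budget when task costs are uncertain:

```python
# Monte Carlo estimate of P(total project cost exceeds budget).
import numpy as np

rng = np.random.default_rng(1)
n_trials = 100_000

# Three task costs (in thousands) modeled with different distributions
design  = rng.normal(loc=20, scale=3, size=n_trials)
build   = rng.triangular(left=40, mode=50, right=70, size=n_trials)
testing = rng.uniform(low=10, high=20, size=n_trials)

total = design + build + testing
budget = 95
print(f"P(total cost > {budget}k) = {np.mean(total > budget):.3f}")
print(f"Mean cost = {total.mean():.1f}k, 95th percentile = {np.percentile(total, 95):.1f}k")
```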

Decision Methods and Markov Analysis (Chapters 10 & 11) cover Decision-making frameworks (maximax, maximin, minimax regret, expected value decision trees) and Markov analysis (transition probabilities, equilibrium, and absorbing states). Taken together, they could serve as a gateway for deeper explorations into Bayesian Networks and other Probabilistic Graphical Models.
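
For the Markov analysis side, the equilibrium distribution of a small chain can be computed in a few lines of numpy; the transition matrix below is made up for illustration.

```python
# Steady-state (equilibrium) probabilities of a 3-state Markov chain.
import numpy as np

# Rows sum to 1: P[i, j] = probability of moving from state i to state j
P = np.array([
    [0.8, 0.2, 0.0],   # e.g. "loyal customer"
    [0.3, 0.5, 0.2],   # "occasional customer"
    [0.1, 0.3, 0.6],   # "churn risk"
])

# The steady state pi satisfies pi @ P = pi with pi summing to 1;
# take the left eigenvector of P associated with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.isclose(eigvals, 1.0))])
pi = pi / pi.sum()
print("Equilibrium distribution:", np.round(pi, 3))
```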

The chapter on Benford’s Law (Chapter 12) for fraud detection is another unique touch, introducing readers to mantissa statistics. So is the chapter on Project Management (Chapter 13), which presents quantitative methods in project management (WBS, PERT, CPM, critical path)with actionable insights, bridging the gap between theory and project execution.

The concluding chapter on Statistical Quality Control (Chapter 14) is packed with practical content—control charts (p, np, c, g, etc.), UCL/LCL, and key metrics—making it invaluable for readers in manufacturing, operations, or quality assurance roles.
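
As an example of the control-chart arithmetic, here is a p-chart limit calculation on invented inspection data (the 3-sigma limits are the standard textbook formula, not code from the book):

```python
# Control limits for a p-chart (proportion defective per lot).
import numpy as np

defects = np.array([4, 6, 3, 7, 5, 9, 4, 6, 5, 8])   # defective units per lot
n = 200                                               # units inspected per lot

p = defects / n
p_bar = p.mean()
sigma = np.sqrt(p_bar * (1 - p_bar) / n)
ucl = p_bar + 3 * sigma
lcl = max(p_bar - 3 * sigma, 0.0)                     # a proportion cannot go below 0

print(f"p-bar={p_bar:.4f}  UCL={ucl:.4f}  LCL={lcl:.4f}")
print("Out-of-control lots:", np.where((p > ucl) | (p < lcl))[0])
```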

I thought the book was ambitious in scope but succeeds in providing both breadth and depth, managing to hit all the high points without compromising the quality of each. As I mentioned earlier, its coverage goes beyond just statistics, making it a bargain since you get to learn useful statistics and quantitative techniques from a single book. I found both areas to be described in a very hands-on, example-driven manner, often highlighting concepts and metrics that are overlooked in more traditional texts, thus making it a useful reference for software professionals (DS and non-DS alike).

Saturday, June 28, 2025

Book Review: Hands-On Artificial Intelligence for IoT

For those in similar professional circles as mine, i.e. looking ahead into the Generative AI space, yet with one foot pragmatically and firmly planted in the Machine Learning (ML) and Deep Learning (DL) techniques of the (recent, ok, not very distant) past, you will find Dr Amita Kapoor's recent book Hands-On Artificial Intelligence for IoT: Expert Machine Learning and Deep Learning Techniques for developing smarter IoT systems, 2/ed, published by Packt, a very useful resource on applying these techniques to applications in the Internet of Things (IoT) domain. My own interest in IoT is driven primarily by previous personal (and failed) forays into Home Automation, but I do have some background in ML and DL techniques. So I approached this book from the perspective of a reader trying to understand the challenges and applications of these techniques in the IoT domain. This perspective shaped my reading of the book, and to some extent this review as well, as I looked for insights that would help me bridge my existing knowledge with the nuances of the IoT domain.

The book is organized into 4 parts. The first part introduces foundational techniques that are common to both the fields of AI (this term includes ML and DL) and IoT, while the second part covers advanced techniques. The third part focuses on specific IoT applications and AI techniques to handle them, while the fourth part covers IoT applications at different levels of granularity (personal/home, industrial, smart cities, etc.). The book is quite large (approximately 400 pages) and covers a lot of ground, some of which you may already be familiar with depending on your background. However, even in those cases, it may be worthwhile to skim the text to make sure you don't miss something you didn't know about, since things move quickly in this field. In any case, I present below my summary of each chapter, organized into a loose table of contents type structure. Hopefully they help you make the decision to read versus skim and optimize your reading experience.

  • Part I: Principles and Foundations of IoT and AI
    • Principles and Foundations of IoT and AI -- covers the theoretical foundations of IoT (think ISO network stack), various applications, and the necessity of using Big Data techniques and ML. It concludes with a list of tools used in the text, which includes Keras 3.0 to support DL in IoT applications.
    • Data Access and Distributed Processing for IoT -- this chapter covers processing data in various formats (text, CSV, Excel, JSON, HDFS, and various SQL and NoSQL databases) using Python. This is because IoT devices often present data in proprietary formats, and you need to be able to read it into your application.
    • Machine Learning for IoT -- covers traditional ML algorithms such as Naive Bayes, Logistic Regression, Decision Trees, SVM, etc (remember my quip about having one foot firmly in the distant ML past? This is about as far back as you would go), and one example using a simple DL model. Even though these may not be on par with more recent models such as BERT or small LLMs, they are typically deployed for solving simpler problems with lower latency requirements, and are often adequate for the problem at hand.
  • Part II: Advanced AI Techniques and their application in IoT
    • Deep Learning for IoT -- introductory DL chapter, covers DL basics, CNN, RNN and AutoEncoders. It also provides a brief description of OpenVINO for IoT vision applications and TinyML for low-power on-device analytics, and using Keras Tuner for Hyperparameter Tuning.
    • Techniques for IoT -- explores alternative optimization techniques to Gradient Descent (GD) such as Simulated Annealing and Swarm Optimization. Also covers the use of Evolutionary and Genetic Algorithms (EA and GA) using libraries such as PyGAD and DEAP. While not mentioned explicitly, I will guess that EA/GA are included here because they are less resource intensive compared to GD, and can often be more efficient depending on application.
    • Reinforcement Learning for IoT -- this chapter covers the basics of Reinforcement Learning (RL) and Q-Learning (DQN, DDQN, Policy Gradients, etc). As before, RL based training can be particularly suitable for IoT applications because they are physics based, and reinforcement signals can be cheaper to obtain and more relevant compared to supervision signals.
    • Generative Models for IoT -- this chapter covers Generative Adversarial Networks (GAN) and Variational AutoEncoders (VAE), which are probably not the Generative Models you had in mind if you are in the current "GenAI" space, but these are the OG models that generate images from noise (rather than the next token from a stream of tokens). Primarily their utility in the IoT space seems to be data generation and simulation (GAN) and anomaly detection (VAE).
  • Part III: Implementing Intelligent IoT Solutions in Diverse Domains
    • Distributed Learning using Keras -- this chapter covers distributed training using Keras 3 (with the JAX backend); see the sketch after this list. This is useful information even if you were just curious about Keras 3's distributed capabilities. The relevance to the IoT space is that training data may be aggregated from multiple edge devices, say for recommendations, or multiple resource constrained edge devices may be used to retrain on new data, such as maintenance models in industrial IoT systems.
    • AI Cloud Platforms for IoT -- covers the need for Cloud based APIs in the context of IoT, and IoT adjacent services provided by popular providers such as AWS, Azure and Watson. Also covers these providers from the point of view of ML services, including Google VertexAI and AutoML, AWS SageMaker and Bedrock, and IoT specific services such as AWS IoT Core, Azure IoT Hub and GCP IoT Core.
    • Deep Learning for Time Series Data from IoT -- covers working with time series data using traditional algorithms such as Prophet and Spark-ML, with recurrent neural networks (RNN), and with pre-trained Temporal Convolutional Network (TCN) models such as Chronos. This is particularly relevant since IoT devices emit streams of data over time that can be analyzed and extrapolated to predict the future.
    • Leveraging AI for Visual Data from IoT -- covers the processing of visual data from IoT systems, including image segmentation and object detection and classification. Architectures covered include CNN, TCN, and ViT (Vision Transformers).
    • AI for Text, Audio and Speech Data from IoT -- IoT devices can listen for particular sounds or speech patterns in their input, so this chapter covers mechanisms for IoT devices to process speech and audio, as well as free-form text input from users.
  • Part IV: Applying AI and IoT in Real-World Scenarios
    • AI for Personal and Home IoT -- mainly covers Personal and Home IoT applications, and considerations for creating them, along with a case study on a Smart Home implementation. It also includes pointers on getting started on your own IoT projects.
    • AI for IIoT -- there are already many IoT applications in use in industrial environments, and this chapter describes instances of these in various industries. Application areas are not only in manufacturing support, but could also be for preventative maintenance and forecasting load.
    • AI for Smart Cities IoT -- I felt initially that this might be a bit of an aspirational chapter, in the sense that the typical reader of this book is unlikely to be in a position to influence the use of AI for smart cities, but the examples proved me wrong. Many of them are smart solutions to everyday problems that are well within the realm of influence of people working for cities or local governments, directly or indirectly.
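
As promised above, here is a minimal sketch of data-parallel training with the Keras 3 distribution API on the JAX backend. It is based on the public Keras 3 documentation, not the book's code, and the toy model and data are mine.

```python
# Data-parallel training with Keras 3 on the JAX backend.
import os
os.environ["KERAS_BACKEND"] = "jax"   # must be set before importing keras

import numpy as np
import keras

# Shard each batch across all available devices (data parallelism)
keras.distribution.set_distribution(keras.distribution.DataParallel())

model = keras.Sequential([
    keras.layers.Input(shape=(32,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Toy sensor-style data standing in for readings aggregated from edge devices
X = np.random.rand(1024, 32).astype("float32")
y = np.random.rand(1024, 1).astype("float32")
model.fit(X, y, batch_size=128, epochs=2, verbose=0)
```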

In summary, I found this book to be a comprehensive resource for understanding the concepts behind IoT applications. Its breadth of coverage is truly impressive -- spanning essential principles of IoT and AI, traversing machine learning, deep learning, and optimization techniques, and culminating in thorough discussions on real-world deployments across domains such as smart homes, industrial IoT, and smart cities. While the book's extensive coverage of fundamentals in areas like machine learning and distributed processing may at times feel broader than strictly necessary for readers already well-versed in these fields, it ensures that the material remains accessible to a broader spectrum of readers.

The progression of chapters from core principles to practical case studies equips readers with a strong theoretical foundation as well as a practical understanding of how intelligent systems can be implemented in the IoT space. The inclusion of dedicated chapters on time series analysis, computer vision (CV), and Natural Language and Audio processing offers readers additional perspective in these areas. While I don't see an IoT application in my immediate future, it was an interesting read, and having read it, I feel more confident about being able to tackle one should it come about.

Sunday, June 15, 2025

Book Review: Essential Graph RAG

Coming from a background of Knowledge Graph (KG) backed Medical Search, I don't need to be convinced about the importance of manually curated structured knowledge for the quality of search results. Traditional search is being rapidly replaced with Generative AI using a technique called Retrieval Augmented Generation (RAG), where the pipeline produces an answer that summarizes the retrieved search results, instead of the ten blue links that the searcher previously had to parse to extract an answer. In any case, I had been experimenting with using KGs to enhance RAG to support this intuition, and when Microsoft announced their work on GraphRAG, it felt good to be vindicated. So when Manning reached out to me to ask if I would be interested in reviewing the book Essential GraphRAG by Tomaž Bratanič and Oskar Hane, I jumped at the chance.

Both authors are from Neo4j, so it is not surprising that the search component is also Neo4j, even for vector search, and hybrid search is really vector + graph search (rather than the more common vector + lexical search). However, most people nowadays would prefer a multi-backend search that includes graph search as well as vector and lexical search, so the examples can help you learn (a) how to use Neo4j for vector search and (b) how to implement graph search with Neo4j. Since Neo4j is a leading graph database provider, this is useful information to know if you decide to incorporate graph search into your repertoire of tools, as you very likely will if you are reading this book.

The book is available under the Manning Early Access Program (MEAP) and is expected to be published in August 2025. It is currently organized into 8 chapters as follows:

Improving LLM accuracy -- here the authors introduce what LLMs are, what they are capable of, as well as their limitations when used for question answering, i.e. not knowing about events after their training date, their tendency to hallucinate when they cannot answer a question from the knowledge they were trained on, and their inability to know of company confidential or otherwise private information, since they are trained on public data only. The authors cover solutions to mitigate this, i.e. fine-tuning and RAG, and why RAG is a better alternative in most cases. Finally they cover why KGs are the best general purpose datastore for RAG pipelines.

Vector Similarity Search and Hybrid Search -- here the authors cover the fundamentals of vector search, such as vector similarity functions, embedding models used to support vector search, and the reasoning behind chunking. They describe what a typical RAG pipeline looks like, although as mentioned earlier, they showcase Neo4j's vector search capabilities instead of relying on more popular vector search alternatives. I thought it was good information though, since I wasn't aware that Neo4j supported vector search. They also cover hybrid search, in this case vector + graph search (this is a book about GraphRAG after all), although I can definitely see graph search as one of the components of a broader hybrid search pipeline.
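
For readers who, like me, had not used Neo4j for vector search before, here is a minimal sketch based on Neo4j's documented db.index.vector.queryNodes procedure; the index name, node properties and connection details are illustrative, and the book's own code will differ.

```python
# Vector similarity search against a Neo4j vector index.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def vector_search(query_embedding, k=5):
    cypher = """
    CALL db.index.vector.queryNodes('chunk_embeddings', $k, $embedding)
    YIELD node, score
    RETURN node.text AS text, score
    ORDER BY score DESC
    """
    with driver.session() as session:
        result = session.run(cypher, k=k, embedding=query_embedding)
        return [(record["text"], record["score"]) for record in result]

# query_embedding must come from the same embedding model used at index time
# results = vector_search(embed("How do I reset my password?"))
```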

Advanced Vector Retrieval Strategies -- in this chapter, the authors introduce some interesting techniques to make your Graph Search produce more relevant context for your GraphRAG pipeline. Techniques on the query side include Step Back Prompting (SBP), to look for more generic concepts and then drill down using Graph Search to improve recall, and the Parent Document Retriever pattern of retrieving parent documents of the chunks that matched, rather than the chunks themselves. On the indexing side, they talk about creating additional synthetic chunks that summarize actual chunks and can be queried alongside them, and about representing document chunks as pre-generated questions the chunk can answer instead of its text content.

Text2Cypher -- in this chapter, the authors show how an LLM can be prompted using Few Shot Learning (FSL) to generate Cypher queries from natural language. Users would type in a query using natural language, knowing nothing about the schema structure of the underlying Graph Database, and the LLM, through detailed prompts and examples, would translate the natural language query into a Cypher query. The authors also reference pre-trained models from Neo4j that have been fine-tuned to do this. While these models are generally not as effective as ones built from LLMs through prompting, they are more efficient on large volumes of data.
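
To make the idea concrete, here is an illustrative few-shot Text2Cypher prompt; the schema, examples and the ask_llm helper are hypothetical and not taken from the book.

```python
# Few-shot prompt that steers an LLM toward valid Cypher for a given schema.
SCHEMA = """
(:Person {name})-[:ACTED_IN]->(:Movie {title, released})
(:Person {name})-[:DIRECTED]->(:Movie {title, released})
"""

EXAMPLES = """
Question: Who directed The Matrix?
Cypher: MATCH (p:Person)-[:DIRECTED]->(m:Movie {title: 'The Matrix'}) RETURN p.name

Question: List movies released after 2000.
Cypher: MATCH (m:Movie) WHERE m.released > 2000 RETURN m.title
"""

def text2cypher_prompt(question: str) -> str:
    return (
        "You translate questions into Cypher queries for the following graph schema.\n"
        f"Schema:\n{SCHEMA}\n"
        f"Examples:\n{EXAMPLES}\n"
        f"Question: {question}\nCypher:"
    )

print(text2cypher_prompt("Which actors appeared in movies directed by Lana Wachowski?"))
# cypher = ask_llm(text2cypher_prompt(...))  # then run the returned query against Neo4j
```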

Agentic RAG -- Agentic RAG allows autonomous / semi-autonomous LLM backed software components, called Agents, to modify and enhance the standard control flow for RAG. One change could be for an Agent (the Router) to determine query intent and call on one or more retrievers from the available pool of retrievers, or for another (the Critic) to determine if the answer generated so far is adequate given the user's query, and if not, to rerun the pipeline with a modified query until the query is fully answered. The authors go on to describe a system (with code) consisting of a Router and Critic and several Retrieval Agents.

Constructing Knowledge Graph with LLM -- this chapter focuses on index creation. Search is traditionally done on unstructured data such as text documents. This chapter describes using the LLM to extract entities of known types (PERSON, ORGANIZATION, LOCATION, etc), followed by a manual / semi-manual Graph Modeling step to set up relations between these extracted entities and build a schema. It then talks a little about converting specific query types into structured Cypher queries that leverage this schema.

Microsoft GraphRAG Implementation -- this chapter deals specifically with Microsoft's GraphRAG implementation. While most people think of GraphRAG as any infrastructure that supports incorporating Graph Search into a RAG pipeline, Microsoft specifies it as a multi-step recipe to build your KG from your data sources and use results from your KG to support a RAG pipeline. The steps involved are structured extraction and community detection, followed by summarization of community chunks into synthetic nodes. To some extent this is similar to Chonkie's Semantic Double Pass Merging (SDPM) chunker, except that the size of the skip window is unbounded. These synthetic chunks can be useful to answer global questions that span multiple ideas across the corpus. However, as the authors show, this approach can be effective for local queries as well.

RAG Application Evaluation -- because of the stochastic nature of LLMs, evaluating RAG pipelines in general presents some unique challenges. Here these challenges are investigated with particular reference to GraphRAG systems, i.e. where the retrieval context is provided by Knowledge Graphs. The authors describe some metrics from the RAGAS library, where LLMs are used to generate these metrics from outputs at different stages of the RAG pipeline. The chapter also discusses ideas for setting up an evaluation dataset. The metrics covered in the examples are RAGAS context recall, faithfulness and answer correctness.
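
For reference, here is a minimal sketch of computing those metrics with the ragas package; the API has changed across ragas versions (this follows the older Dataset-based interface), so the book's code may look different, and an LLM judge (e.g. an OpenAI key) is needed under the hood.

```python
# Evaluate a single RAG sample with RAGAS context recall, faithfulness and
# answer correctness. Sample content is invented for illustration.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_correctness, context_recall, faithfulness

samples = Dataset.from_dict({
    "question": ["Who founded Neo4j?"],
    "contexts": [["Neo4j was founded by Emil Eifrem and others."]],
    "answer": ["Neo4j was founded by Emil Eifrem."],
    "ground_truth": ["Emil Eifrem founded Neo4j."],
})

scores = evaluate(samples, metrics=[context_recall, faithfulness, answer_correctness])
print(scores)
```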

Overall, the book takes a very practical, hands-on approach to the subject. It is filled with code examples and practical advice for leveraging KGs in RAG, and using Large Language Models (LLM) to build KGs, as well as evaluating such pipelines. If you were thinking of incorporating Graph Search into your search pipeline, be it traditional, hybrid, RAG or agentic, you will find the information in the book useful and beneficial.