Tuesday, May 22, 2018

Trip Report: 2018 NLP@UCSF Symposium


Last week, I attended 2018 NLP@UCSF, a half-day event at the University of California, San Francisco (UCSF) organized by the UCSF Clinical Data Community Organizing Team. The star of the show was their corpus of 58 million de-identified clinical notes from their own hospital system, and most of the talks were about work done with this dataset. My first major takeaway from the event is that you can often get very good results with simple models when your dataset is large and rich enough. My second takeaway is some advice from Beau Norgeot, one of the speakers, which I found particularly insightful and which I paraphrase below.

Visionary Keynote
The keynote was delivered by Atul Butte, MD, PhD, Priscilla Chan and Mark Zuckerberg Distinguished Professor, Director, UCSF Institute for Computational Health Sciences. He spoke about the dataset of 58 million de-identified clinical notes from the UCSF Hospital system that they are planning to release, initially to other researchers within the UC system, and later to a broader audience. There are also plans to expand this effort to include other hospitals in the UC system, potentially bringing the size up to 250 million notes. He also spoke about broadening the scope beyond text to include molecular and image data in the future. He cited possible applications of doing NLP on this dataset in areas such as radiology, oncology, medications, drug safety and longitudinality.

Using NLP to Identify Predictors of Mortality in ICU Text
This talk was given by Adams Dudley, MD, MBA, Professor of Medicine and Health Policy and Associate Director for Research, Philip R. Lee UCSF Institute for Health Policy Studies. Although his background is in medicine and he has worked in Intensive Care Units (ICU) in the past, he also taught himself Natural Language Processing (NLP). He opened with the need for a mortality predictor in the ICU environment, and described how traditional (non-NLP) approaches to such a predictor have been built in the past by collecting selected patient vital signs across multiple patients. However, these approaches ended up being too expensive, so they tried NLP on patient Electronic Health Records (EHR) collected at the ICU. He described multiple approaches to building features for the mortality predictor. The first was to annotate frequently occurring words, phrases and concepts from clinical notes that correlate strongly with mortality, then treat each document as a bag of concepts. The second was to combine these features with tabular data available from electronic monitoring. The third was to topic-model the corpus and assign human labels to the topics to make them more interpretable. The algorithm used for the model was logistic regression. The models have been tested across 3 UC hospitals and achieve Area Under the Curve (AUC) scores in the high 90% range. Overall a very impressive effort, demonstrating how even simple models can achieve good results on large and rich datasets.
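To make the bag-of-concepts plus logistic regression idea concrete, here is a minimal, purely illustrative sketch in scikit-learn. This is not the speaker's actual pipeline: the example notes and mortality labels below are made up, and a real system would use annotated clinical concepts rather than raw word n-grams.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# hypothetical ICU note snippets and 0/1 mortality labels
notes = [
    "patient intubated, on pressors, worsening sepsis",
    "extubated, tolerating diet, ambulating with assistance",
    "new onset renal failure, started on dialysis",
    "afebrile, vitals stable, plan for discharge tomorrow",
]
died = [1, 0, 1, 0]

# bag-of-words features (standing in for a bag of annotated concepts)
# feeding a plain logistic regression classifier
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(notes, died)
print(model.predict_proba(["worsening sepsis, on pressors"])[:, 1])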

Large Scale Analysis of Clinical Records at UCSF with state of the art Natural Language Processing Platforms
Maryam Panahiazar, PhD, Postdoctoral Scholar, Butte Lab, UCSF Institute for Computational Health Sciences, described a pipeline she built to extract knowledge from pathology reports. The objective was to create labels for mammogram images, indicating the presence or absence of breast cancer. She described the difficulty of detecting tumors in breast tissue from simple visual inspection because of the variation in breast density across different people. Deep Learning is a possible solution, but needs lots of training data, and this is where her work came in. Features were extracted from pathology reports by various means. Her in-house effort was to create a lightweight ontology of terms and then annotate the reports against the ontology, giving her a bag of features for each document. She also extracted features using fasttext embeddings, and used third-party ML tools such as IBM Watson, Google Cloud ML and Azure text analytics to extract additional features, which I thought was pretty cool, since it leverages their sophisticated general-purpose English language models with no additional development cost.

Using Text Mining Methods to Detect a Clinical Infection
This talk was jointly given by Milena Gianfrancesco, PhD, MPH, Postdoctoral Scholar, UCSF and Suzanne Tamang, PhD, Assistant Faculty Director, Data Science, Stanford Center for Population Health Sciences, Instructor, Biomedical Data Science. Their project was to detect zoster/shingles infections in 800,000 EHRs from the UCSF hospital system; this disease is often treated outside the clinical setting, and is often reported as a diagnosis. They used a system called CLinical EVent Recognition (CLEVER), developed by Suzanne, to parse and featurize the text. Challenges included boundary detection issues, the high volume of synonyms and lexical variants in the text, a highly ambiguous task-specific lexicon, and the use of semantic modifiers. They constructed a terminology based on the Unified Medical Language System (UMLS) and used neural embedding models for statistically significant terms in the text. The end result is a (SVM-based, if I remember correctly) predictor that can detect the infection with high accuracy.

Employing NLP to measure Patient Health Literacy and Clinician Linguistic Complexity: the UCSF/Kaiser ECLIPSSE Study
Dean Schillinger, MD, UCSF Professor of Medicine in Residence, Chief of the UCSF Division of General Internal Medicine at Zuckerberg San Francisco General Hospital, described a novel approach to measuring gaps in health communication between patient and clinician. This is especially relevant in an age where patients and clinicians can communicate using email or chat. The talk described work that has already been done in this area, including 51 metrics, all of which require time-consuming in-person involvement. The ECLIPSSE project developed novel measures of patient health literacy composed of more than 200 linguistic indices. The project also measures various correlations, such as lower health literacy corresponding to lower adherence to a medical regimen. Similar work is planned in the future for measuring clinicians' linguistic complexity as well.

An Absolute Beginner's Guide to NLP
This tutorial-style talk was presented by Robert Thombley, Data Scientist, UCSF Institute for Health Policy Studies. As the title suggests, much of it was fairly basic stuff, aimed at his UC colleagues looking to get started with NLP. He covered various Python libraries for doing NLP, including the Natural Language ToolKit (NLTK) and SpaCy. He advocated building components that can be composed into more complex pipelines. He briefly mentioned MetaMap, a tool to recognize UMLS concepts in text, for Named Entity Recognition on medical text. He also mentioned various kinds of vectorization schemes such as Bag of Words, TF/IDF, and neural network embeddings with word2vec and fasttext. I came away from this talk with a pointer to textacy, which appears to be an easy-to-use wrapper on top of SpaCy.
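As a quick illustration of the vectorization schemes he mentioned, here is a tiny scikit-learn sketch contrasting bag-of-words counts with TF-IDF weights. This is my own example, not from the talk, and the sentences are made up.

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the patient denies chest pain",
        "patient reports chest pain and nausea"]

bow = CountVectorizer().fit_transform(docs)      # raw term counts per document
tfidf = TfidfVectorizer().fit_transform(docs)    # counts reweighted by inverse document frequency

print(bow.toarray())
print(tfidf.toarray())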

Tools and Approaches to NLP in Clinical Notes
This talk was presented by Madison Myers, Data Scientist, IBM, describing work she did while she was an intern at UCSF. Her task was to extract UMLS entities from clinical notes. She initially tried Apache cTAKES but it proved to be too complex, so she evaluated a number of Python libraries that provide interfaces to various medical dictionaries such as SNOMED and UMLS, two of which I remember (full list in the slides) are PyMedTermino and py-umls. She also evaluated several negation detectors such as NegEx/ConText and NegFinder. She ended up using the UMLS installation tool MetamorphoSys to download the UMLS database, and the QuickUMLS Python library to parse UMLS concepts out of sentences (another library that I intend to look at).

cTakes - What does it take?
Gundolf Schenk, Sr. Biomedical Data Scientist, UCSF Institute for Computational Health Sciences, described the features of Apache cTakes and how to install it. His task was to output annotations from text that could then be searched for conditions, and cTakes qualified as one of the few tools that was open source and could do the job. It detects negation and uncertainty, location, temporal events, historical events, coreferences, and a host of other things. He also described the cTakes installation and how to run it with the PiperFileRunner. I remember installing cTakes in the past as well, but the task seemed much harder then. Either the process has improved, or more likely I was too concerned with trying to adapt it to my working environment at the time (we ended up writing a cTakes clone that used our own tech for the central part, but with a lot of overlapping functionality). Definitely worth a second look for me at some point.

Building Custom, Scalable and Generalizable NLP Tools
Beau Norgeot, Butte Lab, UCSF Institute for Computational Health Sciences, talked about his team's work to de-identify the 58 million clinical notes so they could be made available outside UCSF to other researchers. There were 139 different categories of clinical notes, each with different structure and vocabulary. The team initially tried Machine Learning (ML) based models, since they typically did better in competitions. However, they found that ML based solutions did not generalize as well as rule-based systems, so they started looking at rule-based approaches. The initial de-identification system is rule based, built using high level libraries, and based on the idea of recognizing "safe" words rather than Protected Health Information (PHI). While it was generalizable across various types of clinical notes, it had a high false positive rate and a disturbing number of false negatives. The pipeline has been refined over multiple iterations, replacing the open source NER model with their own implementation trained on their data. The result is currently the largest fully annotated corpus of clinical notes. Beau also shared a bit of personal advice: reuse solutions that did well in general, and rethink solutions that did not, rather than trying to force-fit them to your use case.

Infrastructure for NLP
Rick Larsen, UCSF Director, Research Informatics, Enterprise Information and Analytics, described the various types of computing infrastructure available to UCSF employees who are interested in getting into NLP and working on their de-identified dataset. Infrastructure ranged from small to medium desktops, to shared on-premise infrastructure, to cloud-based infrastructure from Amazon. The common thread was that all the infrastructure was HIPAA compliant. It reminded me of the pain of handling HIPAA data with piecemeal HIPAA-compliant infrastructure, the few times I have had to do it at my previous job.

Overall, I really enjoyed the talks. I got to meet all my old friends - NegEx, ConText, MetaMap, and all the others :-). I also came away with some great pointers to software that may be useful to me in the future and which I plan on checking out. I was hoping that they would share their dataset with the world, but so far it seems that they are only looking to do this within the UC system. There does seem to be some attempt at outreach beyond the UC system, both during the event and on nlp.ucsf.edu, so that may be something worth exploring if you are in the Bay Area and interested in NLP and/or working with this dataset.

I am relying on a combination of memory and my notes here, so apologies to the speakers if I missed or misrepresented anything. Hopefully the slides will be up soon on data.ucsf.edu.


Saturday, May 05, 2018

Singular Value Decomposition for Recommendations with Tensorflow


I recently completed (auditing) the Matrix Factorization and Advanced Techniques course on Coursera, conducted by the same people from the University of Minnesota who gave us the Lenskit project and an earlier awesome course on Recommender Systems on Coursera. Since I am auditing the course, Coursera no longer allows me to submit answers to quizzes and assignments. That is a good and a bad thing - while I can no longer check my understanding of the material by submitting the quizzes, it also frees me to do the assignment using any toolset that I choose.

One other interesting insight I got from the course was that Matrix Factorization can be formulated as a Gradient Descent problem. That is actually what led me to TensorFlow (TF) as my toolset of choice for this task. While TF is best known for building Deep Learning (DL) models, at its core it is a symbolic math library for high performance numerical computations that can be used to solve many kinds of numerical optimization problems common in Machine Learning (ML), with Gradient Descent being the central algorithm on which most (all?) of TF's built-in optimizers are based.
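To illustrate that formulation, here is a minimal sketch (not part of the assignment) of matrix factorization as a gradient descent problem in TF 1.x: we learn user and item factor matrices P and Q so that P . Q^T approximates the observed entries of a ratings matrix. All the names, sizes and hyperparameters here are made up for the example.

import numpy as np
import tensorflow as tf

num_users, num_items, k = 100, 50, 10

# fake ratings matrix with roughly 10% of entries observed
R_obs = np.random.uniform(1, 5, size=(num_users, num_items)).astype(np.float32)
mask = (np.random.random((num_users, num_items)) < 0.1).astype(np.float32)

graph = tf.Graph()
with graph.as_default():
    R = tf.placeholder(tf.float32, shape=(num_users, num_items), name="R")
    M = tf.placeholder(tf.float32, shape=(num_users, num_items), name="M")
    # latent factor matrices to be learned
    P = tf.Variable(tf.random_normal((num_users, k), stddev=0.1))
    Q = tf.Variable(tf.random_normal((num_items, k), stddev=0.1))
    R_hat = tf.matmul(P, Q, transpose_b=True)
    # squared error over observed entries only, plus L2 regularization
    loss = (tf.reduce_sum(M * tf.square(R - R_hat)) +
            0.01 * (tf.nn.l2_loss(P) + tf.nn.l2_loss(Q)))
    train_op = tf.train.GradientDescentOptimizer(0.001).minimize(loss)

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1000):
        _, loss_val = sess.run([train_op, loss], feed_dict={R: R_obs, M: mask})
    print("final loss:", loss_val)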

The assignment was to investigate the Singular Value Decomposition (SVD) algorithm as a Matrix Factorization technique for doing recommendations. The ratings matrix is de-biased using different techniques, and each de-biased matrix is decomposed and an approximation reconstructed using various values of k. In this post, I describe how I did this using TF and Python. We start by setting up some imports and constants that point to the data file locations.

import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import tensorflow as tf

DATA_DIR = "../data"
MOVIE_FILE = os.path.join(DATA_DIR, "movies.csv")
RATINGS_FILE = os.path.join(DATA_DIR, "ratings.csv")
RESULTS_FILE = os.path.join(DATA_DIR, "eval_results.csv")
CONSOLE_FILE = os.path.join(DATA_DIR, "eval_console.txt")

The first step is to extract the data into matrix format. We are supplied a file movies.csv, which contains the movie_id and title, and a file ratings.csv, which contains the user_id, movie_id and a 4-point rating. The movies.csv file is needed to report recommendations back to the user in terms of movie names. We use pandas to read the files into memory, and create a few lookup tables that we will need later.

movies_df = pd.read_csv(MOVIE_FILE)
movies_df.head()


ratings_df = pd.read_csv(RATINGS_FILE)
ratings_df.head()


We then set up various lookup tables that we will need to translate ratings into matrix elements and recommendations from the matrix factorization back to human readable form.

movies_id2name, movies_name2id = {}, {}
ids = movies_df["movieId"].values
titles = movies_df["title"].values
for id, title in zip(ids, titles):
    movies_id2name[id] = title
    movies_name2id[title] = id

movies_id2idx, movies_idx2id = {}, {}
for idx, movie_id in enumerate(ratings_df["movieId"].unique()):
    movies_id2idx[movie_id] = idx
    movies_idx2id[idx] = movie_id

users_id2idx, users_idx2id = {}, {}
for idx, user_id in enumerate(ratings_df["userId"].unique()):
    users_id2idx[user_id] = idx
    users_idx2id[idx] = user_id

num_users = len(users_id2idx)
num_movies = len(movies_id2idx)

Finally, we are ready to construct the matrix. We choose to use a dense matrix full of zeros to start with, where rows represent users and columns represent movies. We have 862 users and 2500 items, so our matrix R_val is of shape (862, 2500). We use the lookup tables we just generated to fill out the ratings for each (user, movie) pair that we have information for.

def construct_original_matrix(num_users, num_movies, ratings_df, 
                          users_id2idx, movies_id2idx):
    X = np.zeros((num_users, num_movies), dtype=np.float32)
    for user_id, movie_id, rating in ratings_df[["userId", "movieId", "rating"]].values:
        X[users_id2idx[user_id], movies_id2idx[movie_id]] = rating
    return X

R_val = construct_original_matrix(num_users, num_movies, ratings_df,
                                  users_id2idx, movies_id2idx)

The next step is to compute different kinds of bias and remove them from the matrix. This allows SVD to be more effective, as we will see in the results later. Bias can be split up into global (bg), user (bu) and user-item (bui) components, which represent the bias across all users, the bias per user, and the bias per item per user respectively.


Since this is a sparse matrix, we will treat all zero entries as unknown, and only consider non-zero entries for computing the bias. This is done using the code below. As you can see, this is essentially computing and removing average values of all non-zero entries along different axes.

def compute_bias(X, bias_type):
    Xc = X.copy()
    Xc[Xc == 0] = np.nan
    if bias_type == "global":
        return np.nanmean(Xc)
    elif bias_type == "user":
        return np.mean(np.nanmean(Xc, axis=0))
    elif bias_type == "item":
        return np.mean(np.nanmean(Xc, axis=1))
    else:
        raise Exception("invalid bias type")


def remove_bias(X, bias):
    Xc = X.copy()
    Xc[Xc == 0] = np.nan
    Xc = np.subtract(Xc, bias)
    Xc = np.nan_to_num(Xc)
    return Xc

        
bg = compute_bias(R_val, "global")
Rg_val = remove_bias(R_val, bg)

bu = compute_bias(Rg_val, "user")
Ru_val = remove_bias(Rg_val, bu)

bi = compute_bias(Rg_val, "item")
Ri_val = remove_bias(Rg_val, bi)

bui = compute_bias(Ru_val, "item")
Rui_val = remove_bias(Ru_val, bui)

Now that we have our different de-biased input matrices, the next step is to decompose each one into its constituents using SVD, and recompose an approximation using a smaller number of dimensions. This gives us a matrix whose reduced columns correspond to user tastes rather than actual movies, so we can predict how a user would rate the movies they haven't seen already, or rank movies according to how much the user would like them. We use TF's built-in svd() call here.


The graph factorizes the input matrix R using SVD into the matrices U, S and V, then recomposes reduced versions of the three matrices to get a rank-k approximation R'. It then computes the reconstruction error (RMSE) between R and R', as well as the proportion of explained variance R2, and returns them.

def reconstruct_using_svd(X, k):
    
    if k == 0:
        return X, 1., 0.
    
    graph = tf.Graph()
    with graph.as_default():
        
        # input arg
        R = tf.placeholder(tf.float32, shape=(num_users, num_movies), name="R")
        
        # run SVD; note that tf.svd returns V (not its transpose)
        S, U, Vt = tf.svd(R, full_matrices=True)
        
        # reduce dimensions: keep the top k singular values/vectors,
        # transposing V to get the first k rows of V^T
        Sk = tf.diag(S)[0:k, 0:k]
        Uk = U[:, 0:k]
        Vk = tf.transpose(Vt)[0:k, :]

        # reconstruct matrix
        Rprime = tf.matmul(Uk, tf.matmul(Sk, Vk))

        
        # compute the proportion of explained variance (as the ratio of
        # Frobenius norms) and the reconstruction RMSE between R and R'
        rsquared = tf.linalg.norm(Rprime) / tf.linalg.norm(R)
        rmse = tf.metrics.root_mean_squared_error(R, Rprime)[1]
        
    with tf.Session(graph=graph) as sess:
        
        tf.global_variables_initializer().run()
        tf.local_variables_initializer().run()
        
        [Rprime_val, rsquared_val, rmse_val] = sess.run(
            [Rprime, rsquared, rmse], feed_dict={R: X})
        return Rprime_val, rsquared_val, rmse_val


R_rec, rsquared, rec_err = reconstruct_using_svd(R_val, 10)

print("reconstruction error (RMSE):", rec_err)
print("percentage of variance explained: {:.3f}".format(rsquared))
print("shape of reconstructed matrix: ", R_rec.shape)

We also create a method predict for predicting the ratings that a user might give to movies they haven't seen yet, and a method recommend for recommending movies for a given user. Both methods take in a de-biased matrix and the bias to add back to the final results.

def predict(user_id, movie_ids, X_rec, bias, 
            users_id2idx, movies_id2idx, movies_id2name):
    predictions = []
    for movie_id in sorted(movie_ids):
        user_idx = users_id2idx[user_id]
        movie_idx = movies_id2idx[movie_id]
        movie_name = movies_id2name[movie_id]
        prediction = bias + X_rec[user_idx, movie_idx]
        predictions.append((user_id, movie_id, movie_name, prediction))
    return predictions


def recommend(user_id, X_rec, top_n, bias, 
              users_id2idx, movies_idx2id, movies_id2name):
    user_idx = users_id2idx[user_id]
    rec_movies_idxs = np.argsort(-1 * X_rec[user_idx])[0:top_n]
    recommendations = []
    for movie_idx in rec_movies_idxs:
        movie_id = movies_idx2id[movie_idx]
        movie_name = movies_id2name[movie_id]
        pred_rating = bias + X_rec[user_idx, movie_idx]
        recommendations.append((user_id, movie_id, movie_name, pred_rating))
    return recommendations


R_rec, _, _ = reconstruct_using_svd(Rui_val, 50)
preds = predict(320, [260, 153, 527, 588], R_rec, bg + bu + bui, users_id2idx, 
                movies_id2idx, movies_id2name)
for pred in preds:
    print(pred)
print("---")
R_rec, _, _ = reconstruct_using_svd(Ri_val, 50)
recs = recommend(320, R_rec, 10, bg + bi, users_id2idx, movies_idx2id, movies_id2name)
for rec in recs:
    print(rec)

Here, we invoke these methods with the parameters shown in the examples in the assignment instructions. Unfortunately, my results don't match the example results provided. My suspicion is that there may be implementation differences between the Lenskit stack and the TF stack that are causing these differences, but I can't say for sure. If you notice something I am doing wrong, please let me know and I will fix it and report back, crediting you with the fix.

(320, 153, 'Batman Forever (1995)', 3.5883482)
(320, 260, 'Star Wars: Episode IV - A New Hope (1977)', 4.031598)
(320, 527, "Schindler's List (1993)", 3.769835)
(320, 588, 'Aladdin (1992)', 3.5995538)
---
(320, 2571, 'Matrix, The (1999)', 4.281472)
(320, 2959, 'Fight Club (1999)', 4.2387533)
(320, 4993, 'Lord of the Rings: The Fellowship of the Ring, The (2001)', 4.0973997)
(320, 296, 'Pulp Fiction (1994)', 4.089973)
(320, 1196, 'Star Wars: Episode V - The Empire Strikes Back (1980)', 4.0807734)
(320, 1136, 'Monty Python and the Holy Grail (1975)', 4.055152)
(320, 5952, 'Lord of the Rings: The Two Towers, The (2002)', 4.0527725)
(320, 7153, 'Lord of the Rings: The Return of the King, The (2003)', 4.0478134)
(320, 260, 'Star Wars: Episode IV - A New Hope (1977)', 4.031599)
(320, 4973, "Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)", 4.020437)

Now that we have all our pieces in place, we construct our evaluation loop, where we re-run the matrix factorization using the different bias models to de-bias the input ratings matrix R, reconstruct approximations for different values of k, and report the reconstruction error (RMSE) and the proportion of explained variance (R2) for each bias-model and k combination. I also write the output of the predict and recommend functions above, for a specific user and a set of movies, for each combination to an external file where they can be examined later.

def print_console(fconsole, X, k, bias_model, bias,
                  users_id2idx, movies_id2idx, movies_idx2id, 
                  movies_id2name):
    fconsole.write("#### predict -PuserId=320 -PitemIds=260,153,527,588 -PbiasModel={:s} -PfeatureCount={:d}\n"
                   .format(bias_model, k))
    predictions = predict(320, [260, 153, 527, 588], X, bias, 
                          users_id2idx, movies_id2idx, movies_id2name)
    for user_id, movie_id, movie_name, prediction in predictions:
        fconsole.write("\t{:d}\t{:d}\t{:s}\t{:.3f}\n"
            .format(user_id, movie_id, movie_name, prediction))
    fconsole.write("#### recommend -PuserId=320 -PbiasModel={:s} -PfeatureCount={:d}\n"
                   .format(bias_model, k))
    recommendations = recommend(320, X, 10, bias, users_id2idx, 
                                movies_idx2id, movies_id2name)
    for user_id, movie_id, movie_name, pred_rating in recommendations:
        fconsole.write("\t{:d}\t{:d}\t{:s}\t{:.3f}\n"
            .format(user_id, movie_id, movie_name, pred_rating))
    fconsole.write("\n")


kvalues = [0, 1, 5, 10, 15, 20, 25, 30, 40, 50]
bias_models = ["none", "global", "user", "item", "user-item"]

fresults = open(RESULTS_FILE, "w")
fconsole = open(CONSOLE_FILE, "w")
for k in kvalues:
    print("running evaluation for k={:d}".format(k))
    # reconstruct original matrix
    R_val = np.zeros((num_users, num_movies), dtype=np.float32)
    for user_id, movie_id, rating in ratings_df[["userId", "movieId", "rating"]].values:
        R_val[users_id2idx[user_id], movies_id2idx[movie_id]] = rating
    # no bias
    bias_model = bias_models[0]
    R_rec, rsquared, rmse = reconstruct_using_svd(R_val, k)
    print(".. k={:d}, bias-model={:s}, reconstruction: R^2={:.3f}, RMSE={:.3f}"
          .format(k, bias_model, rsquared, rmse))
    fresults.write("{:d}\t{:s}\t{:.7f}\t{:.7f}\n".format(k, bias_model, 
                                                         rsquared, rmse))
    print_console(fconsole, R_rec, k, bias_model, 0, users_id2idx, 
                  movies_id2idx, movies_idx2id, movies_id2name)
    # global bias
    bias_model = bias_models[1]
    bg = compute_bias(R_val, bias_model)
    Rg_val = remove_bias(R_val, bg)
    R_rec, rsquared, rmse = reconstruct_using_svd(Rg_val, k)
    print(".. k={:d}, bias-model={:s}, reconstruction: R^2={:.3f}, RMSE={:.3f}"
          .format(k, bias_model, rsquared, rmse))
    fresults.write("{:d}\t{:s}\t{:.7f}\t{:.7f}\n".format(k, bias_model, 
                                                         rsquared, rmse))
    print_console(fconsole, R_rec, k, bias_model, bg, users_id2idx, 
                  movies_id2idx, movies_idx2id, movies_id2name)
    # global + user bias
    bias_model = bias_models[2]
    bu = compute_bias(Rg_val, bias_model)
    Ru_val = remove_bias(Rg_val, bu)
    R_rec, rsquared, rmse = reconstruct_using_svd(Ru_val, k)
    print(".. k={:d}, bias-model={:s}, reconstruction: R^2={:.3f}, RMSE={:.3f}"
          .format(k, bias_model, rsquared, rmse))
    fresults.write("{:d}\t{:s}\t{:.7f}\t{:.7f}\n".format(k, bias_model, 
                                                         rsquared, rmse))
    print_console(fconsole, R_rec, k, bias_model, bg + bu, users_id2idx, 
                  movies_id2idx, movies_idx2id, movies_id2name)
    # global + item bias
    bias_model = bias_models[3]
    bi = compute_bias(Rg_val, bias_model)
    Ri_val = remove_bias(Rg_val, bi)
    R_rec, rsquared, rmse = reconstruct_using_svd(Ri_val, k)
    print(".. k={:d}, bias-model={:s}, reconstruction: R^2={:.3f}, RMSE={:.3f}"
          .format(k, bias_model, rsquared, rmse))
    fresults.write("{:d}\t{:s}\t{:.7f}\t{:.7f}\n".format(k, bias_model, 
                                                         rsquared, rmse))
    print_console(fconsole, R_rec, k, bias_model, bg + bi, users_id2idx, 
                  movies_id2idx, movies_idx2id, movies_id2name)
    # global + user bias + item bias (user-item)
    bias_model = bias_models[4]
    bui = compute_bias(Ru_val, "item")
    Rui_val = remove_bias(Ru_val, bui)
    R_rec, rsquared, rmse = reconstruct_using_svd(Rui_val, k)
    print(".. k={:d}, bias-model={:s}, reconstruction: R^2={:.3f}, RMSE={:.3f}"
          .format(k, bias_model, rsquared, rmse))
    fresults.write("{:d}\t{:s}\t{:.7f}\t{:.7f}\n".format(k, bias_model, 
                                                         rsquared, rmse))
    print_console(fconsole, R_rec, k, bias_model, bg + bu + bui, users_id2idx, 
                  movies_id2idx, movies_idx2id, movies_id2name)

fresults.close()
fconsole.close()

As can be seen in the chart on the left below, the reconstruction error (RMSE) between the original and reconstructed matrices is much higher for the raw matrix than for the de-biased models. In the chart on the right, we have removed the raw matrix in order to see the RMSE values for the different bias models more clearly. As you can see, the performance of the different bias models is really close, with the global bias model performing the best in this case.






Similarly, the R2 values are plotted for different values of k, and increase as the rank k of the reconstructed matrix increases. The user bias model seems to do the best here at explaining the variance between the input and reconstructed matrices.
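For completeness, here is a sketch of how charts like these can be generated from the eval_results.csv file written by the loop above (the RESULTS_FILE constant defined at the top of the post). The column names are my own; I am assuming the columns appear in the order they were written out (k, bias model, R2, RMSE).

import matplotlib.pyplot as plt
import pandas as pd

results_df = pd.read_csv(RESULTS_FILE, sep="\t",
                         names=["k", "bias_model", "rsquared", "rmse"])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
for bias_model, group in results_df.groupby("bias_model"):
    # one line per bias model, plotted against the rank k
    ax1.plot(group["k"], group["rmse"], marker="o", label=bias_model)
    ax2.plot(group["k"], group["rsquared"], marker="o", label=bias_model)
ax1.set_xlabel("k")
ax1.set_ylabel("RMSE")
ax1.legend()
ax2.set_xlabel("k")
ax2.set_ylabel("R^2")
ax2.legend()
plt.tight_layout()
plt.show()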


That's all I had for today. This was the first time I attempted to use TF for something other than Deep Learning. While it is not as comprehensive as standard ML libraries like scikit-learn, it does have functionality for k-means clustering, decision trees, CRFs, etc., so it is definitely something to look at further in the future.