<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://nickgreenquist.github.io//blog/feed.xml" rel="self" type="application/atom+xml" /><link href="https://nickgreenquist.github.io//blog/" rel="alternate" type="text/html" /><updated>2025-05-24T13:56:22+00:00</updated><id>https://nickgreenquist.github.io//blog/feed.xml</id><title type="html">Nick Greenquist</title><subtitle>Welcome to my blog! I am a Software Engineer at Google and am lucky enough to work on Machine Learning problems there.  I&apos;ve always been interested in &apos;creating&apos; more than &apos;consuming&apos; so this blog will serve as my attempt to do just that. </subtitle><entry><title type="html">Building a Two-Tower Deep Learning Movie Recommender System in Pytorch (from scratch)</title><link href="https://nickgreenquist.github.io//blog/projects/2024/02/04/two-tower-deep-learning-movie-recommender-system.html" rel="alternate" type="text/html" title="Building a Two-Tower Deep Learning Movie Recommender System in Pytorch (from scratch)" /><published>2024-02-04T12:05:14+00:00</published><updated>2024-02-04T12:05:14+00:00</updated><id>https://nickgreenquist.github.io//blog/projects/2024/02/04/two-tower-deep-learning-movie-recommender-system</id><content type="html" xml:base="https://nickgreenquist.github.io//blog/projects/2024/02/04/two-tower-deep-learning-movie-recommender-system.html"><![CDATA[<style type="text/css">
    .center-image
    {
        margin: 0 auto;
        display: block;
    }
</style>

<p><img src="https://nickgreenquist.github.io//blog/assets/MovieLens/main.jpg" alt="Main" width="800px" class="center-image" /></p>

<p><a id="introduction"></a></p>
<h2 id="introduction">Introduction</h2>
<p>Recommender systems play a crucial role in content-serving websites such as TikTok, Amazon, Netflix, and YouTube by showing users relevant content. They also make these companies billions of dollars: <a href="https://www.linkedin.com/pulse/how-amazon-generate-35-revenue-james-cooper/">35% of Amazon.com’s revenue is generated by its recommendation engine</a>.</p>

<p><a id="what-is"></a></p>
<h3 id="what-is-a-recommender-system">What is a recommender system?</h3>
<p>In my own words: a recommender system’s job is to match users to content. It scores and ranks candidate content against some pre-determined metric to optimize (e.g. what content the user will find most relevant or engaging). Recommender systems exist because many platforms have thousands (or millions) of ‘things’ they could potentially show their users, and screen real estate is extremely valuable, so the system helps the platform show only what it thinks the user will like. Without one, it would also be impossible for the user to sift through millions of options.</p>

<p><a id="use-cases"></a></p>
<h3 id="use-cases">Use Cases</h3>
<p>Recommender systems are used to decide:</p>
<ol>
  <li>What movie to watch next on Netflix</li>
  <li>What product to buy next on Amazon</li>
  <li>What song to listen to next on Spotify</li>
  <li>What video to watch next on YouTube</li>
  <li>What post to view next in your Instagram feed</li>
  <li>What vacation to book next on Booking.com</li>
</ol>

<p><a id="sneak-peek"></a></p>
<h3 id="sneak-peek-of-this-post">Sneak Peek of this Post</h3>
<p>Just to give an idea of what we are going to build here, it’s going to be a movie recommendation system that can give us recommendations for users like:</p>

<p><img src="https://nickgreenquist.github.io//blog/assets/MovieLens/horror_recs.png" alt="Horror Movie Recs" width="500px" class="center-image" /></p>

<blockquote>
  <p>TLDR: Please follow this <a href="https://github.com/nickgreenquist/recsys/blob/main/MovieLens_Two_Tower_Embedding_NN.ipynb">link</a> to go straight to the Colab notebook with the PyTorch code discussed in this post.</p>
</blockquote>

<h2 id="outline-of-this-post">Outline of this Post</h2>
<ul>
  <li><a href="#introduction">Introduction</a>
    <ul>
      <li><a href="#what-is">What is a recommender system?</a></li>
      <li><a href="#use-cases">Recommendation System Use Cases</a></li>
      <li><a href="#sneak-peek">Sneak Peek of this Post</a></li>
    </ul>
  </li>
  <li><a href="#why-this-post">Why this Post</a>
    <ul>
      <li><a href="#deep-rec">Deep Recommender Systems</a></li>
      <li><a href="#core-issue">The Core Issue</a></li>
      <li><a href="#goal">The Goal</a></li>
    </ul>
  </li>
  <li><a href="#building">Building a Movie Recommender</a>
    <ul>
      <li><a href="#dataset">Dataset</a></li>
      <li><a href="#data-processing">Data Preprocessing</a>
        <ul>
          <li><a href="#movie-preprocessing">Movie Feature Preprocessing</a></li>
          <li><a href="#movie-vocab">Movie Feature Vocab</a></li>
          <li><a href="#user-preprocessing">User Feature Preprocessing</a></li>
          <li><a href="#user-vocab">User Feature Vocab</a></li>
        </ul>
      </li>
      <li><a href="#training-examples">Generating Training Examples</a></li>
      <li><a href="#model">Designing our Model Architecture</a></li>
      <li><a href="#dataset-build">Building our Dataset</a></li>
      <li><a href="#model-build">Building our Model</a></li>
      <li><a href="#training">Training our Model</a></li>
    </ul>
  </li>
  <li><a href="#using-model">Actually Using our Model</a>
    <ul>
      <li><a href="#movie-embeddings">Precomputing Movie Embeddings</a></li>
      <li><a href="#similar-movies">Finding Most Similar Movies</a></li>
      <li><a href="#inference">Inference: Getting Recommendations</a></li>
      <li><a href="#examples">Example Recommendations</a>
        <ul>
          <li><a href="#anti-recs">Anti-Recommendations</a></li>
        </ul>
      </li>
    </ul>
  </li>
  <li><a href="#improvements">Possible Improvements</a></li>
  <li><a href="#appendix">Appendix</a>
    <ul>
      <li><a href="#2d">Visualizing Movies in 2D</a></li>
      <li><a href="#training-runs">Training Runs Losses</a></li>
      <li><a href="#other-domains">Applying Recommendations to Other Domains</a></li>
    </ul>
  </li>
</ul>

<p><a id="why-this-post"></a></p>
<h2 id="why-this-post">Why this Post</h2>
<p>There are already hundreds (probably thousands) of posts/tutorials/follow-alongs on ‘how to build a movie recommender system’. However, most of them are really not that useful. 90% of them just read in the dataset, perform cosine similarity on some vectors made out of the users and movies, show off some recommendations that make little sense, and call it a day. 10% will actually use Machine Learning, and of these, 90% will just do some variation of user-to-item matrix factorization (MF) and stop after they spit out some loss metrics (i.e. they won’t even show how it can be used).</p>

<p>Don’t get me wrong, MF is a great approach to recommendation systems (I helped create a <a href="https://nickgreenquist.github.io/blog/projects/2018/06/08/building-books2rec.html">book recommendation system using it</a>, and even helped write an entire <a href="https://nickgreenquist.github.io/blog/projects/2019/01/02/cu2rec.html">CUDA library to parallelize it</a>), but it’s not ‘state of the art’ anymore. The main drawback of basic MF is that it does not incorporate rich user/item features in how it learns to predict ratings/interactions.</p>

<p>For even more information on MF, please look at these blog posts for <a href="https://dorukkilitcioglu.github.io/2018/09/10/representation-learning-matrix-factorization.html">a technical explanation of Matrix Factorization</a> and <a href="https://dorukkilitcioglu.github.io/2018/05/14/introducing-books2rec.html">an application of Matrix Factorization for book recommendation</a>.</p>

<p><a id="deep-rec"></a></p>
<h3 id="deep-recommender-systems">Deep Recommender Systems</h3>
<p>The main approach I wanted to focus on is the ‘Two Tower’ Deep Learning Architecture. The idea is this: you want to recommend items (i.e. movies, products, songs, etc.) to users. You have features for users (what they have already watched/bought/clicked, demographic information, etc.) and features for items (the genre of your movie/song, the artists/actors, the year, etc.). Can we shove all these features into a Neural Net and get good recommendations (hint: yes)?</p>
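
<p>To make the shape of the idea concrete before we build the real thing, here is a minimal plain-Python sketch of a two-tower scorer. Everything in it is an assumption for illustration: the weights are random and untrained, the feature sizes and the names make_linear, tower and score are made up, and each ‘tower’ is a single linear layer with a ReLU. It is not the model from the notebook; it only shows that each side gets its own network and that the match score is the dot product of the two outputs.</p>

```python
import random

rng = random.Random(0)  # fixed seed so the untrained weights are reproducible

def make_linear(n_in, n_out):
    """Return a random (untrained) weight matrix with n_out rows and n_in columns."""
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

def tower(features, weights):
    """One linear layer followed by ReLU: maps raw features to an embedding."""
    return [max(0.0, sum(w * x for w, x in zip(row, features))) for row in weights]

def score(user_features, item_features, user_weights, item_weights):
    """Dot product of the two tower outputs: higher means a better match."""
    user_emb = tower(user_features, user_weights)
    item_emb = tower(item_features, item_weights)
    return sum(u * v for u, v in zip(user_emb, item_emb))

# Hypothetical sizes: 4 user features, 3 item features, 2-dimensional embeddings.
user_weights = make_linear(4, 2)
item_weights = make_linear(3, 2)

user = [1.0, 0.0, 0.5, 0.2]   # e.g. watch history and genre preferences
movie = [1.0, 0.0, 1.0]       # e.g. a multi-hot genre vector
print(score(user, movie, user_weights, item_weights))
```

<p>Because the two towers never see each other’s inputs, the item side can be computed once for the whole catalog and reused for every user.</p>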

<p>There are some great tutorials on how to build modern day model architectures (including Two Tower models) to train a recommendation system model. Here is a list of a few of them:</p>
<ol>
  <li><a href="https://cloud.google.com/blog/products/ai-machine-learning/scaling-deep-retrieval-tensorflow-two-towers-architecture">Scaling deep retrieval with TensorFlow Recommenders and Vertex AI Matching Engine</a></li>
  <li><a href="https://developers.google.com/machine-learning/recommendation/labs/movie-rec-softmax-programming-exercise">Google: Build a Movie Recommendation System</a></li>
  <li><a href="https://www.tensorflow.org/recommenders/examples/deep_recommenders">TFRS: Building deep retrieval models</a></li>
  <li><a href="https://medium.com/coinmonks/how-to-implement-a-recommendation-system-with-deep-learning-and-pytorch-2d40476590f9">How to Implement a Recommendation System with Deep Learning and PyTorch</a></li>
</ol>

<p>BUT: they all share a serious drawback: NONE show you how to ‘actually’ use these things. It’s all ‘left to the reader as an exercise.’</p>

<p><a id="core-issue"></a></p>
<h3 id="the-core-issue">The Core Issue</h3>
<p>All of the above examples (and others I could find) use the traditional approach of embedding the unique user ids (or hashes of something unique like a username) and item ids (like an Amazon product id) to train the model. However, this means that you can ONLY perform inference for a user or item that was seen during training. If you want to run inference on a user id that you did not train on, you won’t have an embedding for them and are out of luck. If a new user wanted to use this model, your only solutions are:</p>
<ol>
  <li>Retrain the entire model</li>
  <li>Try to partially fit your new user into the model with some training steps</li>
  <li>Find a user that is close to the new user and use their embedding</li>
</ol>
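
<p>The problem is easy to see with a toy lookup table. The sketch below is purely illustrative: the user ids, the vectors and the helper lookup_user_embedding are all made up, and a real model would hold these vectors in an embedding layer rather than a dict. The failure mode is the same, though.</p>

```python
# Hypothetical embedding table keyed by user id, as an id-based model would learn it.
user_id_to_embedding = {
    101: [0.12, -0.40, 0.88],
    102: [0.95, 0.10, -0.33],
}

def lookup_user_embedding(user_id):
    """Return the trained embedding, or None when the id was never seen in training."""
    return user_id_to_embedding.get(user_id)

print(lookup_user_embedding(101))  # a user seen during training
print(lookup_user_embedding(999))  # a brand-new user: nothing to look up
```

<p>Each of the three workarounds listed above is a way of manufacturing a vector for user 999 after the fact.</p>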

<p><a id="goal"></a></p>
<h3 id="the-goal">The Goal</h3>
<p>Instead, I set out to build a model that can generalize to any user, as long as you provide even a few examples of items they already enjoyed (e.g. when you sign up to Nick-flix Movie Streaming, you click on a few movies you like). The model should embed features of the user, not the unique user themselves. This is different from most other tutorials in that we will NOT map each user id to an embedding. Instead, for our simple example, we will treat each user as a feature vector made up of only two pieces of information:</p>
<ol>
  <li>Their Watch History: list of movies they have liked/disliked</li>
  <li>Their Genre Preferences: the average rating for each possible genre</li>
</ol>

<p>This way, after the model is trained, it can be used to get recommendations for any user as long as we have even a few movies they like (and maybe some genres they prefer or don’t prefer).</p>
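
<p>Here is a rough sketch of what ‘a user as features’ could look like for a brand-new user. The toy catalog (with simplified genres), the like_threshold of 4.0 and the function build_user_features are assumptions for illustration only; the encoding actually fed to the model is built step by step later in this post and may differ.</p>

```python
# Hypothetical toy catalog: movie title mapped to (simplified) genres.
movie_to_genres = {
    "Alien (1979)": {"Horror", "Sci-Fi"},
    "Heat (1995)": {"Action", "Crime"},
    "Scream (1996)": {"Horror"},
}

def build_user_features(ratings, like_threshold=4.0):
    """Describe a user by what they rated, never by who they are."""
    liked = sorted(t for t, r in ratings.items() if r >= like_threshold)
    disliked = sorted(t for t in ratings if t not in liked)

    # Collect the user's ratings per genre, then average them.
    per_genre = {}
    for title, rating in ratings.items():
        for genre in movie_to_genres[title]:
            per_genre.setdefault(genre, []).append(rating)
    genre_avg = {g: sum(rs) / len(rs) for g, rs in sorted(per_genre.items())}

    return {"liked": liked, "disliked": disliked, "genre_avg": genre_avg}

new_user = {"Alien (1979)": 5.0, "Scream (1996)": 4.0, "Heat (1995)": 2.0}
features = build_user_features(new_user)
print(features["liked"])
print(features["genre_avg"])
```

<p>Notice that no user id appears anywhere: two different people with the same ratings would get exactly the same features.</p>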

<p><a id="benefits"></a></p>
<h4 id="benefits">Benefits</h4>
<p>There are a few benefits to this approach:</p>
<ol>
  <li>Do not need to retrain model as often</li>
  <li>Model is more generalizable as it cannot just memorize labels for specific users</li>
  <li>User level cold start is much less of an issue</li>
</ol>

<h4 id="a-note-about-item-cold-start"><em>A Note about Item Cold Start</em></h4>
<p>Removing the user id from the model helps generalize to more users, and also reduces user cold start. But what about item cold start (new item, so no item id in the model)? To get around this, it’s also possible to remove the item id from the model input and only use item features as inputs.</p>

<p>This post does a great job explaining how having NO ids helps with cold start: <a href="https://medium.com/nvidia-merlin/solving-the-cold-start-problem-using-two-tower-neural-networks-for-nvidias-e-mail-recommender-2d5b30a071a4">Solving the Cold-Start Problem using Two-Tower Neural Networks for NVIDIA’s E-Mail Recommender Systems</a></p>

<p>As with all things, there are tradeoffs. By removing the item id, you will lose a lot of rich information about how users interact with unique items. However, you massively reduce cold start.</p>

<p>Whether to keep or remove ids depends on the domain: Amazon has millions of products and probably thousands added each day that need to be recommended. They could benefit from removing unique item ids from their models. Netflix, on the other hand, might only have a few dozen movies added per month to their catalog. They might want to keep the movie id and retrain their model more frequently.</p>

<p>Another possibility is to train a normal model with id embeddings, and then a second model with just features. You can then mix and match the results of both.</p>

<p><a id="building"></a></p>
<h2 id="building-a-movie-recommender">Building a Movie Recommender</h2>
<p>Alright, let’s start building our recommendation system!</p>

<p><a id="dataset"></a></p>
<h3 id="the-dataset">The Dataset</h3>
<p>In order to build a Movie recommendation system, we are going to use the <a href="https://grouplens.org/datasets/movielens/">MovieLens Dataset</a>, which is provided by <a href="https://grouplens.org/">GroupLens</a>.</p>

<p>In particular, we are going to use two datasets, one small and one large:</p>
<ol>
  <li>MovieLens Small - 100,000 ratings, 9,000 movies, 600 users.</li>
  <li>MovieLens Latest - 33,000,000 ratings, 86,000 movies, 330,975 users.</li>
</ol>

<p>The small dataset does not lead to good results, but it is better to use while building and testing our code.</p>

<p>We can read in this data as a Pandas Dataframe easily like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">pandas</span> <span class="k">as</span> <span class="n">pd</span>

<span class="n">df_ratings</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">'ratings.csv'</span><span class="p">)</span>
<span class="n">df_movies</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">'movies.csv'</span><span class="p">)</span>
</code></pre></div></div>

<p>The data consists of the two Pandas Dataframes we just read in:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="https://nickgreenquist.github.io//blog/assets/MovieLens/df_ratings.png" alt="Ratings Table" width="300px" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Ratings Table</em></td>
    </tr>
  </tbody>
</table>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="https://nickgreenquist.github.io//blog/assets/MovieLens/df_movies.png" alt="Movies Table" width="600px" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Movies Table</em></td>
    </tr>
  </tbody>
</table>

<p><a id="data-processing"></a></p>
<h3 id="data-preprocessing">Data Preprocessing</h3>

<p>First, we need to clean the data as any ‘nan’ values can completely ruin our training (I spent many hours debugging my model, adding gradient clipping, batch norm, etc. Turns out there is a single nan value in MovieLens). We also convert movie ids to ints so they behave better as lookup keys.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># clean the ratings data
</span><span class="n">df_ratings</span> <span class="o">=</span> <span class="n">df_ratings</span><span class="p">.</span><span class="n">dropna</span><span class="p">()</span>
<span class="n">df_ratings</span><span class="p">[</span><span class="s">'movieId'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df_ratings</span><span class="p">[</span><span class="s">'movieId'</span><span class="p">].</span><span class="n">astype</span><span class="p">(</span><span class="nb">int</span><span class="p">,</span> <span class="n">copy</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</code></pre></div></div>
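
<p>If you are wondering how one bad value can wreck a whole training run, here is a tiny plain-Python illustration (the ratings list is made up). Any sum or mean that touches a NaN becomes NaN, and a loss is just a big average, so one bad row is enough to turn the loss, and then the gradients, into NaN.</p>

```python
import math

ratings = [4.0, 3.5, float("nan"), 5.0]  # toy column with one bad value

# A single NaN poisons any aggregate computed over the column.
mean_with_nan = sum(ratings) / len(ratings)
print(math.isnan(mean_with_nan))

# Dropping the NaN entries first (what dropna() does for the dataframe) fixes it.
clean = [r for r in ratings if not math.isnan(r)]
mean_clean = sum(clean) / len(clean)
print(mean_clean)
```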

<p>Next, let’s shrink down how many movies we care about.</p>

<blockquote>
  <p>NOTE: This is just for memory reasons. Google Colab only gives you 12GB of RAM, so I can’t create embeddings and feature vectors for all 50k+ movies.</p>
</blockquote>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># let's only work with movies with enough ratings.
</span><span class="n">min_ratings_per_movie</span> <span class="o">=</span> <span class="mi">1000</span>

<span class="c1"># get the number of ratings per movie
</span><span class="n">df_movies_to_num_ratings</span> <span class="o">=</span> <span class="n">df_ratings</span><span class="p">.</span><span class="n">groupby</span><span class="p">(</span><span class="s">'movieId'</span><span class="p">,</span> <span class="n">as_index</span><span class="o">=</span><span class="bp">False</span><span class="p">)[</span><span class="s">'rating'</span><span class="p">].</span><span class="n">count</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="s">"total movies in corpus: "</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_movies_to_num_ratings</span><span class="p">))</span>

<span class="n">df_movies_to_num_ratings</span> <span class="o">=</span> <span class="n">df_movies_to_num_ratings</span><span class="p">.</span><span class="n">sort_values</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="p">[</span><span class="s">'rating'</span><span class="p">],</span> <span class="n">ascending</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">df_movies_to_num_ratings</span> <span class="o">=</span> <span class="n">df_movies_to_num_ratings</span><span class="p">[</span><span class="n">df_movies_to_num_ratings</span><span class="p">[</span><span class="s">'rating'</span><span class="p">]</span> <span class="o">&gt;</span> <span class="n">min_ratings_per_movie</span><span class="p">]</span>
<span class="k">print</span><span class="p">(</span><span class="s">"movies with enough ratings: "</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_movies_to_num_ratings</span><span class="p">))</span>

<span class="c1"># get list of the top movies by number of ratings.
</span><span class="n">top_movies</span> <span class="o">=</span> <span class="n">df_movies_to_num_ratings</span><span class="p">.</span><span class="n">movieId</span><span class="p">.</span><span class="n">tolist</span><span class="p">()</span>

<span class="c1"># OUTPUT
# total movies in corpus:  58136
# movies with enough ratings:  2071
</span></code></pre></div></div>
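
<p>The same count-then-filter step can be sketched without pandas, which makes the logic easier to see. The rating events and the threshold of 1 below are toy values I made up. Note that, like the cell above, the comparison is strict, so a movie with exactly the threshold number of ratings is dropped.</p>

```python
from collections import Counter

# Hypothetical (userId, movieId) rating events.
rating_events = [(1, 10), (2, 10), (3, 10), (1, 20), (2, 20), (3, 30)]
min_ratings = 1  # toy threshold; the post uses 1000

counts = Counter(movie_id for _, movie_id in rating_events)

# most_common() sorts by count, descending, mirroring sort_values(ascending=False).
top = [movie_id for movie_id, n in counts.most_common() if n > min_ratings]
print(top)
```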

<p><a id="movie-preprocessing"></a></p>
<h4 id="movie-feature-preprocessing">Movie Feature Preprocessing</h4>
<p>Let’s start processing some important info we need for our movies.</p>

<p>First, let’s create a simple map to hold how many ratings each movie has.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># keep a map of movieId to number of ratings.
</span><span class="n">movieId_to_num_ratings</span> <span class="o">=</span> <span class="p">{}</span>
<span class="n">movieId_list</span> <span class="o">=</span> <span class="n">df_movies_to_num_ratings</span><span class="p">.</span><span class="n">movieId</span><span class="p">.</span><span class="n">tolist</span><span class="p">()</span>
<span class="n">rating_list</span> <span class="o">=</span> <span class="n">df_movies_to_num_ratings</span><span class="p">.</span><span class="n">rating</span><span class="p">.</span><span class="n">tolist</span><span class="p">()</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">movieId_list</span><span class="p">)):</span>
  <span class="n">movieId_to_num_ratings</span><span class="p">[</span><span class="n">movieId_list</span><span class="p">[</span><span class="n">i</span><span class="p">]]</span> <span class="o">=</span> <span class="n">rating_list</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
</code></pre></div></div>
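
<p>As a side note, this ‘two parallel lists into a dict’ pattern shows up several times in this post. If you prefer something shorter, zip builds the same map in one line. The sketch below uses the first three rows of the top-movies table as sample data, and the names by_loop and by_zip are just for illustration.</p>

```python
movie_ids = [318, 356, 296]
num_ratings = [36414, 33846, 32440]

# Loop version, mirroring the cell above.
by_loop = {}
for i in range(len(movie_ids)):
    by_loop[movie_ids[i]] = num_ratings[i]

# Equivalent one-liner: zip pairs the lists element by element.
by_zip = dict(zip(movie_ids, num_ratings))

print(by_loop == by_zip)
```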

<p>Next, we reduce our Ratings Dataframe to get rid of any movies we don’t care about.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># reduce our df_ratings Dataframe to only the rows for top movies (to speed up later cells).
</span><span class="n">df_ratings_final</span> <span class="o">=</span> <span class="n">df_ratings</span><span class="p">[</span><span class="n">df_ratings</span><span class="p">.</span><span class="n">movieId</span><span class="p">.</span><span class="n">isin</span><span class="p">(</span><span class="n">top_movies</span><span class="p">)]</span>
</code></pre></div></div>

<p>Next, let’s make a helpful map from each movie id to its title.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># map movieId to title
</span><span class="n">movieId_to_title</span> <span class="o">=</span> <span class="p">{}</span>
<span class="n">title_to_movieId</span> <span class="o">=</span> <span class="p">{}</span>

<span class="n">movieId_list</span> <span class="o">=</span> <span class="n">df_movies</span><span class="p">.</span><span class="n">movieId</span><span class="p">.</span><span class="n">tolist</span><span class="p">()</span>
<span class="n">title_list</span> <span class="o">=</span> <span class="n">df_movies</span><span class="p">.</span><span class="n">title</span><span class="p">.</span><span class="n">tolist</span><span class="p">()</span>

<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">movieId_list</span><span class="p">)):</span>
  <span class="n">movieId</span> <span class="o">=</span> <span class="n">movieId_list</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
  <span class="n">title</span> <span class="o">=</span> <span class="n">title_list</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>

  <span class="n">movieId_to_title</span><span class="p">[</span><span class="n">movieId</span><span class="p">]</span> <span class="o">=</span> <span class="n">title</span>
  <span class="n">title_to_movieId</span><span class="p">[</span><span class="n">title</span><span class="p">]</span> <span class="o">=</span> <span class="n">movieId</span>
</code></pre></div></div>

<p>Let’s take a look at our top movies (only the first five rows of the output are shown in the table below):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">movieId</span> <span class="ow">in</span> <span class="n">top_movies</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="mi">10</span><span class="p">]:</span>
  <span class="k">print</span><span class="p">(</span><span class="n">movieId</span><span class="p">,</span> <span class="n">movieId_to_title</span><span class="p">[</span><span class="n">movieId</span><span class="p">],</span> <span class="n">movieId_to_num_ratings</span><span class="p">[</span><span class="n">movieId</span><span class="p">])</span>
</code></pre></div></div>

<table>
  <thead>
    <tr>
      <th>movieId</th>
      <th>title</th>
      <th>num_ratings</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>318</td>
      <td>Shawshank Redemption, The (1994)</td>
      <td>36414</td>
    </tr>
    <tr>
      <td>356</td>
      <td>Forrest Gump (1994)</td>
      <td>33846</td>
    </tr>
    <tr>
      <td>296</td>
      <td>Pulp Fiction (1994)</td>
      <td>32440</td>
    </tr>
    <tr>
      <td>2571</td>
      <td>Matrix, The (1999)</td>
      <td>31830</td>
    </tr>
    <tr>
      <td>593</td>
      <td>Silence of the Lambs, The (1991)</td>
      <td>30452</td>
    </tr>
  </tbody>
</table>

<p>Next, let’s map each movie to a set() of its genres. These will be used as item features for our model (i.e. a movie will be represented as a vector of its genre information).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># map movieId to list of genres for that movie
</span><span class="n">genres</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
<span class="n">movieId_to_genres</span> <span class="o">=</span> <span class="p">{}</span>

<span class="n">movieId_list</span> <span class="o">=</span> <span class="n">df_movies</span><span class="p">.</span><span class="n">movieId</span><span class="p">.</span><span class="n">tolist</span><span class="p">()</span>
<span class="n">genre_list</span> <span class="o">=</span> <span class="n">df_movies</span><span class="p">.</span><span class="n">genres</span><span class="p">.</span><span class="n">tolist</span><span class="p">()</span>

<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">movieId_list</span><span class="p">)):</span>
  <span class="n">movieId</span> <span class="o">=</span> <span class="n">movieId_list</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
  <span class="k">if</span> <span class="n">movieId</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">top_movies</span><span class="p">:</span>
    <span class="k">continue</span>

  <span class="n">movieId_to_genres</span><span class="p">[</span><span class="n">movieId</span><span class="p">]</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>

  <span class="k">for</span> <span class="n">genre</span> <span class="ow">in</span> <span class="n">genre_list</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">split</span><span class="p">(</span><span class="s">'|'</span><span class="p">):</span>
    <span class="n">genres</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">genre</span><span class="p">)</span>
    <span class="n">movieId_to_genres</span><span class="p">[</span><span class="n">movieId</span><span class="p">].</span><span class="n">add</span><span class="p">(</span><span class="n">genre</span><span class="p">)</span>
</code></pre></div></div>

<p>Let’s print out an example of a movie’s genres:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="n">movieId_to_genres</span><span class="p">[</span><span class="n">title_to_movieId</span><span class="p">[</span><span class="s">'Matrix, The (1999)'</span><span class="p">]])</span>
<span class="c1"># OUTPUT: {'Action', 'Sci-Fi', 'Thriller'}
</span></code></pre></div></div>
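
<p>The parsing itself is just a split on the pipe character. Here it is in isolation, with a hypothetical helper parse_genres and the genre string for The Matrix written out by hand to match the output above:</p>

```python
def parse_genres(raw):
    """Split a pipe-delimited genre string into a set of genre names."""
    return set(raw.split("|"))

matrix_genres = parse_genres("Action|Sci-Fi|Thriller")
print(sorted(matrix_genres))
print("Sci-Fi" in matrix_genres)
```

<p>One thing worth checking in your own copy of the data: as far as I know, MovieLens marks movies that have no genre information with the literal string ‘(no genres listed)’, which a plain split like this will happily treat as a genre of its own.</p>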

<p>Next, let’s get the average rating of every movie. This is helpful later as we want to make sure our model isn’t just recommending the most popular movies.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># for every movie, get the avg rating
</span><span class="n">df_movies_to_avg_rating</span> <span class="o">=</span> <span class="n">df_ratings_final</span><span class="p">.</span><span class="n">groupby</span><span class="p">(</span><span class="s">'movieId'</span><span class="p">,</span> <span class="n">as_index</span><span class="o">=</span><span class="bp">False</span><span class="p">)[</span><span class="s">'rating'</span><span class="p">].</span><span class="n">mean</span><span class="p">()</span>

<span class="n">movieId_to_avg_rating</span> <span class="o">=</span> <span class="p">{}</span>

<span class="n">movieId_list</span> <span class="o">=</span> <span class="n">df_movies_to_avg_rating</span><span class="p">.</span><span class="n">movieId</span><span class="p">.</span><span class="n">tolist</span><span class="p">()</span>
<span class="n">rating_list</span> <span class="o">=</span> <span class="n">df_movies_to_avg_rating</span><span class="p">.</span><span class="n">rating</span><span class="p">.</span><span class="n">tolist</span><span class="p">()</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">movieId_list</span><span class="p">)):</span>
  <span class="n">movieId_to_avg_rating</span><span class="p">[</span><span class="n">movieId_list</span><span class="p">[</span><span class="n">i</span><span class="p">]]</span> <span class="o">=</span> <span class="n">rating_list</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
</code></pre></div></div>

<p><a id="movie-vocab"></a></p>
<h4 id="movie-feature-vocab">Movie Feature Vocab</h4>

<p>This is a pretty important part of the data prep. Setting up a feature vocab is needed to correctly map movie ids and movie features (i.e. only genres in our case) to the correct indices in input feature vectors.</p>

<p>Below, we map each unique movie id we have in top_movies to a unique index <em>i</em>. This will allow us to look up this movie’s embedding efficiently.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># build ITEM movieId embedding mapping
</span><span class="n">item_emb_movieId_to_i</span> <span class="o">=</span> <span class="p">{</span><span class="n">s</span><span class="p">:</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span><span class="n">s</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">top_movies</span><span class="p">)}</span>
<span class="n">item_emb_i_to_movieId</span> <span class="o">=</span> <span class="p">{</span><span class="n">i</span><span class="p">:</span><span class="n">s</span> <span class="k">for</span> <span class="n">s</span><span class="p">,</span><span class="n">i</span> <span class="ow">in</span> <span class="n">item_emb_movieId_to_i</span><span class="p">.</span><span class="n">items</span><span class="p">()}</span>
</code></pre></div></div>
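
<p>To see what these two maps give us, here is the same construction on the five movie ids from the table above (the variable names are mine, chosen for readability). The forward map turns a movie id into a row number in an embedding table, and the reverse map turns a row number back into a movie id, which is what we need when the model hands us back indices.</p>

```python
top_movie_ids = [318, 356, 296, 2571, 593]  # ids from the table above, most-rated first

movie_id_to_index = {movie_id: i for i, movie_id in enumerate(top_movie_ids)}
index_to_movie_id = {i: movie_id for movie_id, i in movie_id_to_index.items()}

print(movie_id_to_index[2571])
print(index_to_movie_id[0])
```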

<p>Below, we map each unique genre to an index <em>i</em> that will be used to set each movie’s genres in vector form. For example, if we had 3 genres, ‘Action’, ‘Horror’, and ‘Comedy’, we could map ‘Action’ to index 0, ‘Horror’ to index 1, and ‘Comedy’ to index 2. The genre vector representation of an ‘Action Comedy’ movie would then be [1, 0, 1].</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># build ITEM genre feature context (sorted so the index order is the same on every run)
</span><span class="n">genre_to_i</span> <span class="o">=</span> <span class="p">{</span><span class="n">s</span><span class="p">:</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span><span class="n">s</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">sorted</span><span class="p">(</span><span class="n">genres</span><span class="p">))}</span>
<span class="n">i_to_genre</span> <span class="o">=</span> <span class="p">{</span><span class="n">i</span><span class="p">:</span><span class="n">s</span> <span class="k">for</span> <span class="n">s</span><span class="p">,</span><span class="n">i</span> <span class="ow">in</span> <span class="n">genre_to_i</span><span class="p">.</span><span class="n">items</span><span class="p">()}</span>
</code></pre></div></div>
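
<p>As a minimal sketch of the ‘Action Comedy’ example above, the snippet below turns a list of genres into a multi-hot vector. The three-genre list and the helper name <em>genres_to_vector</em> are assumptions made for illustration; they are not part of the tutorial’s code.</p>

```python
# Toy genre list (an assumption for illustration; the real list comes from the dataset).
genres = ['Action', 'Horror', 'Comedy']
genre_to_i = {s: i for i, s in enumerate(genres)}


def genres_to_vector(movie_genres):
    """Return a multi-hot vector with 1.0 at the index of every genre the movie has."""
    vector = [0.0] * len(genre_to_i)
    for genre in movie_genres:
        vector[genre_to_i[genre]] = 1.0
    return vector


action_comedy = genres_to_vector(['Action', 'Comedy'])
print(action_comedy)
```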

<p><a id="user-preprocessing"></a></p>
<h4 id="user-feature-preprocessing">User Feature Preprocessing</h4>

<p>Every user will have a feature context that will mostly be their watch history. Instead of using every movie in the corpus, we can use a smaller subset (here, the first 250 entries of top_movies). This keeps each user vector small and helps with memory issues.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">num_movies_for_user_context</span> <span class="o">=</span> <span class="mi">250</span>
<span class="n">user_context_movies</span> <span class="o">=</span> <span class="n">top_movies</span><span class="p">[:</span><span class="n">num_movies_for_user_context</span><span class="p">]</span>
</code></pre></div></div>

<p>Next, let’s simplify our Ratings dataframe so we can much more efficiently iterate over it and create training examples.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># aggregate dataframe down into one row per user and list of their movies and ratings.
</span><span class="n">df_ratings_aggregated</span> <span class="o">=</span> <span class="n">df_ratings_final</span><span class="p">.</span><span class="n">groupby</span><span class="p">(</span><span class="s">'userId'</span><span class="p">).</span><span class="n">agg</span><span class="p">({</span><span class="s">'movieId'</span><span class="p">:</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="nb">list</span><span class="p">(</span><span class="n">x</span><span class="p">),</span> <span class="s">'rating'</span><span class="p">:</span> <span class="k">lambda</span> <span class="n">y</span><span class="p">:</span> <span class="nb">list</span><span class="p">(</span><span class="n">y</span><span class="p">)}).</span><span class="n">reset_index</span><span class="p">()</span>
</code></pre></div></div>

<p>The above code looks complicated, but all it does is find every row that shares a userId and collapse the values in the movieId and rating columns into lists. We do this because dataframes are extremely inefficient to iterate over row by row. We need all of a user’s ratings together anyway, so the more of this grouping we can do inside the Dataframe, the better.</p>
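
<p>To make the effect of the aggregation concrete, here is a rough plain-Python sketch of the same collapse on a handful of made-up rows. It only illustrates the result (one entry per user, with aligned lists of movies and ratings); it is not how pandas performs the groupby internally.</p>

```python
from collections import defaultdict

# Toy (userId, movieId, rating) rows standing in for df_ratings_final (made-up values).
rows = [
    (1, 10, 4.0),
    (2, 10, 3.0),
    (1, 20, 5.0),
    (2, 30, 1.5),
    (1, 30, 2.0),
]

# Collapse the rows into one entry per user, keeping movieIds and ratings aligned by position.
user_to_movies = defaultdict(list)
user_to_ratings = defaultdict(list)
for user_id, movie_id, rating in rows:
    user_to_movies[user_id].append(movie_id)
    user_to_ratings[user_id].append(rating)

# One (userId, movieId list, rating list) tuple per user, ordered by userId.
aggregated = [(u, user_to_movies[u], user_to_ratings[u]) for u in sorted(user_to_movies)]
for row in aggregated:
    print(row)
```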

<p>The Dataframe now looks like this (and has one row per user):</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="https://nickgreenquist.github.io//blog/assets/MovieLens/df_ratings_agg.png" alt="Ratings Table Aggregated" width="600px" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Aggregated Ratings Table</em></td>
    </tr>
  </tbody>
</table>

<p><a id="user-vocab"></a></p>
<h4 id="user-feature-vocab">User Feature Vocab</h4>
<p>Our user vocab will be slightly different from our movie vocab. This is because we will ignore the userId and not use it to look up a predefined embedding. Instead, we will map every user to a feature vector that is made up of two parts:</p>
<ol>
  <li>The movies the user has already watched</li>
  <li>The avg rating per genre for this user (think of this as the user’s preferences)</li>
</ol>

<p>This means it is entirely possible that two unique users have the same feature vector representation.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># build the USER context
</span><span class="n">user_context_size</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">user_context_movies</span><span class="p">)</span> <span class="o">+</span> <span class="nb">len</span><span class="p">(</span><span class="n">genres</span><span class="p">)</span>

<span class="n">user_context_movieId_to_i</span> <span class="o">=</span> <span class="p">{</span><span class="n">s</span><span class="p">:</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span><span class="n">s</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">user_context_movies</span><span class="p">))}</span>
<span class="n">user_context_i_to_movieId</span> <span class="o">=</span> <span class="p">{</span><span class="n">i</span><span class="p">:</span><span class="n">s</span> <span class="k">for</span> <span class="n">s</span><span class="p">,</span><span class="n">i</span> <span class="ow">in</span> <span class="n">user_context_movieId_to_i</span><span class="p">.</span><span class="n">items</span><span class="p">()}</span>

<span class="n">user_context_genre_to_i</span> <span class="o">=</span> <span class="p">{</span><span class="n">s</span><span class="p">:</span><span class="n">i</span><span class="o">+</span><span class="nb">len</span><span class="p">(</span><span class="n">user_context_movies</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span><span class="n">s</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">genres</span><span class="p">))}</span>
<span class="n">user_context_i_to_genre</span> <span class="o">=</span> <span class="p">{</span><span class="n">i</span><span class="p">:</span><span class="n">s</span> <span class="k">for</span> <span class="n">s</span><span class="p">,</span><span class="n">i</span> <span class="ow">in</span> <span class="n">user_context_genre_to_i</span><span class="p">.</span><span class="n">items</span><span class="p">()}</span>
</code></pre></div></div>

<p>The full feature vector for a user will be one vector which is a concatenation of their watch history and their genre preferences. Let’s assume we have 3 movies and 3 genres.</p>
<ol>
  <li>Movies: Movie1, Movie2, Movie3</li>
  <li>Genres: Action, Horror, and Comedy</li>
</ol>

<p>For a user that liked Movie1 and disliked Movie3, and likes Action and Comedy but hates Horror, their feature vector would look like:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="s">'movie1'</span><span class="p">,</span> <span class="s">'movie2'</span><span class="p">,</span> <span class="s">'movie3'</span><span class="p">,</span> <span class="s">'action'</span><span class="p">,</span> <span class="s">'horror'</span><span class="p">,</span> <span class="s">'comedy'</span><span class="p">]</span>
<span class="p">[</span>     <span class="mf">1.0</span><span class="p">,</span>      <span class="mf">0.0</span><span class="p">,</span>     <span class="o">-</span><span class="mf">1.0</span><span class="p">,</span>      <span class="mf">1.0</span><span class="p">,</span>     <span class="o">-</span><span class="mf">1.0</span><span class="p">,</span>      <span class="mf">1.0</span><span class="p">]</span>
</code></pre></div></div>
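
<p>The toy vector above can be reproduced with the same index layout used for the real vocab: movies occupy the first slots and genres are offset by the number of movies. This is only a sketch; the three-movie and three-genre vocabularies and the hard-coded signal values are assumptions for illustration.</p>

```python
# Toy vocabularies (assumptions for illustration only).
user_context_movies = ['movie1', 'movie2', 'movie3']
genres = ['action', 'horror', 'comedy']

# Movies come first, then genres offset by the number of movies.
user_context_movieId_to_i = {s: i for i, s in enumerate(user_context_movies)}
user_context_genre_to_i = {s: i + len(user_context_movies) for i, s in enumerate(genres)}

# Hypothetical, already de-biased signals for one user.
movie_signal = {'movie1': 1.0, 'movie3': -1.0}  # movie2 was never watched
genre_signal = {'action': 1.0, 'horror': -1.0, 'comedy': 1.0}

# Unwatched movies and unseen genres stay at 0.0.
context = [0.0] * (len(user_context_movies) + len(genres))
for movie, value in movie_signal.items():
    context[user_context_movieId_to_i[movie]] = value
for genre, value in genre_signal.items():
    context[user_context_genre_to_i[genre]] = value

print(context)
```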

<p><a id="training-examples"></a></p>
<h3 id="generating-training-examples">Generating Training Examples</h3>

<p>Next, we simulate real world training examples by masking out some of the user’s watched movies from their context, and using them as labels. We do not want the ‘movie to predict’ in their watch history, as we are trying to simulate the following: given the user’s other watched movies, what would they rate this new movie?</p>

<blockquote>
  <p>NOTE: this is not the same as a train/test split. This is just simulating what training examples would look like on a movie platform.</p>
</blockquote>

<blockquote>
  <p>WARNING: In the real world, as a user watches movies organically, you’d build up their watch history naturally and train a model using their older watch history to predict their most recent watches. If we wanted to be more correct, we could sort our ratings by timestamp and use older watches to predict newer watches, but it’s not necessary for the sake of this tutorial.</p>
</blockquote>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">percent_ratings_as_watch_history</span> <span class="o">=</span> <span class="mf">0.8</span>

<span class="n">user_to_movie_to_rating_WATCH_HISTORY</span> <span class="o">=</span> <span class="p">{}</span>
<span class="n">user_to_movie_to_rating_LABEL</span> <span class="o">=</span> <span class="p">{}</span>

<span class="c1"># loop over each column as this is much, much faster than going row by row.
</span><span class="n">user_list</span> <span class="o">=</span> <span class="n">df_ratings_aggregated</span><span class="p">[</span><span class="s">'userId'</span><span class="p">].</span><span class="n">tolist</span><span class="p">()</span>
<span class="n">movieId_list_list</span> <span class="o">=</span> <span class="n">df_ratings_aggregated</span><span class="p">[</span><span class="s">'movieId'</span><span class="p">].</span><span class="n">tolist</span><span class="p">()</span>
<span class="n">rating_list_list</span> <span class="o">=</span> <span class="n">df_ratings_aggregated</span><span class="p">[</span><span class="s">'rating'</span><span class="p">].</span><span class="n">tolist</span><span class="p">()</span>

<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">user_list</span><span class="p">)):</span>
  <span class="n">userId</span> <span class="o">=</span> <span class="n">user_list</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
  <span class="n">movieId_list</span> <span class="o">=</span> <span class="n">movieId_list_list</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
  <span class="n">rating_list</span> <span class="o">=</span> <span class="n">rating_list_list</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>

  <span class="n">num_rated_movies</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">movieId_list</span><span class="p">)</span>

  <span class="c1"># ignore users with too few ratings.
</span>  <span class="k">if</span> <span class="n">num_rated_movies</span> <span class="o">&lt;=</span> <span class="mi">5</span><span class="p">:</span> <span class="k">continue</span>

  <span class="c1"># set up training example maps.
</span>  <span class="n">user_to_movie_to_rating_WATCH_HISTORY</span><span class="p">[</span><span class="n">userId</span><span class="p">]</span> <span class="o">=</span> <span class="p">{}</span>
  <span class="n">user_to_movie_to_rating_LABEL</span><span class="p">[</span><span class="n">userId</span><span class="p">]</span> <span class="o">=</span> <span class="p">{}</span>

  <span class="c1"># shuffle the user's movies that they have watched
</span>  <span class="n">rated_movies</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">movieId_list</span><span class="p">,</span> <span class="n">rating_list</span><span class="p">))</span>
  <span class="n">random</span><span class="p">.</span><span class="n">shuffle</span><span class="p">(</span><span class="n">rated_movies</span><span class="p">)</span>

  <span class="c1"># put some movies into user's watch history (features) and leave others as labels to predict.
</span>  <span class="k">for</span> <span class="n">movieId</span><span class="p">,</span><span class="n">rating</span> <span class="ow">in</span> <span class="n">rated_movies</span><span class="p">[:</span><span class="nb">int</span><span class="p">(</span><span class="n">num_rated_movies</span> <span class="o">*</span> <span class="n">percent_ratings_as_watch_history</span><span class="p">)]:</span>
    <span class="n">user_to_movie_to_rating_WATCH_HISTORY</span><span class="p">[</span><span class="n">userId</span><span class="p">][</span><span class="n">movieId</span><span class="p">]</span> <span class="o">=</span> <span class="n">rating</span>
  <span class="k">for</span> <span class="n">movieId</span><span class="p">,</span><span class="n">rating</span> <span class="ow">in</span> <span class="n">rated_movies</span><span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">num_rated_movies</span> <span class="o">*</span> <span class="n">percent_ratings_as_watch_history</span><span class="p">):]:</span>
    <span class="n">user_to_movie_to_rating_LABEL</span><span class="p">[</span><span class="n">userId</span><span class="p">][</span><span class="n">movieId</span><span class="p">]</span> <span class="o">=</span> <span class="n">rating</span>
</code></pre></div></div>
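
<p>For reference, the time-based alternative mentioned in the warning above could be sketched roughly as follows. This is not what the tutorial does: it assumes each rating also carries a timestamp, and the helper name <em>split_by_time</em> and the toy values are hypothetical.</p>

```python
def split_by_time(rated_movies, history_fraction=0.8):
    """Split (movieId, rating, timestamp) tuples into older history and newer labels."""
    ordered = sorted(rated_movies, key=lambda row: row[2])  # oldest first
    cutoff = int(len(ordered) * history_fraction)
    history = {movie_id: rating for movie_id, rating, _ in ordered[:cutoff]}
    labels = {movie_id: rating for movie_id, rating, _ in ordered[cutoff:]}
    return history, labels


# Made-up ratings for one user: (movieId, rating, unix timestamp).
toy_ratings = [
    (30, 2.0, 1_700_000_300),
    (10, 4.0, 1_700_000_100),
    (50, 3.5, 1_700_000_500),
    (20, 5.0, 1_700_000_200),
    (40, 1.0, 1_700_000_400),
]
history, labels = split_by_time(toy_ratings)
print(history)
print(labels)
```

<p>Compared with the random shuffle, this keeps the most recent watches as the labels, which is closer to how a live platform would see the data.</p>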

<h4 id="set-up-feature-vectors">Set up Feature Vectors</h4>

<p>First, we need each user’s average rating so that we can de-bias each rating. If the user’s rating for a movie is above their average, we will treat that as a positive value in their vector; if it is below their average, we will treat it as a negative value. This helps the model learn likes and dislikes.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">user_to_avg_rating</span> <span class="o">=</span> <span class="p">{}</span>

<span class="c1"># NOTE: only use ratings from their synthetic watch history.
</span><span class="k">for</span> <span class="n">user</span> <span class="ow">in</span> <span class="n">user_to_movie_to_rating_WATCH_HISTORY</span><span class="p">.</span><span class="n">keys</span><span class="p">():</span>
  <span class="n">user_to_avg_rating</span><span class="p">[</span><span class="n">user</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span>
  <span class="k">for</span> <span class="n">movieId</span> <span class="ow">in</span> <span class="n">user_to_movie_to_rating_WATCH_HISTORY</span><span class="p">[</span><span class="n">user</span><span class="p">].</span><span class="n">keys</span><span class="p">():</span>
    <span class="n">user_to_avg_rating</span><span class="p">[</span><span class="n">user</span><span class="p">]</span> <span class="o">+=</span> <span class="n">user_to_movie_to_rating_WATCH_HISTORY</span><span class="p">[</span><span class="n">user</span><span class="p">][</span><span class="n">movieId</span><span class="p">]</span>

  <span class="n">user_to_avg_rating</span><span class="p">[</span><span class="n">user</span><span class="p">]</span> <span class="o">/=</span> <span class="nb">len</span><span class="p">(</span><span class="n">user_to_movie_to_rating_WATCH_HISTORY</span><span class="p">[</span><span class="n">user</span><span class="p">].</span><span class="n">keys</span><span class="p">())</span>
</code></pre></div></div>

<p>Next, let’s get each user’s preference for each genre. We will compute the user’s average rating for each genre here, and de-bias it by subtracting the user’s overall average when we build their context vector.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># for every user, get the avg rating for every genre
</span><span class="n">user_to_genre_to_stat</span> <span class="o">=</span> <span class="p">{}</span>

<span class="c1"># NOTE: only use ratings from their synthetic watch history.
</span><span class="k">for</span> <span class="n">user</span> <span class="ow">in</span> <span class="n">user_to_movie_to_rating_WATCH_HISTORY</span><span class="p">.</span><span class="n">keys</span><span class="p">():</span>
  <span class="n">user_to_genre_to_stat</span><span class="p">[</span><span class="n">user</span><span class="p">]</span> <span class="o">=</span> <span class="p">{}</span>
  <span class="k">for</span> <span class="n">movieId</span> <span class="ow">in</span> <span class="n">user_to_movie_to_rating_WATCH_HISTORY</span><span class="p">[</span><span class="n">user</span><span class="p">].</span><span class="n">keys</span><span class="p">():</span>
    <span class="k">for</span> <span class="n">genre</span> <span class="ow">in</span> <span class="n">movieId_to_genres</span><span class="p">[</span><span class="n">movieId</span><span class="p">]:</span>
      <span class="k">if</span> <span class="n">genre</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">user_to_genre_to_stat</span><span class="p">[</span><span class="n">user</span><span class="p">]:</span>
        <span class="n">user_to_genre_to_stat</span><span class="p">[</span><span class="n">user</span><span class="p">][</span><span class="n">genre</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
            <span class="s">'NUM_RATINGS'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
            <span class="s">'SUM_RATINGS'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
        <span class="p">}</span>

      <span class="n">user_to_genre_to_stat</span><span class="p">[</span><span class="n">user</span><span class="p">][</span><span class="n">genre</span><span class="p">][</span><span class="s">'NUM_RATINGS'</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span>
      <span class="n">user_to_genre_to_stat</span><span class="p">[</span><span class="n">user</span><span class="p">][</span><span class="n">genre</span><span class="p">][</span><span class="s">'SUM_RATINGS'</span><span class="p">]</span> <span class="o">+=</span> <span class="n">user_to_movie_to_rating_WATCH_HISTORY</span><span class="p">[</span><span class="n">user</span><span class="p">][</span><span class="n">movieId</span><span class="p">]</span>

<span class="k">for</span> <span class="n">user</span> <span class="ow">in</span> <span class="n">user_to_genre_to_stat</span><span class="p">.</span><span class="n">keys</span><span class="p">():</span>
  <span class="k">for</span> <span class="n">genre</span> <span class="ow">in</span> <span class="n">user_to_genre_to_stat</span><span class="p">[</span><span class="n">user</span><span class="p">].</span><span class="n">keys</span><span class="p">():</span>
    <span class="n">num_ratings</span> <span class="o">=</span> <span class="n">user_to_genre_to_stat</span><span class="p">[</span><span class="n">user</span><span class="p">][</span><span class="n">genre</span><span class="p">][</span><span class="s">'NUM_RATINGS'</span><span class="p">]</span>
    <span class="n">sum_ratings</span> <span class="o">=</span> <span class="n">user_to_genre_to_stat</span><span class="p">[</span><span class="n">user</span><span class="p">][</span><span class="n">genre</span><span class="p">][</span><span class="s">'SUM_RATINGS'</span><span class="p">]</span>
    <span class="n">user_to_genre_to_stat</span><span class="p">[</span><span class="n">user</span><span class="p">][</span><span class="n">genre</span><span class="p">][</span><span class="s">'AVG_RATING'</span><span class="p">]</span> <span class="o">=</span> <span class="n">sum_ratings</span> <span class="o">/</span> <span class="n">num_ratings</span>
</code></pre></div></div>
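
<p>Here is a small worked example of the de-biased genre preference for a single user. The watch history and genre lookup are made up, and the compact dictionary style is just one way to write it; the subtraction of the user’s overall average is the de-biasing step described in the text.</p>

```python
# Made-up watch history for one user: movieId to rating.
watch_history = {1: 5.0, 2: 3.0, 3: 4.0, 4: 2.0}
# Made-up genre lookup in the same shape as movieId_to_genres.
movieId_to_genres = {1: ['Action'], 2: ['Horror'], 3: ['Action', 'Comedy'], 4: ['Horror']}

# The user's overall average rating.
user_avg = sum(watch_history.values()) / len(watch_history)

# Collect every rating the user gave to each genre.
genre_to_ratings = {}
for movie_id, rating in watch_history.items():
    for genre in movieId_to_genres[movie_id]:
        genre_to_ratings.setdefault(genre, []).append(rating)

# De-biased affinity: the genre's average rating minus the user's overall average.
genre_affinity = {
    genre: sum(ratings) / len(ratings) - user_avg
    for genre, ratings in genre_to_ratings.items()
}
print(user_avg)
print(genre_affinity)
```

<p>A positive value means the user rates that genre above their own average, and a negative value means they rate it below.</p>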

<p>Finally, we can build a feature ‘context’ vector for every user using their watch history and genre preferences.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># for every user, create the training example user context vector
# 0:num_user_context_movies -&gt; user's watch history
# num_user_context_movies:num_user_context_movies+num_genres -&gt; user's genre affinity
</span><span class="n">user_to_context</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">for</span> <span class="n">user</span> <span class="ow">in</span> <span class="n">user_to_movie_to_rating_WATCH_HISTORY</span><span class="p">.</span><span class="n">keys</span><span class="p">():</span>
  <span class="n">context</span> <span class="o">=</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">]</span> <span class="o">*</span> <span class="n">user_context_size</span>

  <span class="k">for</span> <span class="n">movieId</span> <span class="ow">in</span> <span class="n">user_to_movie_to_rating_WATCH_HISTORY</span><span class="p">[</span><span class="n">user</span><span class="p">].</span><span class="n">keys</span><span class="p">():</span>
    <span class="k">if</span> <span class="n">movieId</span> <span class="ow">in</span> <span class="n">user_context_movieId_to_i</span><span class="p">:</span>
      <span class="c1"># note, we debias the rating so if the rating is under the user's avg rating,
</span>      <span class="c1"># it will hopefully count as negative strength for predicting similar movies.
</span>      <span class="c1"># vice-versa for a rating above the user's average.
</span>      <span class="n">context</span><span class="p">[</span><span class="n">user_context_movieId_to_i</span><span class="p">[</span><span class="n">movieId</span><span class="p">]]</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="n">user_to_movie_to_rating_WATCH_HISTORY</span><span class="p">[</span><span class="n">user</span><span class="p">][</span><span class="n">movieId</span><span class="p">]</span> <span class="o">-</span> <span class="n">user_to_avg_rating</span><span class="p">[</span><span class="n">user</span><span class="p">])</span>

  <span class="k">for</span> <span class="n">genre</span> <span class="ow">in</span> <span class="n">user_to_genre_to_stat</span><span class="p">[</span><span class="n">user</span><span class="p">].</span><span class="n">keys</span><span class="p">():</span>
    <span class="n">context</span><span class="p">[</span><span class="n">user_context_genre_to_i</span><span class="p">[</span><span class="n">genre</span><span class="p">]]</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="n">user_to_genre_to_stat</span><span class="p">[</span><span class="n">user</span><span class="p">][</span><span class="n">genre</span><span class="p">][</span><span class="s">'AVG_RATING'</span><span class="p">]</span> <span class="o">-</span> <span class="n">user_to_avg_rating</span><span class="p">[</span><span class="n">user</span><span class="p">])</span>

  <span class="n">user_to_context</span><span class="p">[</span><span class="n">user</span><span class="p">]</span> <span class="o">=</span> <span class="n">context</span>
</code></pre></div></div>

<p>We also need to set up the feature vector for each movie. This is much simpler since it’s just a binary mask vector for every genre the movie has.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># for every movie, create a training example feature context vector lookup
# it will contain the movie's genres.
</span><span class="n">movieId_to_context</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">for</span> <span class="n">movieId</span> <span class="ow">in</span> <span class="n">top_movies</span><span class="p">:</span>
  <span class="n">context</span> <span class="o">=</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">]</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">genres</span><span class="p">)</span>

  <span class="k">for</span> <span class="n">genre</span> <span class="ow">in</span> <span class="n">movieId_to_genres</span><span class="p">[</span><span class="n">movieId</span><span class="p">]:</span>
    <span class="n">context</span><span class="p">[</span><span class="n">genre_to_i</span><span class="p">[</span><span class="n">genre</span><span class="p">]]</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="mf">1.0</span><span class="p">)</span>

  <span class="n">movieId_to_context</span><span class="p">[</span><span class="n">movieId</span><span class="p">]</span> <span class="o">=</span> <span class="n">context</span>
</code></pre></div></div>

<p><a id="model"></a></p>
<h3 id="designing-our-model-architecture">Designing our Model Architecture</h3>
<p>Before we build the final dataset, it would be helpful to know why it will be the way it is. Unlike most datasets you might have seen with just an X and Y matrix to hold inputs and labels, we are building a Two Tower model (technically it has 3 inputs).</p>

<p>Here is our model and I’ll explain it in detail.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="s">'''
user_features ---------------&gt; u_W1
                                    </span><span class="se">\
</span><span class="s">                                     </span><span class="se">\
</span><span class="s">                                      --&gt; dot_product(user, item) --&gt; prediction
                                     /
movie_features  -&gt; i_W1             /
                        \          /
                         --&gt; stack
                        /
movie_embedding -&gt; e_W1
'''</span>
</code></pre></div></div>

<p>Our model will have 3 inputs that each feed into a non-linear layer.</p>
<ol>
  <li>The user’s context feature vector</li>
  <li>The movie’s context feature vector</li>
  <li>The movie’s id embedding vector</li>
</ol>

<p>The final output is a prediction for the user’s rating for the movie. This prediction is based on the user’s features, the movie’s features, and the learned embedding for the unique movie id.</p>

<p>To get the prediction, we concatenate the hidden embeddings from both movie inputs, and compute the dot product of this ‘combined movie embedding vector’ and the ‘user embedding vector’.</p>

<p>We will train this model by computing the loss between the user’s actual rating for this movie and the predicted rating.</p>

<p>Backpropagation works normally even for this ‘Two Tower’ model.</p>
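
<p>To make the data flow in the diagram concrete, here is a framework-free sketch of a single forward pass in plain Python. Everything specific in it is an assumption for illustration: the tanh non-linearity, the toy layer sizes, the random weights, and treating e_W1 as a layer applied to an already looked-up movie embedding. The one point it is meant to show is that the two movie branches are concatenated into a vector of the same size as the user vector, so the dot product is well defined.</p>

```python
import math
import random

rng = random.Random(0)


def dense(inputs, weights):
    """One non-linear layer: tanh(inputs @ weights), with one row of weights per input."""
    n_out = len(weights[0])
    return [
        math.tanh(sum(x * row[j] for x, row in zip(inputs, weights)))
        for j in range(n_out)
    ]


def random_weights(n_in, n_out):
    """Small random weight matrix with n_in rows and n_out columns."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]


# Toy sizes (assumptions): 6 user features, 3 genre features, 4-dim movie id embedding.
hidden = 8
u_W1 = random_weights(6, hidden)       # user tower
i_W1 = random_weights(3, hidden // 2)  # movie feature branch
e_W1 = random_weights(4, hidden // 2)  # movie embedding branch

user_features = [1.0, 0.0, -1.0, 1.0, -1.0, 1.0]
movie_features = [1.0, 0.0, 1.0]
movie_embedding = [rng.uniform(-1, 1) for _ in range(4)]

user_vec = dense(user_features, u_W1)
# "stack": concatenate the two movie branches so the item vector matches the user vector's size.
item_vec = dense(movie_features, i_W1) + dense(movie_embedding, e_W1)

# The prediction is the dot product of the user vector and the combined movie vector.
prediction = sum(u * v for u, v in zip(user_vec, item_vec))
print(len(user_vec), len(item_vec), prediction)
```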

<p><a id="dataset-build"></a></p>
<h3 id="building-our-dataset">Building our Dataset</h3>

<p>Now that you understand the inputs and output of our model, let’s actually build the Dataset. It consists of 4 parts:</p>
<ol>
  <li>The user context feature vectors, held in matrix <em>X</em></li>
  <li>The target movie’s id, held in vector <em>target_movieId</em></li>
  <li>The target movie’s context feature vectors, held in matrix <em>target_movieId_context</em></li>
  <li>The target movie’s actual rating, held in vector <em>Y</em></li>
</ol>

<p>Each part of the Dataset will be converted to a Pytorch Tensor so it can be used in Pytorch functions to feed and train the model.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Build the final Dataset
</span><span class="k">def</span> <span class="nf">build_dataset</span><span class="p">(</span><span class="n">users</span><span class="p">):</span>
  <span class="c1"># the user context (i.e. the watch history and genre affinities)
</span>  <span class="n">X</span> <span class="o">=</span> <span class="p">[]</span>

  <span class="c1"># the movieID for the movie we will predict rating for.
</span>  <span class="c1"># used to lookup the movie embedding to feed into the NN item tower.
</span>  <span class="n">target_movieId</span> <span class="o">=</span> <span class="p">[]</span>

  <span class="c1"># the feature context of the movie we will predict the rating for.
</span>  <span class="c1"># will also feed into its own embedding and will be stacked with the embedding above.
</span>  <span class="n">target_movieId_context</span> <span class="o">=</span> <span class="p">[]</span>

  <span class="c1"># the actual (de-biased) rating, used as the label
</span>  <span class="n">Y</span> <span class="o">=</span> <span class="p">[]</span>

  <span class="c1"># create training examples, one for each movie the user has that we want as a label.
</span>  <span class="k">for</span> <span class="n">user</span> <span class="ow">in</span> <span class="n">users</span><span class="p">:</span>
    <span class="k">for</span> <span class="n">movieId</span> <span class="ow">in</span> <span class="n">user_to_movie_to_rating_LABEL</span><span class="p">[</span><span class="n">user</span><span class="p">].</span><span class="n">keys</span><span class="p">():</span>
      <span class="n">X</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">user_to_context</span><span class="p">[</span><span class="n">user</span><span class="p">])</span>

      <span class="n">target_movieId</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">item_emb_movieId_to_i</span><span class="p">[</span><span class="n">movieId</span><span class="p">])</span>

      <span class="n">target_movieId_context</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">movieId_to_context</span><span class="p">[</span><span class="n">movieId</span><span class="p">])</span>

      <span class="c1"># remember to debias the user rating so we can learn to predict whether the user
</span>      <span class="c1"># likes or dislikes a movie based on their features and the movie features.
</span>      <span class="n">Y</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="nb">float</span><span class="p">(</span><span class="n">user_to_movie_to_rating_LABEL</span><span class="p">[</span><span class="n">user</span><span class="p">][</span><span class="n">movieId</span><span class="p">]</span> <span class="o">-</span> <span class="n">user_to_avg_rating</span><span class="p">[</span><span class="n">user</span><span class="p">]))</span>

  <span class="n">X</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">tensor</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
  <span class="n">Y</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">tensor</span><span class="p">(</span><span class="n">Y</span><span class="p">)</span>
  <span class="n">target_movieId</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">tensor</span><span class="p">(</span><span class="n">target_movieId</span><span class="p">)</span>
  <span class="n">target_movieId_context</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">tensor</span><span class="p">(</span><span class="n">target_movieId_context</span><span class="p">)</span>

  <span class="k">return</span> <span class="n">X</span><span class="p">,</span><span class="n">Y</span><span class="p">,</span><span class="n">target_movieId</span><span class="p">,</span><span class="n">target_movieId_context</span>
</code></pre></div></div>
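
<p>To see what the four parts look like, here is a tensor-free sketch of the same loop for one made-up user with two label movies. All values are hypothetical; the point is that the user’s context row is repeated once per label, and each label is the rating minus the user’s average.</p>

```python
# Made-up lookups for a single user, in the same shape as the dictionaries built above.
user_to_context = {7: [1.0, 0.0, -1.0, 1.0, -1.0, 1.0]}
user_to_avg_rating = {7: 3.5}
user_to_movie_to_rating_LABEL = {7: {100: 5.0, 200: 2.0}}
item_emb_movieId_to_i = {100: 0, 200: 1}
movieId_to_context = {100: [1.0, 0.0, 1.0], 200: [0.0, 1.0, 0.0]}

X, Y, target_movieId, target_movieId_context = [], [], [], []
for user, movie_to_rating in user_to_movie_to_rating_LABEL.items():
    for movieId, rating in movie_to_rating.items():
        X.append(user_to_context[user])  # same user context repeated per label
        target_movieId.append(item_emb_movieId_to_i[movieId])
        target_movieId_context.append(movieId_to_context[movieId])
        Y.append(rating - user_to_avg_rating[user])  # de-biased label

print(len(X), target_movieId, Y)
```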

<h4 id="trainvalidation-split">Train/Validation Split</h4>
<p>Before we call the <em>build_dataset</em> function, let’s split up some users into Train and some users into Validation.</p>

<blockquote>
  <p>WARNING: It would be more correct to shuffle the Dataset and put some training examples into the training set and others into the validation set. But for the simplicity of this example, I will simply use some users for training and some for validation.</p>
</blockquote>
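<p>For reference, the example-level split mentioned in the warning could be sketched as below. This is only an illustration on toy tensors: <em>split_examples</em>, <em>X_all</em> and <em>Y_all</em> are hypothetical names that are not used anywhere else in this post.</p>

```python
import torch

def split_examples(num_examples, train_fraction=0.8, seed=42):
    """Return shuffled (train_idx, val_idx) index tensors over individual examples."""
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_examples, generator=g)
    cut = int(num_examples * train_fraction)
    return perm[:cut], perm[cut:]

# toy stand-ins for the tensors a dataset builder would return (hypothetical)
X_all = torch.arange(20, dtype=torch.float32).view(10, 2)
Y_all = torch.arange(10, dtype=torch.float32)

train_idx, val_idx = split_examples(X_all.shape[0])

# index every tensor with the same indices so inputs and labels stay aligned
X_tr, Y_tr = X_all[train_idx], Y_all[train_idx]
X_va, Y_va = X_all[val_idx], Y_all[val_idx]
```

<p>With this kind of split the same user can show up in both sets, which is exactly what the user-level split used in this post avoids.</p>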

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># use only users with enough ratings to predict to be useful for model learning.
</span><span class="n">final_users</span> <span class="o">=</span> <span class="p">[]</span>

<span class="k">for</span> <span class="n">user</span> <span class="ow">in</span> <span class="n">user_to_movie_to_rating_LABEL</span><span class="p">.</span><span class="n">keys</span><span class="p">():</span>
  <span class="n">num_ratings</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">user_to_movie_to_rating_LABEL</span><span class="p">[</span><span class="n">user</span><span class="p">])</span>

  <span class="k">if</span> <span class="n">num_ratings</span> <span class="o">&gt;=</span> <span class="mi">2</span> <span class="ow">and</span> <span class="n">num_ratings</span> <span class="o">&lt;</span> <span class="mi">500</span><span class="p">:</span>
    <span class="n">final_users</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">user</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># split users into train and validation users
</span><span class="n">percent_users_train</span> <span class="o">=</span> <span class="mf">0.8</span>

<span class="n">random</span><span class="p">.</span><span class="n">shuffle</span><span class="p">(</span><span class="n">final_users</span><span class="p">)</span>

<span class="n">train_users</span> <span class="o">=</span> <span class="n">final_users</span><span class="p">[:</span><span class="nb">int</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">final_users</span><span class="p">)</span> <span class="o">*</span> <span class="n">percent_users_train</span><span class="p">)]</span>
<span class="n">validation_users</span> <span class="o">=</span> <span class="n">final_users</span><span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">final_users</span><span class="p">)</span> <span class="o">*</span> <span class="n">percent_users_train</span><span class="p">):]</span>
</code></pre></div></div>

<p>Finally, let’s get our training and validation Datasets.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">X_train</span><span class="p">,</span> <span class="n">Y_train</span><span class="p">,</span> <span class="n">target_movieId_train</span><span class="p">,</span> <span class="n">target_movieId_context_train</span> <span class="o">=</span> <span class="n">build_dataset</span><span class="p">(</span><span class="n">train_users</span><span class="p">)</span>
<span class="n">X_val</span><span class="p">,</span> <span class="n">Y_val</span><span class="p">,</span> <span class="n">target_movieId_val</span><span class="p">,</span> <span class="n">target_movieId_context_val</span> <span class="o">=</span> <span class="n">build_dataset</span><span class="p">(</span><span class="n">validation_users</span><span class="p">)</span>
</code></pre></div></div>

<p><a id="model-build"></a></p>
<h3 id="building-our-model">Building our Model</h3>

<p>Below, we will actually build from scratch the entire model (i.e. all the weights and biases). Notice the input dimensions of each of the parts. Each weight matrix links up to one of our inputs. <em>i_W1</em> will match the dimensions of the movie feature vector, which is the number of genres. <em>e_W1</em> is any size we want since we are creating an <em>ITEM_EMBEDDING_LOOKUP</em>. If this concept is confusing, please see Andrej Karpathy’s amazing series: <a href="https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ">Neural Networks: Zero to Hero</a>.</p>

<blockquote>
  <p>NOTE: using an embedding lookup table is no different than simply creating a one-hot encoded vector of size len(top_movies) and just multiplying it by some matrix. However, that way is extremely inefficient and I was unable to train any decently sized model due to RAM constraints.</p>
</blockquote>
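<p>To make the note above concrete, here is a small check that indexing into a lookup table gives the same rows as multiplying one-hot vectors by that table. The sizes and the names <em>lookup</em> and <em>movie_ix</em> are toy assumptions, not the real ones from this post.</p>

```python
import torch

g = torch.Generator().manual_seed(0)
num_movies, emb_size = 6, 4            # toy sizes, chosen only for illustration
lookup = torch.randn((num_movies, emb_size), generator=g)

movie_ix = torch.tensor([2, 5, 2])     # a toy minibatch of movie indices

# route 1: index straight into the table
by_lookup = lookup[movie_ix]

# route 2: build one-hot rows and multiply them by the same matrix
one_hot = torch.nn.functional.one_hot(movie_ix, num_classes=num_movies).float()
by_matmul = one_hot @ lookup

same = torch.allclose(by_lookup, by_matmul)
print(same)
```

<p>The one-hot route has to materialize a (batch, num_movies) matrix for every minibatch, which is where the extra memory goes.</p>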

<p>A few other small points:</p>
<ol>
  <li>I scale the weights down a little to prevent early training iterations having crazy loss. If this were an actual production model, I’d probably apply BatchNorm on it.</li>
  <li>I’m using MSELoss which means we will get the average ‘Squared Error’ loss on all examples. This just means we square the difference of the real rating vs the predicted rating.</li>
  <li>We set a batch size of 64.</li>
  <li>I create two lists to hold our loss for our full training set and validation set. We will plot this later.</li>
</ol>
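<p>As a quick sanity check on the second point, the snippet below compares <em>torch.nn.MSELoss</em> with the same quantity written out by hand. The numbers are made up and simply play the role of predicted and real (mean-centered) ratings.</p>

```python
import torch

# toy predictions and labels, invented purely for illustration
preds = torch.tensor([0.5, -1.0, 0.25])
labels = torch.tensor([1.0, -0.5, 0.25])

# built-in loss: mean of the squared differences
mse_builtin = torch.nn.MSELoss()(preds, labels)

# the same thing written out manually
mse_manual = ((preds - labels) ** 2).mean()

print(mse_builtin.item(), mse_manual.item())
```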

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">g</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">Generator</span><span class="p">().</span><span class="n">manual_seed</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span>

<span class="c1"># ITEM movie feature tower
</span><span class="n">item_feature_embedding_size</span> <span class="o">=</span> <span class="mi">25</span>
<span class="n">i_W1</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">randn</span><span class="p">((</span><span class="nb">len</span><span class="p">(</span><span class="n">genres</span><span class="p">),</span> <span class="n">item_feature_embedding_size</span><span class="p">),</span> <span class="n">generator</span><span class="o">=</span><span class="n">g</span><span class="p">)</span>
<span class="n">i_b1</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">randn</span><span class="p">(</span><span class="n">item_feature_embedding_size</span><span class="p">,</span> <span class="n">generator</span><span class="o">=</span><span class="n">g</span><span class="p">)</span>

<span class="c1"># ITEM movie embedding tower
</span><span class="n">item_movieId_embedding_size</span> <span class="o">=</span> <span class="mi">25</span>
<span class="n">ITEM_EMBEDDING_LOOKUP</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">rand</span><span class="p">((</span><span class="nb">len</span><span class="p">(</span><span class="n">top_movies</span><span class="p">),</span> <span class="n">item_movieId_embedding_size</span><span class="p">),</span> <span class="n">generator</span><span class="o">=</span><span class="n">g</span><span class="p">)</span>
<span class="n">e_W1</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">randn</span><span class="p">((</span><span class="n">item_movieId_embedding_size</span><span class="p">,</span> <span class="n">item_movieId_embedding_size</span><span class="p">),</span> <span class="n">generator</span><span class="o">=</span><span class="n">g</span><span class="p">)</span>
<span class="n">e_b1</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">randn</span><span class="p">(</span><span class="n">item_movieId_embedding_size</span><span class="p">,</span> <span class="n">generator</span><span class="o">=</span><span class="n">g</span><span class="p">)</span>

<span class="c1"># USER feature tower
</span><span class="n">user_feature_embedding_size</span> <span class="o">=</span> <span class="mi">50</span> <span class="c1"># must be the concat dimension of both item embeddings.
</span><span class="n">u_W1</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">randn</span><span class="p">((</span><span class="n">user_context_size</span><span class="p">,</span> <span class="n">user_feature_embedding_size</span><span class="p">),</span> <span class="n">generator</span><span class="o">=</span><span class="n">g</span><span class="p">)</span>
<span class="n">u_b1</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">randn</span><span class="p">(</span><span class="n">user_feature_embedding_size</span><span class="p">,</span> <span class="n">generator</span><span class="o">=</span><span class="n">g</span><span class="p">)</span>

<span class="c1"># create a list of all our TRAINABLE params
</span><span class="n">parameters</span> <span class="o">=</span> <span class="p">[</span>
    <span class="n">i_W1</span><span class="p">,</span> <span class="n">i_b1</span><span class="p">,</span>
    <span class="n">ITEM_EMBEDDING_LOOKUP</span><span class="p">,</span> <span class="n">e_W1</span><span class="p">,</span> <span class="n">e_b1</span><span class="p">,</span>
    <span class="n">u_W1</span><span class="p">,</span> <span class="n">u_b1</span><span class="p">,</span>
<span class="p">]</span>

<span class="c1"># scale down the initial weight values.
</span><span class="n">weight_scale</span> <span class="o">=</span> <span class="mf">0.1</span>
<span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">parameters</span><span class="p">:</span>
  <span class="n">p</span> <span class="o">*=</span> <span class="n">weight_scale</span>

<span class="c1"># set all parameters to require gradients
</span><span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">parameters</span><span class="p">:</span>
  <span class="n">p</span><span class="p">.</span><span class="n">requires_grad</span> <span class="o">=</span> <span class="bp">True</span>

<span class="c1"># print number of trainable params in our NN
</span><span class="k">print</span><span class="p">(</span><span class="nb">sum</span><span class="p">(</span><span class="n">p</span><span class="p">.</span><span class="n">nelement</span><span class="p">()</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">parameters</span><span class="p">))</span>

<span class="c1"># set the loss function we want to use.
# we use MSE Loss because we are predicting the rating per label movie.
</span><span class="n">loss</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">MSELoss</span><span class="p">()</span>

<span class="c1"># set how big we want each minibatch to be
</span><span class="n">minibatch_size</span> <span class="o">=</span> <span class="mi">64</span>

<span class="c1"># create list to hold our loss per training iterations
</span><span class="n">loss_train</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">loss_val</span> <span class="o">=</span> <span class="p">[]</span>
</code></pre></div></div>

<p><a id="training"></a></p>
<h3 id="training-our-model">Training our Model</h3>

<p>Below is the actual code to train our model without any of Pytorch’s higher-level helpers (we still rely on its tensors and autograd). I write this all out manually versus using a Torch Module and optimizer so we can control and study every single part and really understand each step.</p>

<p>Some notes:</p>
<ol>
  <li>Every 1000 iterations, we will compute our loss on the full validation set.</li>
  <li>If we are doing a validation run, we will not backprop (won’t train)</li>
  <li>If we are doing a full validation run, we will use our validation Dataset pieces.</li>
  <li><a href="https://pytorch.org/docs/stable/generated/torch.einsum.html">torch.einsum</a> is how we will do batched dot products of our user and movie embeddings to get the final prediction.</li>
  <li>We will gradually decrease our learning rate. We could use an optimizer, but I wanted to avoid Torch libraries.</li>
  <li>We record our avg loss during training and for each full validation run.</li>
</ol>
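<p>If the <em>einsum</em> notation is unfamiliar, this small sketch shows that the pattern 'ij,ij-&gt;i' is just a row-by-row dot product. The tensors are random toy stand-ins for a batch of user and item embeddings.</p>

```python
import torch

g = torch.Generator().manual_seed(0)

# toy batch of 3 user embeddings and 3 item embeddings, each of size 5
user_emb = torch.randn((3, 5), generator=g)
item_emb = torch.randn((3, 5), generator=g)

# batched dot product: multiply matching entries and sum over the embedding axis j
preds_einsum = torch.einsum('ij,ij->i', user_emb, item_emb)

# the same computation written with elementwise multiply and sum
preds_manual = (user_emb * item_emb).sum(dim=1)

print(torch.allclose(preds_einsum, preds_manual))
```

<p>Each row of the batch produces exactly one number, which is the predicted rating for that (user, movie) pair.</p>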

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">log_every</span> <span class="o">=</span> <span class="mi">1000</span>

<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">50_000</span><span class="p">):</span>

  <span class="c1"># every so often, skip training and compute loss on the entire validation set
</span>  <span class="n">is_full_val_run</span> <span class="o">=</span> <span class="bp">False</span>
  <span class="k">if</span> <span class="n">i</span> <span class="o">%</span> <span class="n">log_every</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
    <span class="n">is_full_val_run</span> <span class="o">=</span> <span class="bp">True</span>

  <span class="c1"># select training example inputs we use for this run, and minibatch indices.
</span>  <span class="n">X</span> <span class="o">=</span> <span class="n">X_train</span>
  <span class="n">Y</span> <span class="o">=</span> <span class="n">Y_train</span>
  <span class="n">target_movieId_context</span> <span class="o">=</span> <span class="n">target_movieId_context_train</span>
  <span class="n">target_movieId</span> <span class="o">=</span> <span class="n">target_movieId_train</span>
  <span class="k">if</span> <span class="n">is_full_val_run</span><span class="p">:</span>
    <span class="n">X</span> <span class="o">=</span> <span class="n">X_val</span>
    <span class="n">Y</span> <span class="o">=</span> <span class="n">Y_val</span>
    <span class="n">target_movieId_context</span> <span class="o">=</span> <span class="n">target_movieId_context_val</span>
    <span class="n">target_movieId</span> <span class="o">=</span> <span class="n">target_movieId_val</span>

  <span class="c1"># construct a minibatch
</span>  <span class="n">ix</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="p">(</span><span class="n">minibatch_size</span><span class="p">,))</span>
  <span class="k">if</span> <span class="n">is_full_val_run</span><span class="p">:</span>
    <span class="n">ix</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>

  <span class="c1"># ---------- FORWARD PASS ----------
</span>
  <span class="c1"># forward the USER tower.
</span>  <span class="n">user_contexts</span> <span class="o">=</span> <span class="n">X</span><span class="p">[</span><span class="n">ix</span><span class="p">]</span>
  <span class="n">user_embedding</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">tanh</span><span class="p">(</span><span class="n">user_contexts</span> <span class="o">@</span> <span class="n">u_W1</span> <span class="o">+</span> <span class="n">u_b1</span><span class="p">)</span>

  <span class="c1"># forward the ITEM movie feature tower
</span>  <span class="n">movie_contexts</span> <span class="o">=</span> <span class="n">target_movieId_context</span><span class="p">[</span><span class="n">ix</span><span class="p">]</span>
  <span class="n">item_feature_embedding</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">tanh</span><span class="p">(</span><span class="n">movie_contexts</span> <span class="o">@</span> <span class="n">i_W1</span> <span class="o">+</span> <span class="n">i_b1</span><span class="p">)</span>

  <span class="c1"># lookup the ITEM movieId embedding and pass through non-linear layer.
</span>  <span class="c1"># NOTE: this is just a shortcut to multiplying a one-hot vector with the masked movieID with a weight matrix.
</span>  <span class="n">item_embedding_hidden</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">tanh</span><span class="p">(</span><span class="n">ITEM_EMBEDDING_LOOKUP</span><span class="p">[</span><span class="n">target_movieId</span><span class="p">[</span><span class="n">ix</span><span class="p">]]</span> <span class="o">@</span> <span class="n">e_W1</span> <span class="o">+</span> <span class="n">e_b1</span><span class="p">)</span>

  <span class="c1"># concat/stack the two ITEM embeddings together
</span>  <span class="n">item_embedding_combined</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">cat</span><span class="p">((</span><span class="n">item_feature_embedding</span><span class="p">.</span><span class="n">view</span><span class="p">(</span><span class="n">item_feature_embedding</span><span class="p">.</span><span class="n">size</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="o">-</span><span class="mi">1</span><span class="p">),</span>
                                       <span class="n">item_embedding_hidden</span><span class="p">.</span><span class="n">view</span><span class="p">(</span><span class="n">item_embedding_hidden</span><span class="p">.</span><span class="n">size</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="o">-</span><span class="mi">1</span><span class="p">)),</span> <span class="n">dim</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>

  <span class="c1"># the final prediction is the dot product of the user embedding and the combined item embedding.
</span>  <span class="c1"># NOTE: because we have a batch of these, we will use torch.einsum to do this efficiently.
</span>  <span class="n">preds</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">einsum</span><span class="p">(</span><span class="s">'ij, ij -&gt; i'</span><span class="p">,</span> <span class="n">user_embedding</span><span class="p">,</span> <span class="n">item_embedding_combined</span><span class="p">)</span>

  <span class="c1"># compute the loss of our predicted ratings
</span>  <span class="n">output</span> <span class="o">=</span> <span class="n">loss</span><span class="p">(</span><span class="n">preds</span><span class="p">,</span> <span class="n">Y</span><span class="p">[</span><span class="n">ix</span><span class="p">])</span>

  <span class="c1"># backpropagation and update weights (except on validation runs)
</span>  <span class="k">if</span> <span class="ow">not</span> <span class="n">is_full_val_run</span><span class="p">:</span>
    <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">parameters</span><span class="p">:</span>
      <span class="n">p</span><span class="p">.</span><span class="n">grad</span> <span class="o">=</span> <span class="bp">None</span>

    <span class="n">output</span><span class="p">.</span><span class="n">backward</span><span class="p">()</span>

    <span class="c1"># update weights using gradients * learning_rate
</span>    <span class="n">lr</span> <span class="o">=</span> <span class="mf">0.1</span>
    <span class="k">if</span> <span class="n">i</span> <span class="o">&gt;=</span> <span class="mi">10_000</span><span class="p">:</span> <span class="n">lr</span> <span class="o">=</span> <span class="mf">0.05</span>
    <span class="k">if</span> <span class="n">i</span> <span class="o">&gt;=</span> <span class="mi">20_000</span><span class="p">:</span> <span class="n">lr</span> <span class="o">=</span> <span class="mf">0.01</span>
    <span class="k">if</span> <span class="n">i</span> <span class="o">&gt;=</span> <span class="mi">30_000</span><span class="p">:</span> <span class="n">lr</span> <span class="o">=</span> <span class="mf">0.005</span>
    <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">parameters</span><span class="p">:</span>
      <span class="n">p</span><span class="p">.</span><span class="n">data</span> <span class="o">+=</span> <span class="p">(</span><span class="n">lr</span> <span class="o">*</span> <span class="o">-</span><span class="n">p</span><span class="p">.</span><span class="n">grad</span><span class="p">)</span>

  <span class="c1"># every so often, log the MSE loss on full val set (see above)
</span>  <span class="k">if</span> <span class="n">is_full_val_run</span><span class="p">:</span>
    <span class="n">loss_val</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">output</span><span class="p">.</span><span class="n">item</span><span class="p">())</span>

    <span class="k">if</span> <span class="n">i</span> <span class="o">&gt;=</span> <span class="n">log_every</span><span class="p">:</span>
      <span class="n">avg_train_loss_last_batches</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">loss_train</span><span class="p">[</span><span class="o">-</span><span class="n">log_every</span><span class="p">:])</span>
    <span class="k">else</span><span class="p">:</span>
      <span class="n">avg_train_loss_last_batches</span> <span class="o">=</span> <span class="n">output</span><span class="p">.</span><span class="n">item</span><span class="p">()</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"[TRAIN] i: "</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="s">" | "</span><span class="p">,</span> <span class="s">"loss: "</span><span class="p">,</span> <span class="n">avg_train_loss_last_batches</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"[VAL] i: "</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="s">" | "</span><span class="p">,</span> <span class="s">"loss: "</span><span class="p">,</span> <span class="n">output</span><span class="p">.</span><span class="n">item</span><span class="p">())</span>
    <span class="k">print</span><span class="p">()</span>
  <span class="k">else</span><span class="p">:</span>
    <span class="n">loss_train</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">output</span><span class="p">.</span><span class="n">item</span><span class="p">())</span>
</code></pre></div></div>

<p>As the model trains, we should see something like this being printed:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[TRAIN] i:  0  |  loss:  0.9906623363494873
[VAL] i:  0  |  loss:  1.0060031414031982

[TRAIN] i:  1000  |  loss:  0.8815735578536987
[VAL] i:  1000  |  loss:  0.896892786026001

[TRAIN] i:  2000  |  loss:  0.8524652123451233
[VAL] i:  2000  |  loss:  0.8721503019332886
</code></pre></div></div>
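<p>For comparison only, the manual weight update and stepped learning rate from the training loop could be expressed with <em>torch.optim.SGD</em> and <em>torch.optim.lr_scheduler.LambdaLR</em>. This is a sketch on a toy parameter, not the code used in this post; <em>lr_factor</em>, <em>w</em> and <em>target</em> are hypothetical names, and the scheduler here counts optimizer steps rather than loop iterations.</p>

```python
import torch

def lr_factor(step):
    """Multiplier on the base rate of 0.1, mirroring the manual schedule."""
    if step >= 30_000:
        return 0.05   # 0.1 * 0.05 = 0.005
    if step >= 20_000:
        return 0.1    # 0.1 * 0.1 = 0.01
    if step >= 10_000:
        return 0.5    # 0.1 * 0.5 = 0.05
    return 1.0

# toy parameter and target standing in for the real model and labels
w = torch.zeros(3, requires_grad=True)
target = torch.tensor([1.0, 2.0, 3.0])

optimizer = torch.optim.SGD([w], lr=0.1)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)

for step in range(5):
    loss_value = ((w - target) ** 2).mean()
    optimizer.zero_grad()      # same role as setting p.grad = None
    loss_value.backward()
    optimizer.step()           # same role as p.data += lr * -p.grad
    scheduler.step()           # advance the learning rate schedule
```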

<p>Finally, we can plot our training and validation losses versus each iteration.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">loss_train_bucket_means</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">loss_train</span><span class="p">),</span> <span class="n">log_every</span><span class="p">):</span>
  <span class="n">loss_train_bucket_means</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">loss_train</span><span class="p">[</span><span class="n">i</span><span class="p">:</span><span class="n">i</span><span class="o">+</span><span class="n">log_every</span><span class="p">]))</span>

<span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">([</span><span class="n">i</span><span class="o">*</span><span class="mi">1000</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">loss_train_bucket_means</span><span class="p">))],</span> <span class="n">loss_train_bucket_means</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">([</span><span class="n">i</span><span class="o">*</span><span class="mi">1000</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">loss_val</span><span class="p">))],</span> <span class="n">loss_val</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span>
</code></pre></div></div>

<p>It will look something like this:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="https://nickgreenquist.github.io//blog/assets/MovieLens/medium-small-loss.png" alt="" width="300px" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Example train vs val loss plot</em></td>
    </tr>
  </tbody>
</table>

<p><a id="using-model"></a></p>
<h2 id="actually-using-our-model">Actually Using our Model</h2>

<p>Now for the fun part: let’s actually use this trained model to generate recommendations for different types of users we might see (if we were Netflix for example).</p>

<p><a id="movie-embeddings"></a></p>
<h3 id="precomputing-movie-embeddings">Precomputing Movie Embeddings</h3>

<p>In order to get recommendations, we will feed a new user feature vector through our model, and get a predicted rating for every movie in <em>top_movies</em>. To get a prediction for a movie, we need the item embedding in order to do the dot product with the user embedding. For different users, we don’t need to recompute these embeddings: once we have them for every movie in our catalog, we can re-use them!</p>

<p>We can compute our final embeddings for every movie all at once, then save them to a lookup map, and then easily use them later for any user (no need to ever do a forward pass in the Item Tower).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># for every movie, save all its embeddings
</span><span class="n">movieId_to_embedding</span> <span class="o">=</span> <span class="p">{}</span>

<span class="k">for</span> <span class="n">movieId</span> <span class="ow">in</span> <span class="n">top_movies</span><span class="p">:</span>
  <span class="n">movieId_to_embedding</span><span class="p">[</span><span class="n">movieId</span><span class="p">]</span> <span class="o">=</span> <span class="p">{}</span>

  <span class="n">item_embedding</span> <span class="o">=</span> <span class="n">ITEM_EMBEDDING_LOOKUP</span><span class="p">[</span><span class="n">torch</span><span class="p">.</span><span class="n">tensor</span><span class="p">([</span><span class="n">item_emb_movieId_to_i</span><span class="p">[</span><span class="n">movieId</span><span class="p">]])]</span>
  <span class="n">movieId_to_embedding</span><span class="p">[</span><span class="n">movieId</span><span class="p">][</span><span class="s">'MOVIEID_EMBEDDING'</span><span class="p">]</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">tanh</span><span class="p">(</span><span class="n">item_embedding</span> <span class="o">@</span> <span class="n">e_W1</span> <span class="o">+</span> <span class="n">e_b1</span><span class="p">)</span>

  <span class="n">movieId_to_embedding</span><span class="p">[</span><span class="n">movieId</span><span class="p">][</span><span class="s">'MOVIE_FEATURE_EMBEDDING'</span><span class="p">]</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">tanh</span><span class="p">(</span><span class="n">torch</span><span class="p">.</span><span class="n">tensor</span><span class="p">([</span><span class="n">movieId_to_context</span><span class="p">[</span><span class="n">movieId</span><span class="p">]])</span> <span class="o">@</span> <span class="n">i_W1</span> <span class="o">+</span> <span class="n">i_b1</span><span class="p">)</span>

  <span class="c1"># compute the combined (concat) item/movie embedding
</span>  <span class="n">item_id_emb</span> <span class="o">=</span> <span class="n">movieId_to_embedding</span><span class="p">[</span><span class="n">movieId</span><span class="p">][</span><span class="s">'MOVIEID_EMBEDDING'</span><span class="p">]</span>
  <span class="n">item_feature_emb</span> <span class="o">=</span> <span class="n">movieId_to_embedding</span><span class="p">[</span><span class="n">movieId</span><span class="p">][</span><span class="s">'MOVIE_FEATURE_EMBEDDING'</span><span class="p">]</span>
  <span class="n">movieId_to_embedding</span><span class="p">[</span><span class="n">movieId</span><span class="p">][</span><span class="s">'MOVIE_EMBEDDING_COMBINED'</span><span class="p">]</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">cat</span><span class="p">((</span><span class="n">item_feature_emb</span><span class="p">.</span><span class="n">view</span><span class="p">(</span><span class="n">item_feature_emb</span><span class="p">.</span><span class="n">size</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="o">-</span><span class="mi">1</span><span class="p">),</span>
                                       <span class="n">item_id_emb</span><span class="p">.</span><span class="n">view</span><span class="p">(</span><span class="n">item_id_emb</span><span class="p">.</span><span class="n">size</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="o">-</span><span class="mi">1</span><span class="p">)),</span> <span class="n">dim</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</code></pre></div></div>
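<p>To illustrate why precomputing pays off, here is a minimal sketch of scoring a whole catalog for one user with a single matrix product. It assumes the combined embeddings have been stacked into one matrix; <em>all_item_embeddings</em>, <em>user_embedding</em> and the toy sizes are hypothetical stand-ins, filled with random values.</p>

```python
import torch

g = torch.Generator().manual_seed(0)
num_movies, emb_size = 4, 50   # toy catalog; 50 is the concat size used in this post

# stand-in for stacking every combined movie embedding into one matrix
all_item_embeddings = torch.randn((num_movies, emb_size), generator=g)

# stand-in for one user's embedding coming out of the USER tower
user_embedding = torch.randn((1, emb_size), generator=g)

# one matrix product gives a predicted score for every movie at once
scores = (user_embedding @ all_item_embeddings.T).view(-1)

# movie row indices ordered from highest to lowest predicted score
ranking = torch.argsort(scores, descending=True)
print(ranking.tolist())
```

<p>Only the user tower has to run per request; the item side is a lookup into rows that were computed once.</p>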

<p><a id="similar-movies"></a></p>
<h3 id="finding-most-similar-movies">Finding Most Similar Movies</h3>
<p>Since we now have a vector representation of every movie, we can easily find each movie’s most similar movies. This can be useful by itself and companies like Amazon use similar embeddings to power things like ‘Similar to What you just Bought’.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># for every movie, and for every embedding type, find the Euclidean distance to all other embeddings
# NOTE: can be slow
</span><span class="n">movieId_to_emb_type_to_similarities</span> <span class="o">=</span> <span class="p">{}</span>

<span class="k">for</span> <span class="n">movieId</span> <span class="ow">in</span> <span class="n">top_movies</span><span class="p">:</span>
  <span class="n">movieId_to_emb_type_to_similarities</span><span class="p">[</span><span class="n">movieId</span><span class="p">]</span> <span class="o">=</span> <span class="p">{}</span>

  <span class="k">for</span> <span class="n">emb_type</span> <span class="ow">in</span> <span class="n">movieId_to_embedding</span><span class="p">[</span><span class="n">movieId</span><span class="p">].</span><span class="n">keys</span><span class="p">():</span>
    <span class="n">emb_to_target_to_dist</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">for</span> <span class="n">target_id</span> <span class="ow">in</span> <span class="n">top_movies</span><span class="p">:</span>
      <span class="n">src</span> <span class="o">=</span> <span class="n">movieId_to_embedding</span><span class="p">[</span><span class="n">movieId</span><span class="p">][</span><span class="n">emb_type</span><span class="p">].</span><span class="n">view</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>
      <span class="n">target</span> <span class="o">=</span> <span class="n">movieId_to_embedding</span><span class="p">[</span><span class="n">target_id</span><span class="p">][</span><span class="n">emb_type</span><span class="p">].</span><span class="n">view</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>

      <span class="n">distance</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">torch</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">torch</span><span class="p">.</span><span class="nb">pow</span><span class="p">(</span><span class="n">torch</span><span class="p">.</span><span class="n">subtract</span><span class="p">(</span><span class="n">src</span><span class="p">,</span> <span class="n">target</span><span class="p">),</span> <span class="mi">2</span><span class="p">),</span> <span class="n">dim</span><span class="o">=</span><span class="mi">0</span><span class="p">))</span>
      <span class="n">emb_to_target_to_dist</span><span class="p">[</span><span class="n">target_id</span><span class="p">]</span> <span class="o">=</span> <span class="n">distance</span><span class="p">.</span><span class="n">item</span><span class="p">()</span>
    <span class="n">movieId_to_emb_type_to_similarities</span><span class="p">[</span><span class="n">movieId</span><span class="p">][</span><span class="n">emb_type</span><span class="p">]</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">emb_to_target_to_dist</span><span class="p">.</span><span class="n">items</span><span class="p">(),</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">item</span><span class="p">:</span> <span class="n">item</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
</code></pre></div></div>
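
<p>The nested loops above can also be expressed as one batched computation. The following is only a minimal sketch under stated assumptions, not the code used for the results below: the random <code>embeddings</code> matrix and the <code>movie_ids</code> list are hypothetical stand-ins for the learned embeddings stacked one row per movie. It relies on <code>torch.cdist</code> for the pairwise Euclidean distances and <code>torch.topk</code> with <code>largest=False</code> to pick the closest rows.</p>

```python
import torch

# Hypothetical stand-in for the learned embeddings: 6 movies, 4 dimensions each.
torch.manual_seed(0)
movie_ids = [10, 20, 30, 40, 50, 60]          # illustrative ids, not real MovieLens ids
embeddings = torch.randn(len(movie_ids), 4)   # one row per movie

# All pairwise Euclidean distances in one call; the result has shape (6, 6).
distances = torch.cdist(embeddings, embeddings, p=2)

# A movie is always at distance 0 from itself, so mask the diagonal before ranking.
distances.fill_diagonal_(float('inf'))

# The k smallest distances in each row belong to the k most similar movies.
k = 3
nearest_dist, nearest_idx = torch.topk(distances, k, dim=1, largest=False)

movie_id_to_neighbors = {
    movie_ids[row]: [movie_ids[col] for col in nearest_idx[row].tolist()]
    for row in range(len(movie_ids))
}
```

<p>With the real data, one such matrix would presumably be built per embedding type by flattening and stacking the tensors stored in <code>movieId_to_embedding</code>. Masking the diagonal keeps each movie out of its own neighbor list.</p>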

<h4 id="most-similar-to-lord-of-the-rings-the-return-of-the-king-the-2003">Most Similar to: Lord of the Rings: The Return of the King, The (2003)</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Lord of the Rings: The Fellowship of the Ring, The (2001)
Lord of the Rings: The Two Towers, The (2002)
Hobbit: An Unexpected Journey, The (2012)
Gladiator (2000)
Dune (2021)
</code></pre></div></div>

<h4 id="most-similar-to-star-wars-episode-iv---a-new-hope-1977">Most Similar to: Star Wars: Episode IV - A New Hope (1977)</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Star Wars: Episode V - The Empire Strikes Back (1980)
Star Wars: Episode VI - Return of the Jedi (1983)
Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)
Indiana Jones and the Last Crusade (1989)
Ghostbusters (a.k.a. Ghost Busters) (1984)
</code></pre></div></div>

<h4 id="most-similar-to-toy-story-1995">Most Similar to: Toy Story (1995)</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Toy Story 2 (1999)
Toy Story 3 (2010)
Monsters, Inc. (2001)
Inside Out (2015)
Bug's Life, A (1998)
</code></pre></div></div>

<h4 id="most-similar-to-saving-private-ryan-1998">Most Similar to: Saving Private Ryan (1998)</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Braveheart (1995)
Black Hawk Down (2001)
Last of the Mohicans, The (1992)
Untouchables, The (1987)
Dirty Dozen, The (1967)
</code></pre></div></div>

<h4 id="most-similar-to-kill-bill-vol-1-2003">Most Similar to: Kill Bill: Vol. 1 (2003)</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Ronin (1998)
French Connection, The (1971)
Run Lola Run (Lola rennt) (1998)
Sin City (2005)
Fistful of Dollars, A (Per un pugno di dollari) (1964)
</code></pre></div></div>

<h4 id="most-similar-to-american-pie-1999">Most Similar to: American Pie (1999)</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>American Pie 2 (2001)
Liar Liar (1997)
Wedding Singer, The (1998)
Meet the Parents (2000)
Wedding Crashers (2005)
</code></pre></div></div>

<h4 id="most-similar-to-princess-mononoke-mononoke-hime-1997">Most Similar to: Princess Mononoke (Mononoke-hime) (1997)</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Spirited Away (Sen to Chihiro no kamikakushi) (2001)
Howl's Moving Castle (Hauru no ugoku shiro) (2004)
Spider-Man: Into the Spider-Verse (2018)
Akira (1988)
Ghost in the Shell (Kôkaku kidôtai) (1995)
</code></pre></div></div>

<p><a id="inference"></a></p>
<h3 id="inference-getting-recommendations">Inference: Getting Recommendations</h3>
<p>Now for the best part: let’s actually get some recommendations for different types of movie lovers!</p>

<p>To get recommendations, we build the user’s feature vector from the genres they like or dislike and the movies they liked or disliked. We then pass this user context through the User Tower (multiply by the weight matrix <em>u_W1</em>, add the bias <em>u_b1</em>, and apply tanh), which gives us the final user embedding. Finally, we compute the dot product of that embedding with the combined item embedding of every movie, which gives us a predicted rating for each one.</p>

<p>So, the general flow is like this:</p>
<ol>
  <li>Construct the user feature vector</li>
  <li>Pass it through the User Tower to get user embedding</li>
  <li>Iterate over all movies, and compute the dot product of the user embedding with each movie’s combined embedding</li>
  <li>Sort by predicted score and return the top movies.</li>
</ol>

<p>It will look something like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">user_context</span> <span class="o">=</span> <span class="n">get_user_context</span><span class="p">()</span> <span class="c1"># placeholder for now
</span><span class="n">X_inference</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">tensor</span><span class="p">([</span><span class="n">user_context</span><span class="p">])</span>
<span class="n">user_embedding_inference</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">tanh</span><span class="p">(</span><span class="n">X_inference</span> <span class="o">@</span> <span class="n">u_W1</span> <span class="o">+</span> <span class="n">u_b1</span><span class="p">)</span>

<span class="n">movieId_to_pred_score</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">for</span> <span class="n">movieId</span> <span class="ow">in</span> <span class="n">top_movies</span><span class="p">:</span>
  <span class="c1"># we already have the combined item embedding for every movie to make inference easier.
</span>  <span class="n">item_embedding_combined_inference</span> <span class="o">=</span> <span class="n">movieId_to_embedding</span><span class="p">[</span><span class="n">movieId</span><span class="p">][</span><span class="s">'MOVIE_EMBEDDING_COMBINED'</span><span class="p">]</span>
  <span class="n">movieId_to_pred_score</span><span class="p">[</span><span class="n">movieId</span><span class="p">]</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">einsum</span><span class="p">(</span><span class="s">'ij, ij -&gt; i'</span><span class="p">,</span> <span class="n">user_embedding_inference</span><span class="p">,</span> <span class="n">item_embedding_combined_inference</span><span class="p">).</span><span class="n">item</span><span class="p">()</span>
</code></pre></div></div>

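<p>Let’s build some synthetic users and use their user contexts to generate recommendations for them. A liked genre or favorite movie is set to 2.0 in the context vector, a disliked genre is set to -2.0, and everything else stays at 0.</p>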
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">user_type_to_favorite_genres</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s">'Fantasy Lover'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Fantasy'</span><span class="p">],</span>
    <span class="s">'Children</span><span class="se">\'</span><span class="s">s Movie Lover'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Children'</span><span class="p">],</span>
    <span class="s">'Horror Lover'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Horror'</span><span class="p">],</span>
    <span class="s">'Sci-Fi Lover'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Sci-Fi'</span><span class="p">],</span>
    <span class="s">'Comedy Lover'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Comedy'</span><span class="p">],</span>
    <span class="s">'Romance Lover'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Romance'</span><span class="p">],</span>
    <span class="s">'War Movie Lover'</span><span class="p">:</span> <span class="p">[</span><span class="s">'War'</span><span class="p">]</span>
<span class="p">}</span>

<span class="n">user_type_to_worst_genres</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s">'Fantasy Lover'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Horror'</span><span class="p">,</span> <span class="s">'Children'</span><span class="p">],</span>
    <span class="s">'Children</span><span class="se">\'</span><span class="s">s Movie Lover'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Horror'</span><span class="p">,</span> <span class="s">'Romance'</span><span class="p">,</span> <span class="s">'Drama'</span><span class="p">],</span>
    <span class="s">'Horror Lover'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Children'</span><span class="p">],</span>
    <span class="s">'Sci-Fi Lover'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Romance'</span><span class="p">,</span> <span class="s">'Children'</span><span class="p">],</span>
    <span class="s">'Comedy Lover'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Children'</span><span class="p">],</span>
    <span class="s">'Romance Lover'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Children'</span><span class="p">,</span> <span class="s">'Horror'</span><span class="p">],</span>
    <span class="s">'War Movie Lover'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Children'</span><span class="p">]</span>
<span class="p">}</span>

<span class="n">user_type_to_favorite_movies</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s">'Fantasy Lover'</span><span class="p">:</span> <span class="p">[</span>
        <span class="s">'Lord of the Rings: The Fellowship of the Ring, The (2001)'</span><span class="p">,</span>
        <span class="s">'Gladiator (2000)'</span><span class="p">,</span>
        <span class="s">'300 (2007)'</span><span class="p">,</span>
        <span class="s">'Braveheart (1995)'</span>
        <span class="p">],</span>
    <span class="s">'Children</span><span class="se">\'</span><span class="s">s Movie Lover'</span><span class="p">:</span> <span class="p">[</span>
        <span class="s">'Toy Story 2 (1999)'</span><span class="p">,</span>
        <span class="s">'Finding Nemo (2003)'</span><span class="p">,</span>
        <span class="s">'Monsters, Inc. (2001)'</span>
        <span class="p">],</span>
    <span class="s">'Horror Lover'</span><span class="p">:</span> <span class="p">[</span>
        <span class="s">'Blair Witch Project, The (1999)'</span><span class="p">,</span>
        <span class="s">'Silence of the Lambs, The (1991)'</span><span class="p">,</span>
        <span class="s">'Sixth Sense, The (1999)'</span>
        <span class="p">],</span>
    <span class="s">'Sci-Fi Lover'</span><span class="p">:</span> <span class="p">[</span>
        <span class="s">'Star Wars: Episode V - The Empire Strikes Back (1980)'</span><span class="p">,</span>
        <span class="s">'Matrix, The (1999)'</span><span class="p">,</span>
        <span class="s">'Terminator, The (1984)'</span>
        <span class="p">],</span>
    <span class="s">'Comedy Lover'</span><span class="p">:</span> <span class="p">[</span>
        <span class="s">'American Pie (1999)'</span><span class="p">,</span>
        <span class="s">'Dumb &amp; Dumber (Dumb and Dumber) (1994)'</span><span class="p">,</span>
        <span class="s">'Austin Powers: The Spy Who Shagged Me (1999)'</span><span class="p">,</span>
        <span class="s">'Big Lebowski, The (1998)'</span>
      <span class="p">],</span>
    <span class="s">'Romance Lover'</span><span class="p">:</span> <span class="p">[</span>
        <span class="s">'Shakespeare in Love (1998)'</span><span class="p">,</span>
        <span class="s">'There</span><span class="se">\'</span><span class="s">s Something About Mary (1998)'</span><span class="p">,</span>
        <span class="s">'Sense and Sensibility (1995)'</span>
    <span class="p">],</span>
    <span class="s">'War Movie Lover'</span><span class="p">:</span> <span class="p">[</span>
        <span class="s">'Saving Private Ryan (1998)'</span><span class="p">,</span>
        <span class="s">'Apocalypse Now (1979)'</span><span class="p">,</span>
        <span class="s">'Full Metal Jacket (1987)'</span>
    <span class="p">]</span>
<span class="p">}</span>

<span class="n">user_to_inference_context</span> <span class="o">=</span> <span class="p">{}</span>

<span class="k">for</span> <span class="n">user_type</span> <span class="ow">in</span> <span class="n">user_type_to_favorite_genres</span><span class="p">.</span><span class="n">keys</span><span class="p">():</span>
  <span class="n">inference_user_context</span> <span class="o">=</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">]</span> <span class="o">*</span> <span class="n">user_context_size</span>

  <span class="c1"># set genres the user likes.
</span>  <span class="k">for</span> <span class="n">genre</span> <span class="ow">in</span> <span class="n">user_type_to_favorite_genres</span><span class="p">[</span><span class="n">user_type</span><span class="p">]:</span>
    <span class="n">inference_user_context</span><span class="p">[</span><span class="n">user_context_genre_to_i</span><span class="p">[</span><span class="n">genre</span><span class="p">]]</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="mf">2.0</span><span class="p">)</span>

  <span class="c1"># set genres that the user dislikes
</span>  <span class="k">for</span> <span class="n">genre</span> <span class="ow">in</span> <span class="n">user_type_to_worst_genres</span><span class="p">[</span><span class="n">user_type</span><span class="p">]:</span>
    <span class="n">inference_user_context</span><span class="p">[</span><span class="n">user_context_genre_to_i</span><span class="p">[</span><span class="n">genre</span><span class="p">]]</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="o">-</span><span class="mf">2.0</span><span class="p">)</span>

  <span class="c1"># set the user's favorite movies.
</span>  <span class="k">for</span> <span class="n">title</span> <span class="ow">in</span> <span class="n">user_type_to_favorite_movies</span><span class="p">[</span><span class="n">user_type</span><span class="p">]:</span>
    <span class="n">movieId</span> <span class="o">=</span> <span class="n">title_to_movieId</span><span class="p">[</span><span class="n">title</span><span class="p">]</span>
    <span class="n">inference_user_context</span><span class="p">[</span><span class="n">user_context_movieId_to_i</span><span class="p">[</span><span class="n">movieId</span><span class="p">]]</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="mf">2.0</span><span class="p">)</span>

  <span class="n">user_to_inference_context</span><span class="p">[</span><span class="n">user_type</span><span class="p">]</span> <span class="o">=</span> <span class="n">inference_user_context</span>
</code></pre></div></div>

<p>Get their top 10 recommendations, skipping movies already listed as their favorites:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">user_to_top_recs</span> <span class="o">=</span> <span class="p">{}</span>

<span class="k">for</span> <span class="n">user_type</span> <span class="ow">in</span> <span class="n">user_to_inference_context</span><span class="p">.</span><span class="n">keys</span><span class="p">():</span>

  <span class="n">X_inference</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">tensor</span><span class="p">([</span><span class="n">user_to_inference_context</span><span class="p">[</span><span class="n">user_type</span><span class="p">]])</span>
  <span class="n">user_embedding_inference</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">tanh</span><span class="p">(</span><span class="n">X_inference</span> <span class="o">@</span> <span class="n">u_W1</span> <span class="o">+</span> <span class="n">u_b1</span><span class="p">)</span>

  <span class="n">movieId_to_pred_score</span> <span class="o">=</span> <span class="p">{}</span>
  <span class="k">for</span> <span class="n">movieId</span> <span class="ow">in</span> <span class="n">top_movies</span><span class="p">:</span>
    <span class="c1"># we already have the combined item embedding for every movie to make inference easier.
</span>    <span class="n">item_embedding_combined_inference</span> <span class="o">=</span> <span class="n">movieId_to_embedding</span><span class="p">[</span><span class="n">movieId</span><span class="p">][</span><span class="s">'MOVIE_EMBEDDING_COMBINED'</span><span class="p">]</span>
    <span class="n">movieId_to_pred_score</span><span class="p">[</span><span class="n">movieId</span><span class="p">]</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">einsum</span><span class="p">(</span><span class="s">'ij, ij -&gt; i'</span><span class="p">,</span> <span class="n">user_embedding_inference</span><span class="p">,</span> <span class="n">item_embedding_combined_inference</span><span class="p">).</span><span class="n">item</span><span class="p">()</span>

  <span class="n">top_recs</span> <span class="o">=</span> <span class="p">[]</span>
  <span class="k">for</span> <span class="n">movieId</span><span class="p">,</span> <span class="n">pred_score</span> <span class="ow">in</span> <span class="nb">list</span><span class="p">(</span><span class="nb">sorted</span><span class="p">(</span><span class="n">movieId_to_pred_score</span><span class="p">.</span><span class="n">items</span><span class="p">(),</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">item</span><span class="p">:</span> <span class="n">item</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">reverse</span><span class="o">=</span><span class="bp">True</span><span class="p">)):</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">top_recs</span><span class="p">)</span> <span class="o">&gt;=</span> <span class="mi">10</span><span class="p">:</span> <span class="k">break</span>
    <span class="k">if</span> <span class="n">movieId_to_title</span><span class="p">[</span><span class="n">movieId</span><span class="p">]</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">user_type_to_favorite_movies</span><span class="p">[</span><span class="n">user_type</span><span class="p">]:</span>
      <span class="n">top_recs</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">movieId</span><span class="p">)</span>
  <span class="n">user_to_top_recs</span><span class="p">[</span><span class="n">user_type</span><span class="p">]</span> <span class="o">=</span> <span class="n">top_recs</span>
</code></pre></div></div>
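
<p>The per-movie loop is easy to follow, but the same scores can be computed with a single matrix product. The snippet below is a hedged sketch of that idea, not the code that produced the recommendations in this post: <code>user_embedding</code> and <code>item_matrix</code> are random, hypothetical stand-ins for <code>user_embedding_inference</code> and for the combined item embeddings stacked one row per movie. It also checks that the <code>einsum</code> call used above is a row-wise dot product.</p>

```python
import torch

torch.manual_seed(0)
num_movies, emb_dim = 5, 8

# Hypothetical stand-ins for the trained tensors.
user_embedding = torch.tanh(torch.randn(1, emb_dim))   # shape (1, emb_dim)
item_matrix = torch.randn(num_movies, emb_dim)         # row i is movie i's combined embedding

# Per-movie loop, mirroring the einsum call used in the post.
loop_scores = torch.stack([
    torch.einsum('ij,ij->i', user_embedding, item_matrix[i:i + 1]).squeeze(0)
    for i in range(num_movies)
])

# Same scores from one matrix product: (1, d) @ (d, n) gives (1, n).
batched_scores = (user_embedding @ item_matrix.T).squeeze(0)

# Highest-scoring rows first; the row index maps back to a movie.
top_scores, top_rows = torch.topk(batched_scores, k=3)
```

<p>Filtering out the user’s own favorites would still have to happen after <code>topk</code>, for example by asking for a few extra rows and dropping the ones that match.</p>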

<p><a id="examples"></a></p>
<h3 id="example-recommendations">Example Recommendations</h3>

<h4 id="horror-lover">Horror Lover</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Hello, Horror Lover
Because you like: [Horror]
And hate: [Children]

And enjoyed these movies:
Blair Witch Project, The (1999)
Silence of the Lambs, The (1991)
Sixth Sense, The (1999)

You should watch:
Alien (1979)
Videodrome (1983)
Thing, The (1982)
Aliens (1986)
Psycho (1960)
Evil Dead, The (1981)
Shining, The (1980)
Night of the Living Dead (1968)
Invasion of the Body Snatchers (1956)
Get Out (2017)
</code></pre></div></div>

<h4 id="childrens-movie-lover">Children’s Movie Lover</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Hello, Children's Movie Lover
Because you like: [Children]
And hate: [Horror,Romance,Drama]

And enjoyed these movies:
Toy Story 2 (1999)
Finding Nemo (2003)
Monsters, Inc. (2001)

You should watch:
Zootopia (2016)
Kung Fu Panda 2 (2011)
Incredibles, The (2004)
Madagascar: Escape 2 Africa (2008)
Kung Fu Panda (2008)
Bolt (2008)
The Lego Movie (2014)
Megamind (2010)
Rango (2011)
Goonies, The (1985)
</code></pre></div></div>

<h4 id="sci-fi-lover">Sci-Fi Lover</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Hello, Sci-Fi Lover
Because you like: [Sci-Fi]
And hate: [Romance,Children]

And enjoyed these movies:
Star Wars: Episode V - The Empire Strikes Back (1980)
Matrix, The (1999)
Terminator, The (1984)

You should watch:
Spider-Man: Into the Spider-Verse (2018)
Blade Runner (1982)
Aliens (1986)
Star Wars: Episode IV - A New Hope (1977)
Nausicaä of the Valley of the Wind (Kaze no tani no Naushika) (1984)
Akira (1988)
Alien (1979)
Thing, The (1982)
Inception (2010)
Cowboy Bebop: The Movie (Cowboy Bebop: Tengoku no Tobira) (2001)
</code></pre></div></div>

<h4 id="comedy-lover">Comedy Lover</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Hello, Comedy Lover
Because you like: [Comedy]
And hate: [Children]

And enjoyed these movies:
American Pie (1999)
Dumb &amp; Dumber (Dumb and Dumber) (1994)
Austin Powers: The Spy Who Shagged Me (1999)
Big Lebowski, The (1998)

You should watch:
Sting, The (1973)
Thin Man, The (1934)
Kung Fu Hustle (Gong fu) (2004)
Some Like It Hot (1959)
Snatch (2000)
Midnight Run (1988)
What We Do in the Shadows (2014)
Office Space (1999)
21 Jump Street (2012)
Legend of Drunken Master, The (Jui kuen II) (1994)
</code></pre></div></div>

<h4 id="fantasy-lover">Fantasy Lover</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Hello, Fantasy Lover
Because you like: [Fantasy]
And hate: [Horror,Children]

And enjoyed these movies:
Lord of the Rings: The Fellowship of the Ring, The (2001)
Gladiator (2000)
300 (2007)

You should watch:
Princess Bride, The (1987)
Lord of the Rings: The Return of the King, The (2003)
Lord of the Rings: The Two Towers, The (2002)
Spirited Away (Sen to Chihiro no kamikakushi) (2001)
Monty Python and the Holy Grail (1975)
Yojimbo (1961)
Wings of Desire (Himmel über Berlin, Der) (1987)
Seven Samurai (Shichinin no samurai) (1954)
Star Wars: Episode IV - A New Hope (1977)
Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)
</code></pre></div></div>

<h4 id="romance-lover">Romance Lover</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Hello, Romance Lover
Because you like: [Romance]
And hate: [Children,Horror]

And enjoyed these movies:
Shakespeare in Love (1998)
There's Something About Mary (1998)
Sense and Sensibility (1995)

You should watch:
Life Is Beautiful (La Vita è bella) (1997)
Casablanca (1942)
Roman Holiday (1953)
Shawshank Redemption, The (1994)
Singin' in the Rain (1952)
Rebecca (1940)
Good Will Hunting (1997)
Forrest Gump (1994)
Pride &amp; Prejudice (2005)
Modern Times (1936)
</code></pre></div></div>

<h4 id="war-movie-lover">War Movie Lover</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Hello, War Movie Lover
Because you like: [War]
And hate: [Children]

And enjoyed these movies:
Saving Private Ryan (1998)
Apocalypse Now (1979)
Full Metal Jacket (1987)

You should watch:
Schindler's List (1993)
Shawshank Redemption, The (1994)
Boot, Das (Boat, The) (1981)
Godfather, The (1972)
Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1964)
Grave of the Fireflies (Hotaru no haka) (1988)
Great Dictator, The (1940)
Ran (1985)
Pulp Fiction (1994)
Lawrence of Arabia (1962)
</code></pre></div></div>

<p><a id="anti-recs"></a></p>
<h4 id="anti-recommendations">Anti-Recommendations</h4>
<p>What are the WORST movies for certain types of users? Do we get their least favorite genres?</p>

<p>To get anti-recommendations, we just print the bottom 10 (lowest predicted score) recommendations.</p>
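
<p>As a rough illustration, taking the bottom of the ranking is just an ascending sort. The titles and scores below are made up for the example, and <code>bottom_k</code> is a hypothetical helper rather than a function from the original notebook.</p>

```python
# Hypothetical predicted scores keyed by title (the post keys them by movieId).
title_to_pred_score = {
    'Movie A': 4.1,
    'Movie B': 0.7,
    'Movie C': 2.9,
    'Movie D': 1.3,
    'Movie E': 3.6,
}

def bottom_k(scores, k):
    """Return the k lowest-scoring keys, worst first."""
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending score
    return [key for key, _ in ranked[:k]]

anti_recs = bottom_k(title_to_pred_score, k=2)
print(anti_recs)  # prints ['Movie B', 'Movie D']
```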

<h4 id="childrens-movie-lover---anti-recs">Children’s Movie Lover - Anti Recs</h4>
<p>Hilariously, among the worst movies for someone who likes Children’s movies and hates Horror, Romance, and Drama are Twilight and a Nightmare on Elm Street sequel. Those are definitely among the worst possible movies for this user.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Hello, Children's Movie Lover
Because you like: [Children]
And hate: [Horror,Romance,Drama]

And enjoyed these movies:
Toy Story 2 (1999)
Finding Nemo (2003)
Monsters, Inc. (2001)

You should NOT watch:
Twilight Saga: New Moon, The (2009)
Legends of the Fall (1994)
Twilight (2008)
Boxing Helena (1993)
Lost Highway (1997)
Wolf (1994)
Bodyguard, The (1992)
Amityville Horror, The (1979)
Wes Craven's New Nightmare (Nightmare on Elm Street Part 7: Freddy's Finale, A) (1994)
Mulholland Drive (2001)
</code></pre></div></div>

<h4 id="horor-movie-lover---worst-recs">Horror Movie Lover - Worst Recs</h4>
<p>For someone who loves Horror and hates Children’s movies, Home Alone 3, the Karate Kid sequels, Free Willy, and Happy Feet are delightfully bad recommendations.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Hello, Horror Lover
Because you like: [Horror]
And hate: [Children]

And enjoyed these movies:
Blair Witch Project, The (1999)
Silence of the Lambs, The (1991)
Sixth Sense, The (1999)

You should NOT watch:
Inspector Gadget (1999)
Next Karate Kid, The (1994)
Home Alone 3 (1997)
Free Willy 2: The Adventure Home (1995)
Pocahontas (1995)
Super Mario Bros. (1993)
Happy Feet (2006)
Free Willy (1993)
Karate Kid, Part III, The (1989)
Honey, I Blew Up the Kid (1992)
</code></pre></div></div>

<p><a id="improvements"></a></p>
<h2 id="possible-improvements">Possible Improvements</h2>
<p>Below are some possible ways to improve this movie recommendation system:</p>
<ul>
  <li>Add better movie features
    <ul>
      <li>Use the movie’s year from the title as a feature (some people might like movies from certain eras)</li>
      <li>Use the movie’s title as a bag of words feature (helps model find similar movies based on title)</li>
      <li>Find and use the movie’s director/publisher (helps the model find similar movies based on who made them)</li>
      <li>Find and use the movie’s actors (some people have favorite actors)</li>
      <li>Leverage MovieLens’s ‘movie tags’ (requires a lot of preprocessing but could be useful)</li>
    </ul>
  </li>
  <li>Add better user features
    <ul>
      <li>Find and use the user’s demographic features</li>
      <li>Leverage MovieLens’s user occupation feature (only available for some datasets)</li>
      <li>Use the user’s favorite ‘decade’ as a kind of genre feature (would work well if we also added each movie’s release decade as a feature)</li>
      <li>Use the user’s favorite directors</li>
      <li>Use the user’s favorite actors</li>
    </ul>
  </li>
  <li>Generate true training examples using the ratings timestamps
    <ul>
      <li>Instead of randomly picking which movies to predict ratings for, we can always use the last movies the user watched as the labels and their earlier watches as the watch history</li>
    </ul>
  </li>
  <li>Add an attention/context component to the model [ADVANCED]
    <ul>
      <li>Transformers are all the rage, and because each user watched their movies in a particular order, a sequence model could make use of that order.</li>
    </ul>
  </li>
  <li>Add movie reviews from IMDB or Rotten Tomatoes as text or sentiment features [ADVANCED]
    <ul>
      <li>Reviews contain rich information about movies, including (in aggregate) sentiment about whether they are good or bad</li>
    </ul>
  </li>
  <li>Build a Deeper model
    <ul>
      <li>This model has only one non-linear layer per input. We could go much deeper, even adding layers after combining the user and item embeddings.</li>
    </ul>
  </li>
  <li>Improve the model
    <ul>
      <li>Add BatchNorm, Dropout</li>
    </ul>
  </li>
</ul>
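
<p>To make the timestamp idea from the list above more concrete, here is a minimal sketch of a per-user temporal split. The rows follow the <code>userId, movieId, rating, timestamp</code> layout of the MovieLens ratings file, but the values are invented, and <code>split_by_time</code> is a hypothetical helper that is not part of the current training pipeline.</p>

```python
from collections import defaultdict

# Hypothetical (userId, movieId, rating, timestamp) rows in the style of MovieLens ratings.
ratings = [
    (1, 101, 4.0, 1000),
    (1, 102, 3.5, 1500),
    (1, 103, 5.0, 2000),
    (1, 104, 2.0, 2500),
    (2, 201, 4.5, 900),
    (2, 202, 1.0, 1100),
    (2, 203, 3.0, 1300),
]

def split_by_time(rows, num_labels):
    """Per user: the last `num_labels` ratings become labels, the rest become history.

    Assumes num_labels is at least 1.
    """
    by_user = defaultdict(list)
    for user_id, movie_id, rating, timestamp in rows:
        by_user[user_id].append((timestamp, movie_id, rating))

    examples = {}
    for user_id, events in by_user.items():
        events.sort()  # oldest first, since timestamp is the first tuple field
        history = [(m, r) for _, m, r in events[:-num_labels]]
        labels = [(m, r) for _, m, r in events[-num_labels:]]
        examples[user_id] = {'history': history, 'labels': labels}
    return examples

examples = split_by_time(ratings, num_labels=1)
```

<p>Here user 1’s history would be movies 101 to 103 and the label would be movie 104, so the model never sees a rating that happened after the one it is asked to predict.</p>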

<p><a id="appendix"></a></p>
<h2 id="appendix">Appendix</h2>

<p><a id="2d"></a></p>
<h3 id="visualizing-movies-in-2d">Visualizing Movies in 2D</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span><span class="mi">15</span><span class="p">))</span>
<span class="k">for</span> <span class="n">movieId</span> <span class="ow">in</span> <span class="n">top_movies</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="mi">25</span><span class="p">]:</span>
  <span class="n">i</span> <span class="o">=</span> <span class="n">item_emb_movieId_to_i</span><span class="p">[</span><span class="n">movieId</span><span class="p">]</span>
  <span class="n">plt</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">ITEM_EMBEDDING_LOOKUP</span><span class="p">[</span><span class="n">i</span><span class="p">,</span><span class="mi">0</span><span class="p">].</span><span class="n">data</span><span class="p">,</span> <span class="n">ITEM_EMBEDDING_LOOKUP</span><span class="p">[</span><span class="n">i</span><span class="p">,</span><span class="mi">1</span><span class="p">].</span><span class="n">data</span><span class="p">,</span> <span class="n">s</span><span class="o">=</span><span class="mi">200</span><span class="p">)</span>
  <span class="n">plt</span><span class="p">.</span><span class="n">text</span><span class="p">(</span><span class="n">ITEM_EMBEDDING_LOOKUP</span><span class="p">[</span><span class="n">i</span><span class="p">,</span><span class="mi">0</span><span class="p">].</span><span class="n">item</span><span class="p">(),</span> <span class="n">ITEM_EMBEDDING_LOOKUP</span><span class="p">[</span><span class="n">i</span><span class="p">,</span><span class="mi">1</span><span class="p">].</span><span class="n">item</span><span class="p">(),</span> <span class="n">movieId_to_title</span><span class="p">[</span><span class="n">movieId</span><span class="p">][</span><span class="mi">0</span><span class="p">:</span><span class="mi">20</span><span class="p">],</span> <span class="n">ha</span><span class="o">=</span><span class="s">"center"</span><span class="p">,</span> <span class="n">va</span><span class="o">=</span><span class="s">"center"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'black'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">grid</span><span class="p">(</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div></div>

<p><img src="https://nickgreenquist.github.io//blog/assets/MovieLens/2d_plot2.png" alt="Movies in 2D" width="900px" /></p>

<p><a id="training-runs"></a></p>
<h3 id="example-training-runs">Example Training Runs</h3>

<p>Below are training runs of the model on different dataset sizes and with different model sizes.</p>

<table>
  <thead>
    <tr>
      <th>Dataset</th>
      <th>Model Size</th>
      <th>Observation</th>
      <th>Loss Plot</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Small</td>
      <td>Small</td>
      <td>Overfitting</td>
      <td><img src="https://nickgreenquist.github.io//blog/assets/MovieLens/small-small-loss.png" alt="" width="600px" /></td>
    </tr>
    <tr>
      <td>Small</td>
      <td>Medium</td>
      <td>Extreme Overfitting</td>
      <td><img src="https://nickgreenquist.github.io//blog/assets/MovieLens/small-medium-loss.png" alt="" width="600px" /></td>
    </tr>
    <tr>
      <td>Medium</td>
      <td>Small</td>
      <td>Loss is not great. Could keep learning but probably not worth it</td>
      <td><img src="https://nickgreenquist.github.io//blog/assets/MovieLens/medium-small-loss.png" alt="" width="600px" /></td>
    </tr>
    <tr>
      <td>Medium</td>
      <td>Medium</td>
      <td>Training loss getting better but slight overfitting</td>
      <td><img src="https://nickgreenquist.github.io//blog/assets/MovieLens/medium-medium-loss.png" alt="" width="600px" /></td>
    </tr>
    <tr>
      <td>Medium</td>
      <td>Large</td>
      <td>Same as above</td>
      <td><img src="https://nickgreenquist.github.io//blog/assets/MovieLens/medium-large-loss.png" alt="" width="600px" /></td>
    </tr>
    <tr>
      <td>Large</td>
      <td>Small</td>
      <td>No overfitting, but not learning well</td>
      <td><img src="https://nickgreenquist.github.io//blog/assets/MovieLens/large-small-loss.png" alt="" width="600px" /></td>
    </tr>
    <tr>
      <td>Large</td>
      <td>Medium</td>
      <td>No overfitting, but hitting a wall</td>
      <td><img src="https://nickgreenquist.github.io//blog/assets/MovieLens/large-medium-loss.png" alt="" width="600px" /></td>
    </tr>
    <tr>
      <td>Large</td>
      <td>Large</td>
      <td>Looking much better. Loss much lower than previous runs</td>
      <td><img src="https://nickgreenquist.github.io//blog/assets/MovieLens/large-large-loss.png" alt="" width="600px" /></td>
    </tr>
  </tbody>
</table>

<p><a id="other-domains"></a></p>
<h3 id="applying-recommendations-to-other-domains">Applying Recommendations to Other Domains</h3>

<blockquote>
  <p>NOTE: The user/item features below might be subpar for some domains. These ideas are how I would approach recommending items in each domain as a start. If you want more advanced descriptions, companies usually post technical blogs about how they recommend content.</p>
</blockquote>

<table>
  <thead>
    <tr>
      <th>Domain</th>
      <th>User Features</th>
      <th>Item Features</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Online Shopping</td>
      <td>Purchases <br /> Returns (negative signal) <br /> Reviews/Ratings <br /> Favorite Categories <br /> Country/City/State <br /> Yearly Purchase Count <br /> Yearly Purchase USD</td>
      <td>Brand <br /> Title <br /> Price <br /> Category <br /> Reviews <br /> Number of Returns</td>
    </tr>
    <tr>
      <td>Books</td>
      <td>Liked/Disliked Books <br /> Liked/Disliked Genres <br />Liked/Disliked Authors</td>
      <td>Title <br />Genres <br />Book text (bag of words) <br />Published Year</td>
    </tr>
    <tr>
      <td>Music Streaming</td>
      <td>Listened to Songs (with counts) <br /> Listened to Artists (with counts) <br /> Favorite Genres <br /> Time of Day of Listens <br /> Day of week of listens <br /> Country/State/City <br /> Language</td>
      <td>Title <br /> Genre <br /> Artist <br /> Listens <br /> Release Date <br /> <em>ADVANCED</em>: Embedding of Audio File</td>
    </tr>
    <tr>
      <td>Social Media</td>
      <td>Liked Posts <br /> Posts with &gt; X seconds hover <br /> Following Account Ids <br /> Followers Account Ids <br /> Country/State/City <br /> Language <br /> Comments</td>
      <td>Account Id <br /> Bag of Words of Text/Caption <br /> Views per hour <br /> Likes <br /> Comments <br /> Comments Text <br /> <em>ADVANCED</em>: Embedding of Image</td>
    </tr>
    <tr>
      <td>Hotel Bookings</td>
      <td>Past Bookings: Location <br /> Past Bookings: Hotel <br /> Favorite Hotel Brands <br /> Favorite Amenities <br /> Location Booking Count <br /> Viewed/Clicked Locations <br /> Viewed/Clicked Hotels <br /> Country <br /> Language <br /> Bookings in past year <br /> USD Spend in past year</td>
      <td>Location <br /> Brand <br /> Price <br /> Stars <br /> Reviews <br /> Amenities</td>
    </tr>
    <tr>
      <td>Ads</td>
      <td>Ad Purchase History <br /> Ad Click History  <br /> Ad Spend USD last year <br /> Favorite Brands <br /> Country/City/State <br /> Age <br /> Gender <br /><br /> Dependent on site serving the ads: <br /> Interests <br /> Page Clicks <br /> Searches</td>
      <td>Brand <br /> Product Categories <br /> Product Price(s) <br /> Clicks <br /> Click-through Rate <br /> Purchases <br /> Impressions <br /> Long Views</td>
    </tr>
  </tbody>
</table>]]></content><author><name></name></author><category term="Projects" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">35 Books that Changed My Life</title><link href="https://nickgreenquist.github.io//blog/miscl/2020/09/04/influential-books.html" rel="alternate" type="text/html" title="35 Books that Changed My Life" /><published>2020-09-04T14:05:14+00:00</published><updated>2020-09-04T14:05:14+00:00</updated><id>https://nickgreenquist.github.io//blog/miscl/2020/09/04/influential-books</id><content type="html" xml:base="https://nickgreenquist.github.io//blog/miscl/2020/09/04/influential-books.html"><![CDATA[<style type="text/css">
    .center-image
    {
        margin: 0 auto;
        display: block;
    }
</style>

<p><img src="https://nickgreenquist.github.io//blog/assets/Books2Rec/books2rec.png" alt="Main" width="800px" class="center-image" /></p>

<p>Reading is one of the most important parts of my life, and has been since I was a little kid. In this post, I want to list out all the books that I feel have changed my life for the better. I won’t summarize or say what the book is about, but only briefly explain how each book changed me.</p>

<p>This is also not a list of my favorite books (although I would count many here as such). Instead, it is a list of the books that I felt changed me the most after finishing them.</p>

<p>You can find all the books I have read <a href="https://www.goodreads.com/review/list/26809953-nick-greenquist?shelf=read">here</a>.</p>

<h2 id="personal-development">Personal Development</h2>
<ol>
  <li>
    <h4 id="how-to-win-friends-and-influence-people"><a href="https://www.goodreads.com/book/show/4865.How_to_Win_Friends_and_Influence_People?from_search=true&amp;from_srp=true&amp;qid=oo3GR6mnMM&amp;rank=1">How to Win Friends and Influence People</a></h4>
    <p>For teaching me how to change from being one of the most antisocial kids in middle and high school to being called a very sociable extrovert.</p>
  </li>
  <li>
    <h4 id="siddhartha"><a href="https://www.goodreads.com/book/show/52036.Siddhartha">Siddhartha</a></h4>
    <p>For teaching me that extremes are never the answer in the search for bettering yourself, and it’s better to just live simply.</p>
  </li>
  <li>
    <h4 id="talent-is-overrated"><a href="https://www.goodreads.com/book/show/4485966-talent-is-overrated">Talent Is Overrated</a></h4>
    <p>For teaching me that it’s not just how many hours you put into something, but that you need deliberate practice.</p>
  </li>
  <li>
    <h4 id="grit-the-power-of-passion-and-perseverance"><a href="https://www.goodreads.com/book/show/25813921-grit">Grit: The Power of Passion and Perseverance</a></h4>
    <p>For teaching me that hard work beats talent that doesn’t work.</p>
  </li>
  <li>
    <h4 id="the-curmudgeons-guide-to-getting-ahead"><a href="https://www.goodreads.com/book/show/18811353-the-curmudgeon-s-guide-to-getting-ahead">The Curmudgeon’s Guide to Getting Ahead</a></h4>
    <p>For teaching me no-nonsense, non-PC practical advice for life.</p>
  </li>
  <li>
    <h4 id="springboard-to-success"><a href="https://www.goodreads.com/book/show/17707493-springboard?from_search=true&amp;from_srp=true&amp;qid=JM1o8hBL1t&amp;rank=1">Springboard to Success</a></h4>
    <p>For teaching me how to organize and reflect about what’s important to me.</p>
  </li>
  <li>
    <h4 id="models-attract-women-through-honesty"><a href="https://www.goodreads.com/book/show/12633800-models">Models: Attract Women through Honesty</a></h4>
    <p>For teaching me that people (not just romantic interests) are attracted to people who are honest with who they are, are open with what they want, and cut the bullshit.</p>
  </li>
  <li>
    <h4 id="the-one-thing"><a href="https://www.goodreads.com/book/show/16256798-the-one-thing?from_search=true&amp;from_srp=true&amp;qid=4SNniApSlG&amp;rank=1">The One Thing</a></h4>
    <p>For teaching me to not multitask with my goals and self-improvement, and instead to hyperfocus on a small number of goals at a time.</p>
  </li>
</ol>

<h2 id="non-fiction">Non-Fiction</h2>
<ol>
  <li>
    <h4 id="the-rise-of-theodore-roosevelt"><a href="https://www.goodreads.com/book/show/40929.The_Rise_of_Theodore_Roosevelt">The Rise of Theodore Roosevelt</a></h4>
    <p>For teaching me what it means to be a ‘jack of all trades’ kind of person and how much it is possible to accomplish in a short lifetime.</p>
  </li>
  <li>
    <h4 id="yes-man"><a href="https://www.goodreads.com/book/show/87804.Yes_Man">Yes Man</a></h4>
    <p>For teaching me to say ‘Yes’ to most things. I actually tried this for a year and it changed my life drastically.</p>
  </li>
  <li>
    <h4 id="mans-search-for-meaning"><a href="https://www.goodreads.com/book/show/4069.Man_s_Search_for_Meaning">Man’s Search for Meaning</a></h4>
    <p>For teaching me what is important in life when you have nothing to live for.</p>
  </li>
  <li>
    <h4 id="steve-jobs"><a href="https://www.goodreads.com/book/show/11084145-steve-jobs">Steve Jobs</a></h4>
    <p>For teaching me that world builders in their specialized fields can be assholes in their personal lives, and to keep them separate when judging someone in either.</p>
  </li>
  <li>
    <h4 id="thinking-fast-and-slow"><a href="https://www.goodreads.com/book/show/11468377-thinking-fast-and-slow">Thinking, Fast and Slow</a></h4>
    <p>For teaching me to be aware that our brains are not rational and that we let emotions make a lot of ‘gut’ decisions (for better and worse).</p>
  </li>
  <li>
    <h4 id="one-up-on-wall-street"><a href="https://www.goodreads.com/book/show/762462.One_Up_On_Wall_Street">One Up On Wall Street</a></h4>
    <p>For teaching me to try and get opinions on things, not just from the experts in that field, but people affected by it, because the experts are often trapped in a bubble.</p>
  </li>
  <li>
    <h4 id="the-selfish-gene"><a href="https://www.goodreads.com/book/show/61535.The_Selfish_Gene">The Selfish Gene</a></h4>
    <p>For teaching me to not take life so seriously because at the end of the day, we are all just vessels being used by our genes for survival.</p>
  </li>
  <li>
    <h4 id="what-it-means-to-be-a-libertarian"><a href="https://www.goodreads.com/book/show/168899.What_It_Means_to_Be_a_Libertarian">What It Means to Be a Libertarian</a></h4>
    <p>For providing me with a baseline value system on being a lowercase ‘l’ libertarian.</p>
  </li>
  <li>
    <h4 id="hackers"><a href="https://www.goodreads.com/book/show/56829.Hackers">Hackers</a></h4>
    <p>For teaching me about the mindset of the geniuses that created the computer and internet revolution that we all enjoy today.</p>
  </li>
  <li>
    <h4 id="economics-in-one-lesson"><a href="https://www.goodreads.com/book/show/3028.Economics_in_One_Lesson">Economics in One Lesson</a></h4>
    <p>For teaching me the basics of economics (caveat: libertarian spin).</p>
  </li>
  <li>
    <h4 id="nothing-to-envy"><a href="https://www.goodreads.com/book/show/40604846-nothing-to-envy">Nothing to Envy</a></h4>
    <p>For teaching me that no matter how bad things get here, life is still ‘great’ compared to most other places.</p>
  </li>
  <li>
    <h4 id="blood-sweat-and-pixels"><a href="https://www.goodreads.com/book/show/33640770-blood-sweat-and-pixels">Blood, Sweat and Pixels</a></h4>
    <p>For teaching me how brutal the game design industry is and persuading me to switch from Game Design to Computer Science.</p>
  </li>
  <li>
    <h4 id="the-glass-castle"><a href="https://www.goodreads.com/book/show/7445.The_Glass_Castle">The Glass Castle</a></h4>
    <p>For teaching me how terrible parents can be and to make sure I never end up like that.</p>
  </li>
  <li>
    <h4 id="cracking-the-coding-interview"><a href="https://www.goodreads.com/book/show/25707092-cracking-the-coding-interview">Cracking the Coding Interview</a></h4>
    <p>For teaching me that getting into top tech companies is doable with hard work and hundreds of hours of practice.</p>
  </li>
  <li>
    <h4 id="there-are-no-children-here"><a href="https://www.goodreads.com/book/show/41918.There_are_No_Children_Here">There are No Children Here</a></h4>
    <p>For teaching me to have more compassion for people dealt the poverty hand at birth and to accept that no matter how much help they might get, they often can never escape.</p>
  </li>
  <li>
    <h4 id="bad-blood"><a href="https://www.goodreads.com/book/show/37976541-bad-blood">Bad Blood</a></h4>
    <p>For teaching me how many bullshitters there are in Silicon Valley, and how they operate.</p>
  </li>
  <li>
    <h4 id="why-information-grows"><a href="https://www.goodreads.com/book/show/20763722-why-information-grows">Why Information Grows</a></h4>
    <p>For teaching me how to see literally everything in a completely new way (as ways of organizing information).</p>
  </li>
  <li>
    <h4 id="the-prize"><a href="https://www.goodreads.com/book/show/169354.The_Prize">The Prize</a></h4>
    <p>For teaching me how to see the world through the lens of a single hyper valuable resource (in this case: oil).</p>
  </li>
</ol>

<h2 id="fiction">Fiction</h2>
<ol>
  <li>
    <h4 id="flowers-for-algernon"><a href="https://www.goodreads.com/book/show/36576608-flowers-for-algernon">Flowers for Algernon</a></h4>
    <p>For teaching me that it’s ok to become stressed and lost when you undergo a lot of change very quickly.</p>
  </li>
  <li>
    <h4 id="the-fountainhead"><a href="https://www.goodreads.com/book/show/2122.The_Fountainhead">The Fountainhead</a></h4>
    <p>For teaching me to find meaning in work, the value of creation, and to not live/create for what society expects of you, but for what you find valuable and worthwhile.</p>
  </li>
  <li>
    <h4 id="east-of-eden"><a href="https://www.goodreads.com/book/show/4406.East_of_Eden">East of Eden</a></h4>
    <p>For teaching me that some people are just plain bad, and that’s ok, but more importantly to look for the good in people instead of always searching for the bad.</p>
  </li>
  <li>
    <h4 id="angle-of-repose"><a href="https://www.goodreads.com/book/show/292408.Angle_of_Repose">Angle of Repose</a></h4>
    <p>For teaching me to not hold grudges and forgive people.</p>
  </li>
  <li>
    <h4 id="shogun"><a href="https://www.goodreads.com/book/show/402093.Sh_gun">Shogun</a></h4>
    <p>For teaching me that there exist cultures that are so different from what I’m used to, but I can still respect them even if I don’t understand them. Also made me fall in love with Japan.</p>
  </li>
  <li>
    <h4 id="the-subtle-knife"><a href="https://www.goodreads.com/book/show/119324.The_Subtle_Knife">The Subtle Knife</a></h4>
    <p>For teaching me to confront my fears about death.</p>
  </li>
  <li>
    <h4 id="a-farewell-to-arms"><a href="https://www.goodreads.com/book/show/10799.A_Farewell_to_Arms">A Farewell to Arms</a></h4>
    <p>For teaching me how painful love can be.</p>
  </li>
  <li>
    <h4 id="the-lord-of-the-rings-1-3"><a href="https://www.goodreads.com/book/show/33.The_Lord_of_the_Rings">The Lord of the Rings 1-3</a></h4>
    <p>For teaching me what a perfect journey and reading experience is.</p>
  </li>
  <li>
    <h4 id="the-count-of-monte-cristo"><a href="https://www.goodreads.com/book/show/7126.The_Count_of_Monte_Cristo">The Count of Monte Cristo</a></h4>
    <p>For teaching me that revenge is often deserved and easy to cheer for.</p>
  </li>
</ol>]]></content><author><name></name></author><category term="miscl" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Implementing Algorand Agreement</title><link href="https://nickgreenquist.github.io//blog/projects/2019/01/04/algorand.html" rel="alternate" type="text/html" title="Implementing Algorand Agreement" /><published>2019-01-04T14:05:14+00:00</published><updated>2019-01-04T14:05:14+00:00</updated><id>https://nickgreenquist.github.io//blog/projects/2019/01/04/algorand</id><content type="html" xml:base="https://nickgreenquist.github.io//blog/projects/2019/01/04/algorand.html"><![CDATA[<style type="text/css">
    .center-image
    {
        margin: 0 auto;
        display: block;
    }
</style>

<script type="text/x-mathjax-config">
      MathJax.Hub.Config({
        tex2jax: {
          skipTags: ['script', 'noscript', 'style', 'textarea', 'pre'],
          inlineMath: [['$','$']]
        }
      });
</script>

<script type="text/javascript" async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-MML-AM_CHTML"></script>

<p>By: Eric deRegt and Nick Greenquist</p>

<h2 id="introduction">Introduction</h2>
<p>Bitcoin suffers from several technical problems. It is wasteful with significant energy usage and high fees. There is a high concentration of power as a few mining pools control the flow of money. The miners have low margins and are in known locations, which makes them susceptible to corruption. The ledger has ambiguity because of the possibility of forks as demonstrated by the emergence of Bitcoin Cash. There are real issues with scalability. It is unclear how the system will scale if millions of users are added to the system. Finally, there is a long latency because you have to wait a number of blocks before you can feel confident that your transaction is permanent.</p>

<p><img src="https://nickgreenquist.github.io//blog/assets/algorand/slow.png" alt="Snail" width="500px" class="center-image" /></p>

<p>In Algorand <sup id="fnref:algorand" role="doc-noteref"><a href="#fn:algorand" class="footnote" rel="footnote">1</a></sup>, there is a single blockchain. There are no forks or proof of work. This is achieved by the Algorand Agreement protocol, which guarantees agreement and consistency. There are several advantages to Algorand’s approach. The set of possible commands is smaller than Bitcoin’s, which speeds up computation. There is true decentralization as no set of users has exogenous power. Payments are final because the probability of a fork is 1/10^18. Scalability is bounded by the network latency. Finally, security is achieved against adversaries under extreme conditions.</p>

<p>Algorand selects random users known to the entire world. These users are in charge of proposing the next block and propagating the transactions to a small, random committee. The committee members are selected through a lottery that each user runs for themselves. This leads to very fast selections that are random.</p>

<h3 id="github-link">GitHub Link</h3>
<p>The code for Algorand can be found here: <a href="https://github.com/...">Algorand</a></p>

<h2 id="algorand-overview">Algorand Overview</h2>
<p>Algorand <sup id="fnref:algorand:1" role="doc-noteref"><a href="#fn:algorand" class="footnote" rel="footnote">1</a></sup> is a cryptocurrency designed to scale to millions of users and confirm transactions in under a minute. Algorand seeks to overcome several challenges: preventing Sybil attacks, scaling to millions of users, and being resilient to denial-of-service attacks and other actions from an adversary that can coordinate with Byzantine nodes. Sybil attacks are combated by giving each user a weight based on their monetary stake in the protocol. If more than ⅔ of the stake is controlled by honest users, Algorand will reach consensus while avoiding forks and double-spending. The protocol achieves scalability through committee selection: rather than every node participating in consensus, a small number of nodes are selected at random. To avoid targeted attacks on committee members, they are selected privately through sortition, and committee membership changes between rounds and steps.</p>

<p><img src="https://nickgreenquist.github.io//blog/assets/algorand/algorand.png" alt="Logo" width="500px" class="center-image" /></p>

<p>Each user in the system has a public and private key. A transaction is a payment from one user to another and involves a user signing the message with its private key. As in Bitcoin, these transactions are broadcast to peers through a gossip protocol. Each user picks a small random set of peers to propagate messages to. Upon receiving a message, a peer checks that the provided signature is valid before gossiping the message to other users. These transactions are put into a log, which forms the basis for new blocks. Each user takes these transactions and prepares a block in case they are chosen for block proposal.</p>
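
<p>The verify-then-forward rule above can be sketched in a few lines of Python. This is only an illustration, not the code from this project: HMAC with a per-user key stands in for a real public-key signature, and the names <code>Node</code>, <code>FANOUT</code>, <code>sign</code>, and <code>receive</code> are hypothetical.</p>

```python
import hashlib
import hmac
import random

FANOUT = 2  # assumed number of peers each node forwards a message to


def sign(key: bytes, message: bytes) -> bytes:
    # Stand-in for a digital signature; a real system uses public-key signatures.
    return hmac.new(key, message, hashlib.sha256).digest()


class Node:
    def __init__(self, name, keys):
        self.name = name
        self.keys = keys      # sender name -> verification key (assumption)
        self.peers = []       # other Node objects this node can gossip to
        self.log = []         # accepted transactions, the basis for new blocks
        self.seen = set()     # ids of valid messages already handled

    def receive(self, sender, message, signature, rng):
        msg_id = hashlib.sha256(sender.encode() + b"|" + message).hexdigest()
        if msg_id in self.seen:
            return            # already gossiped this message
        expected = sign(self.keys[sender], message)
        if not hmac.compare_digest(expected, signature):
            return            # invalid signature: drop it and do not forward
        self.seen.add(msg_id)
        self.log.append(message)
        # Forward to a small random set of peers.
        for peer in rng.sample(self.peers, min(FANOUT, len(self.peers))):
            peer.receive(sender, message, signature, rng)
```

<p>Only messages that pass the signature check are recorded and forwarded, so a forged transaction stops at the first honest node that sees it.</p>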

<p>Algorand operates in asynchronous rounds, and each round produces a new block that is appended to the blockchain. In each round, users are selected to propose blocks and to serve on a committee that reaches consensus. These selections are made using an algorithm called cryptographic sortition. Sortition chooses a random set of users based on their weighted stake in the system. Verifiable random functions (VRFs) are used to achieve this randomness. A VRF takes a public seed and the role (proposer, committee member, etc.) for the sortition and returns a hash and a proof. Selected users propagate the return values to peers through the gossip protocol. Other users can use public keys to verify that the hash and proof correspond to a given user. An initial seed is provided to all users, and a new one is calculated for subsequent rounds by proposers during the agreement protocol. Since users are selected according to their stakes, a user with a high stake may be chosen multiple times in a given round or step.</p>
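
<p>To make the stake weighting concrete, here is a rough Python sketch of a stake-weighted lottery. It is a simplification under stated assumptions: a plain SHA-256 hash stands in for the VRF (so, unlike real sortition, it is neither private nor accompanied by a proof), each unit of stake gets one independent draw, and the function name and parameters are hypothetical.</p>

```python
import hashlib


def sortition(seed: bytes, role: str, user: str, stake: int,
              total_stake: int, expected: int) -> int:
    """Return how many times `user` is selected for `role` (0 = not selected)."""
    # Chance that a single unit of stake wins, so that about `expected`
    # units are selected across the whole system.
    threshold = expected / total_stake
    wins = 0
    for unit in range(stake):
        material = b"|".join(
            [seed, role.encode(), user.encode(), str(unit).encode()]
        )
        digest = hashlib.sha256(material).digest()
        # Map the hash to a fraction between 0 and 1 and compare it
        # with the threshold.
        if int.from_bytes(digest, "big") / 2**256 < threshold:
            wins += 1
    return wins
```

<p>Because every unit of stake gets its own draw, a user with a large stake can win more than once, which matches the observation that a high-stake user may be chosen multiple times in a round or step.</p>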

<p>The sortition threshold for proposers is set high enough to guarantee, with high probability, that at least one proposer is selected in each round, so several proposers may be selected. Since a block can be very large, users gossip two types of messages: one with just their priority and proof, and one with the block. Users select the highest-priority proposer as their leader for the current round.</p>

<p>Algorand originally reached consensus using an algorithm called BA*. In the first phase of BA*, users reduce block agreement to two options. In the second phase, users agree on a proposed block or agree on an empty block. Both phases have several steps, in which committee members vote for a value and the other users count those votes. BA* completes the first phase in two steps. The second phase is completed in two steps if the proposer is honest and in 11 steps if the proposer is malicious. It is unclear if BA* was successfully implemented by the authors, but after some time another paper was released [2], which introduced a new consensus algorithm called Algorand Agreement. We are not sure what other changes were made to the protocol since the release of the original Algorand paper, but we chose to use Algorand Agreement in our implementation.</p>

<h2 id="algorand-agreement-overview">Algorand Agreement Overview</h2>
<p>Algorand Agreement <sup id="fnref:agreement" role="doc-noteref"><a href="#fn:agreement" class="footnote" rel="footnote">2</a></sup>  is a Byzantine agreement protocol that uses leader election and can operate in a partitioned environment. The protocol uses a hash function and digital signature (SIG), which returns the user id, message, and signed message. The signatures are unique for each public and private key pair.</p>

<p><img src="https://nickgreenquist.github.io//blog/assets/algorand/consensus.png" alt="Consensus" width="500px" class="center-image" /></p>

<p>There are a few assumptions for Algorand Agreement. Adversaries can coordinate optimally, but they cannot break the hash function or forge signatures. The set of all players is N, the cardinality of N is n = 3t + 1, and the number of malicious players is t. All players have access to a public random string R, which has been selected randomly and independently of the players’ public keys.</p>
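
<p>The protocol steps count votes against a threshold of 2t + 1. A small Python sketch of the arithmetic behind that threshold, assuming n = 3t + 1 as stated above (the function name is hypothetical):</p>

```python
def fault_parameters(n: int):
    """For n = 3t + 1 players, return (t, quorum, min_overlap)."""
    if n < 1 or (n - 1) % 3 != 0:
        raise ValueError("this sketch assumes n = 3t + 1")
    t = (n - 1) // 3          # maximum number of malicious players
    quorum = 2 * t + 1        # votes needed before a player acts on a value
    # Any two quorums drawn from n players share at least this many members.
    min_overlap = 2 * quorum - n
    return t, quorum, min_overlap
```

<p>For n = 4 this gives t = 1, a quorum of 3, and an overlap of 2. In general the overlap is t + 1, so any two quorums share at least one honest player.</p>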

<h3 id="agreement-protocol">Agreement Protocol</h3>
<p>Two communication settings are considered in the paper. In the first, nodes communicate over a synchronous network. Honest users send messages that are received by all other honest users within a given step. All messages seen by a user i before the start of a step are seen by all honest users at the end of the step because i will propagate all messages she has seen. In the second setting, nodes communicate through a propagation network. Nodes have timers with the same speed. The network can be arbitrarily partitioned and the adversary has full control during this time period. Messages are received by honest users within a known time whenever the network is not partitioned. We focused on the second communication setting for our implementation.</p>

<p>There are three types of messages that are sent in the protocol: next-votes, soft-votes, and cert-votes. Additionally, users will send their credential (SIG(R, p)). If multiple nodes propose a block for a given round, a leader is chosen by iterating over the credentials SIG(R, p) of the valid participants and choosing the one whose hash has the smallest value.</p>
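
<p>A minimal Python sketch of the smallest-hash leader rule. It assumes the credentials have already been verified and are available as bytes; <code>select_leader</code> is a hypothetical name, and SHA-256 is used because it is the hash assumed elsewhere in this post.</p>

```python
import hashlib


def select_leader(credentials: dict) -> str:
    """credentials maps user id -> verified credential bytes, e.g. SIG(R, p)."""
    if not credentials:
        raise ValueError("no valid proposers")
    # The leader is the user whose credential hashes to the smallest value.
    return min(
        credentials,
        key=lambda user: hashlib.sha256(credentials[user]).digest(),
    )
```

<p>Every honest user who has seen the same set of credentials picks the same leader, without any extra communication.</p>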

<p>There are five steps in each period p that run sequentially, described as follows.</p>

<h3 id="communication-setting-2-steps">Communication Setting 2: Steps</h3>

<p><img src="https://nickgreenquist.github.io//blog/assets/algorand/steps.png" alt="Steps" width="500px" class="center-image" /></p>

<p>Step 1 is Value Proposal, which starts at time 0. Committee members propagate their block value and credential if it’s the first period or if they have received 2t + 1 next-votes for null in period p - 1. If they have received 2t + 1 next-votes for a value that is not null in p - 1, they propagate that value along with their credential for period p.
<script src="https://gist.github.com/nickgreenquist/25d86a0e298273d66c85434a71945df5.js"></script></p>

<p>Step 2 is called the Filtering Step and takes place at time 2ƛ. In this step, a user i identifies his leader from all nodes that have propagated values and that are verified for that round. If the user has received 2t + 1 next-votes for null in p - 1, he will soft-vote his leader’s proposed value. If he has received 2t + 1 next-votes in p - 1 for a value that is not null, then he will soft-vote that value.
<script src="https://gist.github.com/nickgreenquist/0dbdd4e6a23c2cc766fe7a3abfdf5b31.js"></script></p>
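
<p>As a rough illustration of the Filtering Step decision, separate from the gist above, here is a Python sketch. It assumes the next-votes from period p - 1 arrive as a list with one entry per distinct voter, uses the string <code>"null"</code> as a placeholder for the null value, and treats the first period like a null quorum; all names are hypothetical.</p>

```python
from collections import Counter

NULL = "null"  # placeholder for the protocol's null value (an assumption)


def soft_vote_value(prev_next_votes, leader_value, t, first_period=False):
    """Return the value to soft-vote in period p, or None to cast no vote.

    prev_next_votes holds the next-votes seen for period p - 1,
    one entry per distinct voter.
    """
    quorum = 2 * t + 1
    counts = Counter(prev_next_votes)
    for value, count in counts.items():
        if value != NULL and count >= quorum:
            return value          # carry the non-null value over from p - 1
    if first_period or counts[NULL] >= quorum:
        return leader_value       # free to follow the leader's proposal
    return None                   # no quorum seen yet
```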

<p>Step 3 is the Certifying Step and runs from time 2ƛ to 4ƛ. If a user sees 2t + 1 soft-votes for a non-null value v, that user cert-votes v.
<script src="https://gist.github.com/nickgreenquist/f147bc363a2206f7978a1a678768fdc3.js"></script></p>
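
<p>The counting in the Certifying Step can be sketched as a small tally, again as an illustration rather than the project’s code. It assumes each voter is identified by an id so that duplicate soft-votes count once, and the class and method names are hypothetical.</p>

```python
from collections import defaultdict


class SoftVoteTally:
    def __init__(self, t):
        self.quorum = 2 * t + 1
        self.voters = defaultdict(set)  # value -> ids of users who soft-voted it
        self.cert_voted = None          # value this user has cert-voted, if any

    def add_soft_vote(self, voter, value):
        """Record a soft-vote; return a value to cert-vote, or None."""
        if value is None or self.cert_voted is not None:
            return None                 # skip null votes; cert-vote at most once
        self.voters[value].add(voter)   # a set, so repeated votes count once
        if len(self.voters[value]) >= self.quorum:
            self.cert_voted = value
            return value
        return None
```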

<p>Step 4 is the First Finishing Step at time 4ƛ. If a user has certified a value v for period p, she next-votes v. If she has seen 2t + 1 next-votes for null in period p - 1, she next-votes null. Otherwise, she next-votes her starting value.
<script src="https://gist.github.com/nickgreenquist/c374f7a512bfde8f4296c734f87ac3d5.js"></script></p>
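
<p>The three cases of the First Finishing Step map directly onto a short function. This Python sketch assumes the caller already knows the certified value (or <code>None</code>), the number of null next-votes seen for p - 1, and the starting value; the names are hypothetical.</p>

```python
NULL = "null"  # placeholder for the protocol's null value (an assumption)


def first_finishing_vote(certified_value, prev_null_next_votes,
                         starting_value, t):
    """Return the value to next-vote at time 4 * lambda."""
    if certified_value is not None:
        return certified_value      # next-vote the value certified in period p
    if prev_null_next_votes >= 2 * t + 1:
        return NULL                 # period p - 1 ended on null
    return starting_value           # otherwise fall back to the starting value
```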

<p>Step 5, which a user enters after 4ƛ, is the Second Finishing Step; she stays in it until she can finish the period. If she sees 2t + 1 soft-votes for a non-null value, she will next-vote that value. If she sees 2t + 1 next-votes for null in p - 1 and has not certified in p, then she next-votes null.
<script src="https://gist.github.com/nickgreenquist/66b4bc68923f97c6d42ed3f6e324cd4f.js"></script></p>

<p>These periods continue until the Halting Condition is reached. The Halting Condition is checked any time a cert-vote is received or cast. If a user sees 2t + 1 cert-votes for a value v, they append that value to their blockchain and move to the next round. These cert-votes can be from any period as nodes cannot ever change what value they will cert-vote once casting this type of vote. 
<script src="https://gist.github.com/nickgreenquist/4c79238a483b53751cca64ecd10fd308.js"></script></p>
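
<p>A possible shape for the halting check, sketched in Python independently of the gist above. It assumes that the 2t + 1 cert-votes must all come from the same period (which may be any period of the round) and that voters are identified by id; the class and method names are hypothetical.</p>

```python
from collections import defaultdict


class RoundState:
    def __init__(self, t):
        self.quorum = 2 * t + 1
        self.cert_voters = defaultdict(set)  # (period, value) -> voter ids
        self.decided = None                  # value agreed on for this round

    def on_cert_vote(self, voter, period, value):
        """Record a cert-vote; return the decided value once a quorum exists."""
        if self.decided is None:
            key = (period, value)
            self.cert_voters[key].add(voter)
            if len(self.cert_voters[key]) >= self.quorum:
                # Safe to append the value and move to the next round.
                self.decided = value
        return self.decided
```

<p>Calling <code>on_cert_vote</code> whenever a cert-vote is received or cast mirrors the rule that the Halting Condition is checked at those moments.</p>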

<h2 id="implementation">Implementation</h2>
<p>Our goal for the project was to create a working implementation of Algorand based on algorithms described in the Algorand<sup id="fnref:algorand:2" role="doc-noteref"><a href="#fn:algorand" class="footnote" rel="footnote">1</a></sup> and Agreement<sup id="fnref:agreement:1" role="doc-noteref"><a href="#fn:agreement" class="footnote" rel="footnote">2</a></sup> papers. We used the details from the Algorand<sup id="fnref:algorand:3" role="doc-noteref"><a href="#fn:algorand" class="footnote" rel="footnote">1</a></sup> paper to construct our overall structure and our algorithms for sortition, gossiping, and block proposal. We used Algorand Agreement<sup id="fnref:agreement:2" role="doc-noteref"><a href="#fn:agreement" class="footnote" rel="footnote">2</a></sup> for consensus and used Communication Setting 2 as described above.</p>

<h3 id="assumptions">Assumptions</h3>
<p>We made a number of assumptions. Honest nodes are required to have at least 2t + 1 stake in the system. Nodes cannot lie about their userId or spoof messages. Nodes cannot change the result of sortition or verifySort. Timers are not synched, but they move at the same speed. Our sortition always selects two users, instead of implementing a probabilistic function with a targeted number of selected users. All users use the same sha256 hash function when using blocks and signatures. Every RPC message makes it to honest users. There are no retries for votes or propagate block messages.</p>

<h3 id="architecture">Architecture</h3>

<p><img src="https://nickgreenquist.github.io//blog/assets/algorand/architect.png" alt="Blueprint" width="500px" class="center-image" /></p>

<p>We organized our project in a similar manner to Lab 2: Raft. Each server runs identical code found in serve.go. Each node also initiates itself using the code in main.go. This code is responsible for gathering all the peers in the system, generating the node’s genesis block, connecting each server to their respective bcStore, and then starting the server.</p>

<p>bcStore stores the blockchain and serves as the connection between the server, client, and the blockchain itself.</p>

<p>In order to simulate a ‘user’ adding transactions to the blockchain, we separated the ‘add transaction’ request functionality into client code. The client has a one-to-one relationship with a single server. The client can send transaction requests to a server through its port, and can also retrieve the current blockchain with a separate request.</p>

<p>In addition to the main client, server, and bcStore relationship, there are a few other files that help keep the logic organized. All of the code for handling the 5 steps and the halting condition is found in agreement.go. Helper functions, such as preparing block objects, hashing values, signing messages, creating SIG structs, generating the committee from sortition, verifying sortition, selecting the minimum leader hash, and initializing stake, are stored in utils.go.</p>

<h3 id="details">Details</h3>
<p>All nodes who join the network create a bcStore which keeps a channel of commands they receive from clients and a blockchain data structure composed of blocks. Nodes connect to other known peers. The Algorand server runs on a goroutine and listens for several types of messages that can trigger different parts of the protocol and sends messages to other nodes.</p>

<p>Nodes keep track of several pieces of internal state. Each node keeps track of its private and public keys, what round, period, and step they are in, and their temporary and proposed blocks. Additionally, there is state for the periods in the agreement protocol. PeriodState includes values that have been proposed and who proposed them and all of the next-votes, cert-votes, and soft-votes used in agreement. We keep one state object for the current period and one for the previous period so that we can monitor the number of votes and appropriately terminate the agreement protocol when it is safe to add a block to our blockchain.
<script src="https://gist.github.com/nickgreenquist/31f1d5b82b4bf8109e65c25d435f3fe9.js"></script></p>

<p>There are four RPC calls in our implementation: AppendTransaction, ProposeBlock, Vote, and RequestBlockChain. In order to pass messages between nodes using the four RPCs, we needed to create channels for both the RPC arguments and responses.
<script src="https://gist.github.com/nickgreenquist/03c90e445e31923af3c704d3d4c522ba.js"></script></p>

<p>AppendTransaction messages are simple to handle: nodes return true to the broadcaster. Nodes also immediately respond true to the client after receiving a request to add a transaction. We left the client responsible for retrying transactions if it later sees that its requests did not make it into the updated blockchain. To enable this, clients can send a GET command to their server to receive the server’s current blockchain.</p>

<p>For the ProposeBlock RPC, we also set up a channel to listen for any ProposeBlock messages. Whenever a message is received, the receiver first checks if the sender is approved for this round by calling verifySort. If the sender is approved, the receiver adds the proposed value and block to its internal state, which maps proposer credentials to their proposed values. Nodes always return true as long as they receive this message.</p>

<p>Handling incoming Vote messages requires a bit more logic. On top of extracting the correct type, nodes have to update their PeriodState with vote counts for specific values after checking that a sender has not already voted for that period. 
<script src="https://gist.github.com/nickgreenquist/96a9a48b7b7a32b5e71e2bf47111a29b.js"></script></p>

<p>RequestBlockChain requests are handled by simply passing the entire BlockChain into the response channel. The requester will then verify the returned blockchains in the response channel listener in order to decide if they need to replace their blockchain and update their current round and state.</p>

<p>In between rounds, each user keeps track of a TemporaryBlock, which they will use if selected for the committee. When a node receives a transaction from a client, it propagates that transaction to all other nodes through AppendTransaction. The nodes that receive these messages will add these transactions to their respective TemporaryBlocks. When we enter a new round, a proposer will compare the last block in their blockchain with their proposed block and remove any duplicate transactions before proposing a new block.</p>

<p>We created two timers to deal with separate problems: a RoundTimer and an AgreementTimer. The RoundTimer signals when users should enter into the first step of the agreement protocol for the next round. At this point, users will check if they have been selected by sortition to propose a block, and if so, they generate their credentials and broadcast a ProposeBlock message. ProposeBlock takes a block, credential, value, round, and peer as input and is sent to all peers. The value in this message is simply the hash of the block’s object byte memory. 
<script src="https://gist.github.com/nickgreenquist/98010248a7509311dd43f93f2db6086d.js"></script></p>

<p>The AgreementTimer signals when users should proceed between steps. It first goes off at the beginning of step 2. At each step we call a step function, which takes in the current PeriodState, the last PeriodState, and the required number of votes. The return value is either a value to propagate as a Vote message, or a null value which is not acted on. Vote values are propagated using the Vote RPC. Vote takes the type of vote, round, and period as inputs and returns a boolean for success. Our Vote channel updates our vote state for the current and last period, keeping track of who has voted for what at each round and period. In the cert-vote branch, we also check for the halting condition. If we reach the halting condition, agreement has been reached, the block is appended to the node’s blockchain, and the state is updated for the next round.</p>
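<p>The vote bookkeeping and halting check described above can be sketched as follows. This is a simplified Python illustration and not the project’s Go code: VoteTally and its method names are hypothetical, and the threshold of 2t + 1 cert-votes is taken from the assumptions listed earlier.</p>

```python
from collections import defaultdict


class VoteTally:
    """Counts votes of one type (e.g. cert-votes) for a single period."""

    def __init__(self, required_votes):
        self.required_votes = required_votes  # e.g. 2t + 1
        self.voters = set()                   # senders already counted
        self.counts = defaultdict(int)        # value -> number of votes

    def add_vote(self, sender_id, value):
        """Record a vote; a second vote from the same sender is ignored."""
        if sender_id in self.voters:
            return False
        self.voters.add(sender_id)
        self.counts[value] += 1
        return True

    def decided_value(self):
        """Return a value that reached the threshold, or None."""
        for value, count in self.counts.items():
            if count >= self.required_votes:
                return value
        return None


t = 1
cert_votes = VoteTally(required_votes=2 * t + 1)
votes = [("n1", "h_a"), ("n2", "h_a"), ("n2", "h_a"), ("n3", "h_a")]
for sender, value in votes:
    cert_votes.add_vote(sender, value)

# The duplicate vote from n2 is dropped, so three distinct voters remain.
print(cert_votes.decided_value())  # h_a
```

<p>Keeping the set of voters separate from the per-value counts makes the duplicate check independent of which value a sender voted for.</p>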

<p>The final thing we considered was how to catch users up to new blocks. When a user receives a block proposal or vote, they can check if the sender is in a later round. If so, the receiving node may be out of date. This could occur if a node joins the network after it initially starts or if it fails and comes back online. In this case, the node will call the RequestBlockChain RPC to start the process of catching up to the current block so that it can rejoin the consensus process. Upon receiving a RequestBlockChainChan message, a node will respond by sending the current version of its blockchain. The node that is behind checks for the longest blockchain it has seen and verifies that all blocks are valid by using verifySort on each block’s stamped credential.</p>
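<p>A minimal sketch of the catch-up rule above, assuming chains are plain lists of blocks. The function name pick_longest_valid_chain and the stand-in check verify_sort_stub are hypothetical; the real implementation validates each block with verifySort on its stamped credential.</p>

```python
def pick_longest_valid_chain(own_chain, received_chains, is_valid_block):
    """Return the longest chain whose blocks all pass validation."""
    best = own_chain
    for chain in received_chains:
        longer = len(chain) > len(best)
        if longer and all(is_valid_block(block) for block in chain):
            best = chain
    return best


def verify_sort_stub(block):
    # Stand-in for verifySort: a block counts as valid here
    # if it carries any credential at all.
    return block.get("credential") is not None


own = [{"round": 0, "credential": "genesis"}]
peer_a = own + [
    {"round": 1, "credential": "c1"},
    {"round": 2, "credential": "c2"},
]
peer_b = own + [
    {"round": 1, "credential": "c1"},
    {"round": 2, "credential": None},  # fails validation
    {"round": 3, "credential": "c3"},
]

chosen = pick_longest_valid_chain(own, [peer_a, peer_b], verify_sort_stub)
print(len(chosen))  # 3: peer_b is longer but contains an invalid block
```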

<h3 id="tradeoffs">Tradeoffs</h3>

<p><img src="https://nickgreenquist.github.io//blog/assets/algorand/tradeoffs.png" alt="Tradeoffs" width="500px" class="center-image" /></p>

<p>We made a number of design decisions in our implementation that were required due to nonexistent or vague details in the papers. For the design decisions that were not described in great detail in the papers, we tried our best to look at other blockchains and come to a reasonable decision. Two examples were introducing the round timer to kickstart agreement and adding a process for nodes to catch up to the current round if they see they are behind.</p>

<p>We also had to simplify a couple of parts of the system to narrow the scope of our implementation. The first area we simplified was the use of VRFs. Instead of using a VRF for sortition, we used the round as a public seed and used a shuffle function we wrote to select users. Each user selects the same random index from a sorted list of userIds using the same random seed to ensure equal committees across all nodes. We also didn’t use a VRF for verifySort. Instead we used the same committee selection algorithm to verify the users had been selected. We assumed that stake does not change, but that it is randomly set at the beginning using peerId as seed. Two users are always selected by sortition for the block proposal stage. Finally, in our proposal we planned on implementing smart contracts on top of Algorand, but we didn’t have enough time to do so.</p>
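<p>The seeded committee selection described above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the project’s Go shuffle function: the names select_committee and verify_sort are stand-ins, Python’s random module replaces the custom shuffle, and stake is ignored.</p>

```python
import random

COMMITTEE_SIZE = 2  # two users are always selected in this implementation


def select_committee(user_ids, round_number, size=COMMITTEE_SIZE):
    """Pick the committee for a round using the round as a public seed."""
    rng = random.Random(round_number)  # same seed on every node
    candidates = sorted(user_ids)      # same starting order on every node
    rng.shuffle(candidates)
    return candidates[:size]


def verify_sort(user_id, user_ids, round_number):
    """Check a claimed selection by recomputing the committee."""
    return user_id in select_committee(user_ids, round_number)


peers = ["node3", "node0", "node2", "node1"]
committee = select_committee(peers, round_number=7)
print(committee)
print(all(verify_sort(member, peers, 7) for member in committee))  # True
```

<p>Because both the seed and the sorted list are public, every node derives the same committee without exchanging messages. That is also why this scheme offers none of the unpredictability a VRF provides.</p>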

<h2 id="challenges">Challenges</h2>
<p>There were several challenges we faced. Most were the result of the paper leaving out many implementation details.</p>

<p><img src="https://nickgreenquist.github.io//blog/assets/algorand/agreement.png" alt="Agreement" width="500px" class="center-image" /></p>

<p>Our biggest challenge was dealing with new users. We define new users as nodes that decided to join the network late or nodes that had recovered after failing. We decided that a new node should request the blockchain if it finds out it is behind on the current round. This is easy to check as we send the round with each ProposeBlock and Vote message. We implemented a feature where users request, collect, and verify blockchains if they discover they are behind the current round. However, the real difficulty was getting the user to collect all ProposeBlock and Vote messages they missed out on, and also catch up to the same period and step as the rest of the honest nodes.</p>

<p>Another challenge was in trying to implement VRF. It quickly became clear that it was not feasible to implement this function on our own. There were a number of cryptographic primitives that we were not well versed in and we didn’t feel we had enough time to implement a robust version of this algorithm. We looked at using the LibSodium C library used by the Algorand team, but ran out of time.</p>

<p>The original paper used a different agreement algorithm called BA*. We had spent some time understanding and working through implementation ideas for this protocol before learning that the Algorand team had released a new Algorand Agreement protocol. We decided that it made sense to use the newer protocol. However, it took us a while to understand the new algorithm and how to connect it to the main paper.</p>

<h2 id="testing">Testing</h2>
<p>We tested our implementation in several ways. Initially, we started off by just trying to get a blockchain that sent transactions to peers, gathered blocks, and added blocks to the blockchain without any consensus mechanism. After this, we implemented Algorand Agreement and tested our implementation on four nodes manually using a client package similar to the one from Lab 2. We then ported the launch tool from Lab 2 and used this to test the system on a greater number of nodes with more transactions and allowing for node failures.</p>

<p>Correctness was tested using multiple nodes running at the same time. We sent transactions on different threads to multiple nodes that were running the Algorand server. We checked that the blockchain contained the same blocks after several rounds of the protocol. Blocks were analyzed to make sure hashes and transactions matched across nodes. Additionally, we tested with Byzantine behaviour by making certain peers always behave in a certain way. For example, with 4 nodes, we set peer0 to always propose their own block, only vote for their own values, and ignore all incoming messages from other nodes. The protocol was able to continue safely with the remaining 3 nodes despite peer0 trying to cause mayhem.</p>

<p>Liveness was another goal we tested for. We found that as long as 2t + 1 nodes are up at all times, Agreement rounds terminate eventually.</p>

<p>Performance was found to be very fast. Although we did not have the chance to test our implementation on hundreds of thousands of nodes like the authors of the Algorand paper did, we found that Agreement consistently completed very fast even when testing on Kubernetes using the launch tool with a few dozen nodes.</p>

<h2 id="conclusion">Conclusion</h2>
<p>Algorand promises an extremely fast consensus protocol that would allow for a massively scalable and partition resilient cryptocurrency. Through implementing Algorand Agreement<sup id="fnref:agreement:3" role="doc-noteref"><a href="#fn:agreement" class="footnote" rel="footnote">2</a></sup>, we were pleasantly surprised to find that the basic protocol is correct in a stable and small network of peers. However, the Algorand papers are severely lacking in numerous implementation details that are needed for even a minimum viable product with Algorand. The authors of the Algorand papers promise to make the Algorand code open source. However, only the VRF function has been released as of today. We hope the authors release more of the code in the future and uncover the missing pieces needed to make Algorand practical and useful in real use cases.</p>

<h2 id="references">References</h2>
<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:algorand" role="doc-endnote">
      <p>Yossi Gilad, Rotem Hemo, Silvio Micali, Georgios Vlachos, Nickolai Zeldovich. Algorand: Scaling Byzantine Agreements for Cryptocurrencies, https://people.csail.mit.edu/nickolai/papers/gilad-algorand-eprint.pdf <a href="#fnref:algorand" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:algorand:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:algorand:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:algorand:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a></p>
    </li>
    <li id="fn:agreement" role="doc-endnote">
      <p>Jing Chen, Sergey Gorbunov, Silvio Micali, Georgios Vlachos. Algorand Agreement: Super Fast and Partition Resilient Byzantine Agreement, https://eprint.iacr.org/2018/377.pdf <a href="#fnref:agreement" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:agreement:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:agreement:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:agreement:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="Projects" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">GPU Accelerated Matrix Factorization for Recommender Systems</title><link href="https://nickgreenquist.github.io//blog/projects/2019/01/02/cu2rec.html" rel="alternate" type="text/html" title="GPU Accelerated Matrix Factorization for Recommender Systems" /><published>2019-01-02T14:05:14+00:00</published><updated>2019-01-02T14:05:14+00:00</updated><id>https://nickgreenquist.github.io//blog/projects/2019/01/02/cu2rec</id><content type="html" xml:base="https://nickgreenquist.github.io//blog/projects/2019/01/02/cu2rec.html"><![CDATA[<style type="text/css">
    .center-image
    {
        margin: 0 auto;
        display: block;
    }
</style>

<script type="text/x-mathjax-config">
      MathJax.Hub.Config({
        tex2jax: {
          skipTags: ['script', 'noscript', 'style', 'textarea', 'pre'],
          inlineMath: [['$','$']]
        }
      });
</script>

<script type="text/javascript" async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-MML-AM_CHTML"></script>

<p>By: <a href="https://nickgreenquist.github.io/">Nick Greenquist</a> and <a href="https://dorukkilitcioglu.github.io/">Doruk Kilitcioglu</a></p>

<h2 id="introduction">Introduction</h2>
<p>Matrix Factorization (MF) is a popular algorithm used to power many recommender systems. Efficient and scalable MF algorithms are essential in order to train on the massive datasets that large scale recommender systems utilize. Graphics Processing Unit (GPU) technology has become very popular in recent years and has become widely used in machine learning. The massive parallelism GPUs offer creates an opportunity to develop an accelerated MF algorithm. 
This blog post presents cu2rec, a matrix factorization algorithm written in CUDA. cu2rec implements a parallel version of Stochastic Gradient Descent (SGD) to solve large scale MF problems. cu2rec utilizes multiple advanced techniques to harness better performance from a GPU. These include aggressive use of constant memory for hyperparameters and registers for heavily reused values, a sparse matrix data structure, a reduction sum total loss kernel, parallel lock-free updating of feature weights with minimized global memory writes, and fairness across weight updates using user index striding. With a single NVIDIA GPU, cu2rec can be 10x faster than state of the art sequential algorithms while reaching similar error metrics.</p>

<p>The code and the instructions on how to run it can be found on the <a href="https://github.com/nickgreenquist/cu2rec">GitHub page</a>.</p>

<h2 id="how-it-works">How It Works</h2>
<p><img src="https://nickgreenquist.github.io//blog/assets/cu2rec/need_gpu.png" alt="GPU" width="500px" class="center-image" /></p>

<p>In this section of the post, we will explain what GPUs are, how they work, and how they can be used to accelerate a powerful algorithm commonly used for recommender systems: matrix factorization.</p>

<h3 id="how-do-gpus-work">How do GPUs Work</h3>
<p>Today, GPUs are being used more and more to parallelize algorithms that need to run on big data. GPUs have much more parallel compute power than CPUs and work well performing the same instruction on multiple pieces of data (SIMD)<sup id="fnref:book" role="doc-noteref"><a href="#fn:book" class="footnote" rel="footnote">1</a></sup>. GPUs originally were used for heavy graphical work but have become a staple of large machine learning models<sup id="fnref:hpc" role="doc-noteref"><a href="#fn:hpc" class="footnote" rel="footnote">2</a></sup>. Because of the explosion of GPU usage across much of machine learning, we wanted to see if they could be useful in the sub-field of recommender systems.</p>

<p>GPUs are most useful when the task is computationally intensive, there are many independent computations, and the computations are similar. The architectural paradigm that most benefits from this is Single Instruction Multiple Data (SIMD), where a single instruction is executed for multiple data points.</p>

<p><img src="https://nickgreenquist.github.io//blog/assets/cu2rec/simd.png" alt="SIMD" width="300px" class="center-image" /></p>

<p>GPUs implement this SIMD architecture, but they do not devote the whole GPU to that instruction. Rather, only 32 execution units (out of 5000 in recent GPUs) have to execute the same instruction. The best practice is to structure your code to be as much SIMD as possible.</p>

<h4 id="cuda">CUDA</h4>
<p>CUDA is the parallel computing platform and programming model built by NVIDIA as a general purpose way of interacting with NVIDIA GPUs. It can be used with both C and C++, and gives the user API calls for important GPU operations such as copying data to and from GPU memory, and launching GPU functions called kernels.</p>

<p>Each kernel defines a function that is called per execution unit, called a thread. These threads are bundled into groups of 32, called warps. Each thread in a kernel executes the same <em>function</em>, but only the threads within a warp need to be simultaneously executing the same <em>instruction</em>. Furthermore, not all threads need to run concurrently, but they all need to finish before a kernel is done.</p>

<p>For more information about CUDA, you can check Kirk et al<sup id="fnref:book:1" role="doc-noteref"><a href="#fn:book" class="footnote" rel="footnote">1</a></sup>.</p>

<h3 id="matrix-factorization-for-recommender-systems">Matrix Factorization for Recommender Systems</h3>
<p>Matrix Factorization decomposes a rating matrix $R$ of shape $m\times n$ ($m$ users and $n$ items) into two learned feature matrices, $P$ ($m\times f$) and $Q$ ($n\times f$), where $f$ is the number of factors. The goal of MF is to train the best $P$ and $Q$ matrices such that $R\approx P\times Q^T$, as shown in the figure below. In state of the art models, it also involves learning $U$ ($m\times 1$ matrix of user biases) and $I$ ($n\times 1$ matrix of item biases), with an additional global bias $\mu$ which is set to the mean of all ratings.</p>

<p><a name="fig_mf"></a></p>

<p><img src="https://nickgreenquist.github.io//blog/assets/cu2rec/mf.jpg" alt="MF" width="900px" class="center-image" /></p>

<p>MF is incredibly popular and has been used for serving recommendations by companies such as Amazon and Netflix<sup id="fnref:dummies" role="doc-noteref"><a href="#fn:dummies" class="footnote" rel="footnote">3</a></sup>. In the era of big data, the datasets that MF is being asked to tackle are getting orders of magnitude larger than even the Netflix Dataset<sup id="fnref:netflixwinner" role="doc-noteref"><a href="#fn:netflixwinner" class="footnote" rel="footnote">4</a></sup>. In order to keep up, MF algorithms need to be fast, scalable, and able to handle massive amounts of data.</p>

<p>For more information, please look at these blog posts for <a href="https://dorukkilitcioglu.github.io/2018/09/10/representation-learning-matrix-factorization.html">a technical explanation of Matrix Factorization</a> and <a href="https://dorukkilitcioglu.github.io/2018/05/14/introducing-books2rec.html">an application of Matrix Factorization for book recommendation</a>.</p>

<h3 id="sgd-for-matrix-factorization">SGD for Matrix Factorization</h3>

<p><img src="https://nickgreenquist.github.io//blog/assets/cu2rec/math.png" alt="Math" width="900px" class="center-image" /></p>

<p>For the task of rating prediction, we use the biased SVD popularized by Koren et al.<sup id="fnref:biasedsvd" role="doc-noteref"><a href="#fn:biasedsvd" class="footnote" rel="footnote">5</a></sup>. For each user $u$ and item $i$ pair, the estimated rating $\hat r_{ui}$ and the error for that rating $e_{ui}$ are defined by:</p>

\[\begin{eqnarray}
\hat{r}_{ui} &amp;=&amp; \mu + b_u + b_i + q^T_ip_u\\
e_{ui} &amp;=&amp; r_{ui} - \hat{r}_{ui}
\end{eqnarray}\]

<p>where $b_u$ is the user bias (which is a single float), $b_i$ is the item bias (also a single float), $\mu$ is the global mean, and $p_u$ and $q_i$ are the user and item vectors in $P$ and $Q$ respectively.</p>

<p>Our total loss function is the sum of squared rating errors over all observed ratings, plus the regularization terms for the 4 matrices, which are their squared Frobenius norms:</p>

\[\begin{equation}
    L = \sum_{(u,i)}{(r_{ui} - \hat{r}_{ui})^2} + \lambda_p {\lVert}P{\rVert}_F^2 + \lambda_q {\lVert}Q{\rVert}_F^2 + \lambda_{u} {\lVert}U{\rVert}_F^2 + \lambda_{i} {\lVert}I{\rVert}_F^2
\end{equation}\]

<p>For SGD, we use the loss for a single rating:</p>

\[\begin{equation}
    L_{ui} = e_{ui}^2 + \lambda_p{\lVert}p_u{\rVert}_2^2 + \lambda_q{\lVert}q_i{\rVert}_2^2 + \lambda_ub_u^2 + \lambda_ib_i^2
\end{equation}\]

<p>As with regular gradient descent, the update equation of any parameter $x$ with learning rate $\eta$ is defined by</p>

\[\begin{equation}
    x \leftarrow x - \eta\frac{\partial L}{\partial x}
\end{equation}\]

<p>In our case, the parameters are each and every element of the $P$, $Q$, $U$, and $I$ matrices. Below are the respective partial derivatives that are used with the update equation above, with the constant factor of 2 absorbed into the learning rate:</p>

\[\begin{eqnarray}
    \frac{\partial L_{ui}}{\partial p_u} &amp;=&amp; -(e_{ui}q_i - \lambda_pp_u)\\
    \frac{\partial L_{ui}}{\partial q_i} &amp;=&amp; -(e_{ui}p_u - \lambda_qq_i)\\
    \frac{\partial L_{ui}}{\partial b_u} &amp;=&amp; -(e_{ui} - \lambda_ub_u)\\
    \frac{\partial L_{ui}}{\partial b_i} &amp;=&amp; -(e_{ui} - \lambda_ib_i)
\end{eqnarray}\]
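<p>To make the update rule concrete, here is a minimal pure-Python sketch of one SGD step for a single rating, written directly from the prediction and gradient equations above. It is illustrative only and is not the CUDA kernel; the function names predict and sgd_step and the sample numbers are assumptions.</p>

```python
def predict(mu, b_u, b_i, p_u, q_i):
    """Estimated rating: global mean + biases + dot product of factors."""
    return mu + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))


def sgd_step(r_ui, mu, b_u, b_i, p_u, q_i, eta, lam_p, lam_q, lam_u, lam_i):
    """Apply x = x - eta * dL/dx to every parameter of one (user, item) pair."""
    e = r_ui - predict(mu, b_u, b_i, p_u, q_i)
    # dL/dp_u = -(e * q_i - lam_p * p_u), and similarly for the others.
    # Both vectors are updated from the old values of p_u and q_i.
    new_p = [p + eta * (e * q - lam_p * p) for p, q in zip(p_u, q_i)]
    new_q = [q + eta * (e * p - lam_q * q) for p, q in zip(p_u, q_i)]
    new_b_u = b_u + eta * (e - lam_u * b_u)
    new_b_i = b_i + eta * (e - lam_i * b_i)
    return new_b_u, new_b_i, new_p, new_q


mu, b_u, b_i = 3.5, 0.0, 0.0
p_u, q_i = [0.1, 0.1], [0.1, 0.1]
before = abs(4.0 - predict(mu, b_u, b_i, p_u, q_i))
b_u, b_i, p_u, q_i = sgd_step(
    4.0, mu, b_u, b_i, p_u, q_i,
    eta=0.1, lam_p=0.02, lam_q=0.02, lam_u=0.02, lam_i=0.02,
)
after = abs(4.0 - predict(mu, b_u, b_i, p_u, q_i))
print(before, after)  # the absolute error shrinks after the update
```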

<h4 id="parallel-sgd-challenges">Parallel SGD Challenges</h4>

<p><img src="https://nickgreenquist.github.io//blog/assets/cu2rec/sgd.png" alt="SGD" width="500px" class="center-image" /></p>

<p>Parallel SGD is a hotly debated topic, but it is definitely being used all around the globe. Tan et al.<sup id="fnref:cumfals" role="doc-noteref"><a href="#fn:cumfals" class="footnote" rel="footnote">6</a></sup> claim that each iteration of SGD is inherently sequential, whereas Recht et al.<sup id="fnref:hogwild" role="doc-noteref"><a href="#fn:hogwild" class="footnote" rel="footnote">7</a></sup> and Yun et al.<sup id="fnref:nomad" role="doc-noteref"><a href="#fn:nomad" class="footnote" rel="footnote">8</a></sup> argue that in the context of sparse matrix factorization in recommender systems, multiple SGD updates can be carried out simultaneously in parallel.</p>

<p>In our implementation, we utilized SGD similar to the ideas outlined in Recht et al<sup id="fnref:hogwild:1" role="doc-noteref"><a href="#fn:hogwild" class="footnote" rel="footnote">7</a></sup>, but modified it to be used with GPUs. Because the matrices we are dealing with are very sparse, we were able to parallelize SGD updates across multiple users and items. We chose to parallelize over users, whose number in reasonably sized datasets such as ML-20M is much higher than the number of CUDA cores in modern GPUs. As a result, we can keep the GPU fully utilized by choosing a random item per user to update on.</p>

<p>We faced a few issues with this approach specific to GPUs, which led us to create our own flavor of the parallel SGD algorithm (see algorithm <a href="#1-sgd-kernel">1</a>). The first issue we had was with multiple updates to the same item. We were running atomic operations for the updates to the item matrix, which meant that our code was slow, and worse, there were too many updates on the most popular items. This led to overly drastic changes to the item vectors, making training very hard to balance.</p>

<p>That is when we decided to get rid of the atomicity and have a race condition on the SGD updates, making it so that only one update per item would stick, and the previous update attempts would be void. We called this version of the algorithm Early Bird Loses the Gradient.</p>

<p>After analyzing our kernel, we noticed that the users with higher user ids had lower error than users with lower user ids. This was the result of how blocks are scheduled in CUDA: blocks with lower block ids (which have the lower user ids) get scheduled earlier than higher block ids. As a result, a lot of the item updates that were done by earlier users were being overwritten, and the algorithm was overfitting to the later users.</p>

<p>To combat this issue, we introduced user id striding with each iteration. Each thread handles user $(tid + stride * iters)\ mod\ N$, where $tid$ is the thread id and $N$ is the total number of users. This results in fair item updates with respect to the users.</p>
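<p>A small sketch of the striding rule, assuming one thread per user and that the modulo applies to the whole sum so the result is always a valid user index. The function name user_for_thread and the values of stride and N are illustrative.</p>

```python
def user_for_thread(tid, iteration, stride, num_users):
    """User handled by thread `tid` in a given iteration."""
    return (tid + stride * iteration) % num_users


num_users = 8
stride = 3
for iteration in range(3):
    order = [
        user_for_thread(tid, iteration, stride, num_users)
        for tid in range(num_users)
    ]
    print(iteration, order)
# 0 [0, 1, 2, 3, 4, 5, 6, 7]
# 1 [3, 4, 5, 6, 7, 0, 1, 2]
# 2 [6, 7, 0, 1, 2, 3, 4, 5]
```

<p>Every user is still visited exactly once per iteration, but the users assigned to the earliest scheduled threads rotate, so no fixed group of users always has its item updates overwritten.</p>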

<p>Another improvement came from not allowing multiple updates to the same item, because that results in more global memory accesses. For that, we added a binary array of all items, and whenever an item is updated, we set its updated value to true. Note that this is done with regular checks, not any costly $atomicCAS$, because it is not a problem if there is a race condition in a warp and it gets overwritten a couple of times. It still is empirically faster, and we call this version of the algorithm Early Bird Gets the Gradient (EBGTG).</p>
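<p>The effect of the updated flags can be illustrated with a sequential simulation. This is only a sketch: on a GPU the threads run concurrently and the check is not atomic, so a few extra writes can slip through, which the paragraph above notes is acceptable. The function name apply_item_updates and the data layout are hypothetical.</p>

```python
def apply_item_updates(proposed, num_items):
    """Keep only the first update that reaches each item.

    `proposed` is a list of (item_id, new_value) pairs in the order
    in which the simulated threads reach the write.
    """
    updated = [False] * num_items  # one flag per item, reset each iteration
    target = {}
    for item_id, new_value in proposed:
        if updated[item_id]:
            continue  # an earlier thread already wrote this item
        updated[item_id] = True
        target[item_id] = new_value
    return target


proposed = [(2, "from user 0"), (0, "from user 1"), (2, "from user 2")]
print(apply_item_updates(proposed, num_items=3))
# {2: 'from user 0', 0: 'from user 1'}
```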

<p>Also worth mentioning is that we don’t need to compute the total loss every iteration. It is an expensive operation, and we only do it every couple of iterations to modify the learning rate if the SGD has stopped learning. Each thread in the SGD kernel just calculates the error for the selected user-item pair, resulting in a heavy speedup compared to calculating the error on the whole training set.</p>
<h2 id="implementation-cu2rec">Implementation: cu2rec</h2>
<p>In this section of the post, we will explain how we built cu2rec. We explain how to load in ratings data to train on, the important code that performs the matrix factorization and optimization, and also some advanced programming techniques we used to make cu2rec fast and accurate.</p>
<h3 id="data-preparation">Data Preparation</h3>
<h4 id="sparse-matrix-representation">Sparse Matrix Representation</h4>
<p>For even decently large datasets, the full ratings matrix $R$ is too big to fit in memory. This is doubly true when we are dealing with GPU memory, as we cannot physically add more memory to a GPU.</p>

<p>In order to represent the matrix, we use the Compressed Sparse Row (CSR) format. CSR matrices are defined by three arrays: $indptr$, $indices$, and $data$. The item indices for user $u$ are stored in $indices[indptr[u]:indptr[u+1]]$ and their corresponding ratings are stored in $data[indptr[u]:indptr[u+1]]$. This allows for efficient indexing into the matrix given a user, which plays into how we construct our kernel.</p>

<p><img src="https://nickgreenquist.github.io//blog/assets/cu2rec/csr.png" alt="CSR" width="500px" class="center-image" /></p>

<p>When representing users with no ratings, we repeat the same value in the $indptr$ array, and handle that as a special case in our kernel.</p>
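<p>As an illustration of the CSR layout, here is a minimal pure-Python sketch that builds the three arrays from sorted rating tuples and looks up the ratings of one user. It assumes 0-based ids for brevity, and the helper names build_csr and user_ratings are not taken from cu2rec.</p>

```python
def build_csr(ratings, num_users):
    """ratings: (user, item, rating) tuples sorted by user, 0-based ids."""
    indptr = [0] * (num_users + 1)
    indices, data = [], []
    for user, item, rating in ratings:
        indptr[user + 1] += 1  # count the ratings of each user
        indices.append(item)
        data.append(rating)
    for u in range(num_users):
        indptr[u + 1] += indptr[u]  # prefix sum turns counts into offsets
    return indptr, indices, data


def user_ratings(indptr, indices, data, u):
    """All (item, rating) pairs of user u."""
    start, end = indptr[u], indptr[u + 1]
    return list(zip(indices[start:end], data[start:end]))


ratings = [(0, 1, 4.0), (0, 3, 5.0), (2, 0, 3.0)]
indptr, indices, data = build_csr(ratings, num_users=3)
print(indptr)                                  # [0, 2, 2, 3]
print(user_ratings(indptr, indices, data, 0))  # [(1, 4.0), (3, 5.0)]
print(user_ratings(indptr, indices, data, 1))  # [] (user 1 has no ratings)
```

<p>User 1 has no ratings, so its entry in indptr repeats the previous value and the slice is empty. This is the special case mentioned above.</p>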
<h4 id="input-data">Input Data</h4>
<p>Our program accepts CSV files that are formatted as $user\_id,item\_id,rating$ tuples, wherein both $user\_id$ and $item\_id$ are sequential numerical ids starting from 1, and the rows are sorted by $user\_id$. For convenience, we provide a script that converts non-sequential ids to sequential ids, and a script that sorts the tuples based on $user\_id$.</p>

<p>The program also needs two different files: a training set with which the MF model is trained, and a test set with which the model is evaluated. We use a completely random split across all tuples for generating these files, and split the data into an 80% training set and a 20% test set. We also provide a script to split the data.</p>
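<p>A random 80/20 split of this kind can be sketched in a few lines. This is not the script shipped with cu2rec: the function name train_test_split, the fixed seed, and the re-sorting by user id (so both outputs match the expected input order) are assumptions made for illustration.</p>

```python
import random


def train_test_split(ratings, test_fraction=0.2, seed=0):
    """Randomly split rating tuples, then re-sort each part by user id."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test = sorted(shuffled[:n_test])
    train = sorted(shuffled[n_test:])
    return train, test


ratings = [(user, item, 3.0) for user in range(1, 6) for item in range(1, 3)]
train, test = train_test_split(ratings)
print(len(train), len(test))  # 8 2
```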
<h3 id="overview">Overview</h3>
<p>The cu2rec code is organized into 3 main parts: reading in data into sparse matrix form and moving it to the GPU, using SGD and Loss kernels to train the MF model, and moving data back to the host in order to write the trained components to files.</p>

<p>When a user runs cu2rec, they need to supply the train and test CSV files. cu2rec starts by reading both files into a vector of Rating structs that live in the host memory. It then converts each vector of Ratings into a CSR representation. Next, it moves these matrices into the GPU’s memory. Once the memory allocation is complete, cu2rec then loads in the hyperparameters from a config file into constant memory.</p>

<p>Next, cu2rec calls a $train$ function that is responsible for using the sparse rating matrices to train a MF model. $train$ starts by initializing all of the components randomly using a normal distribution with mean of 0 and standard deviation of 1. For the $P$ and $Q$ matrices, these values are normalized by the number of factors the model will use. Next, all of these components are moved to the GPU’s global memory using standard CUDA API functions. Each is wrapped in a helper function to check for CUDA errors. In addition to moving needed memory to the GPU, cu2rec also initializes a CUDA Random object and other needed variables to handle an adaptive learning rate. Finally, the iterative training of the model can begin.</p>

<p>The main training loop in cu2rec runs $totalIterations$ loops over two main steps: update the components using SGD and compute the total losses. cu2rec parallelizes over the number of users in the matrix for both steps. At the beginning of each iteration, the SGD Kernel is called (algorithm <a href="#1-sgd-kernel">1</a>). The code for the SGD Kernel is below.</p>
<h4 id="1-sgd-kernel">1) SGD Kernel</h4>
<script src="https://gist.github.com/nickgreenquist/1db4ad217aa618307bd78e2b73bd4019.js"></script>

<p>In order for SGD to compute updates to the feature matrices, we needed to create a function that kernels can use to compute a predicted rating using the current components. The code for the Prediction Kernel is below.</p>
<h4 id="2-prediction-kernel">2) Prediction Kernel</h4>
<script src="https://gist.github.com/nickgreenquist/c14833e7b806b09e607c2546c4e3e084.js"></script>

<p>After running SGD for one item for every user, the algorithm checks if it is time to test the updated model on the test ratings (algorithm <a href="#3-loss-kernel">3</a>). This is only done periodically, because computing the total losses is expensive: it requires the error on all ratings, not just one rating per user as in SGD. To do this, cu2rec first uses the Loss Kernel to compute the loss on every rating.</p>
<h4 id="3-loss-kernel">3) Loss Kernel</h4>
<script src="https://gist.github.com/nickgreenquist/6437e12ac4efdaf02a53a60c6d548eb8.js"></script>

<p>Next, cu2rec uses the Total Loss Kernel to reduce all the individual losses so RMSE and MAE can be computed.</p>
<h4 id="4-total-loss-kernel">4) Total Loss Kernel</h4>
<script src="https://gist.github.com/nickgreenquist/28bafb34f817500bde8e104097922a2b.js"></script>

<p>After computing the new error metrics, cu2rec checks a patience counter to decide whether the learning rate should be lowered. Finally, cu2rec swaps the pointers of the updated $Q$ and $itemBias$ components so that the next round of updates uses the new values. We do not need to keep copies of the $P$ and $userBias$ matrices, as their updates are done in place on the original matrices. We only keep a target version of $Q$ and $itemBias$ because updates to them are subject to race conditions between threads. This is discussed further in the next section, Early Bird Gets the Gradient.</p>

<p>Once $totalIterations$ iterations of the training loop are complete, it is time to save the trained model and free all necessary memory. First, all the trained components are copied back to the host variables. Next, the CUDA variables are freed along with all host variables that are not part of the trained model. Outside of the $train$ method, the main code is responsible for writing to file all components of the model that are needed to serve recommendations: $P$, $Q$, $userBias$, $itemBias$, and $globalBias$. Once all have been written to a file, cu2rec’s final steps are to free those variables’ memory on the host and terminate the program.</p>
<h3 id="early-bird-gets-the-gradient">Early Bird Gets the Gradient</h3>

<p><img src="https://nickgreenquist.github.io//blog/assets/cu2rec/early_bird.png" alt="EBGTG" width="300px" class="center-image" /></p>

<p>Due to the sequential nature of SGD, we had to come up with a technique for running SGD with multiple threads that may try to update the features of the same item. In SGD, a single error is computed for a rating, and both the user and the item corresponding to that rating have their features and bias weights updated. Because cu2rec parallelizes over users and each user picks a random item they have rated, the same item (especially a popular one) is highly likely to be picked by multiple threads. The chance of this becomes near certain with a non-trivial number of users and is guaranteed if there are more users than items, by the pigeonhole principle.</p>

<p>In order to handle race conditions on the same memory, we created the Early Bird Gets the Gradient technique. In the SGD Kernel (algorithm <a href="#1-sgd-kernel">1</a>), each thread picks a random item from the user’s rated items. Then, it computes the predicted rating using the current feature matrices and takes the difference from the true rating to get the error. Next, it uses this error to update all of the features. Updating the $P$ matrix and $userBias$ values will never have write conflicts since each thread is responsible for a single user. However, updates to values in the $Q$ and $itemBias$ matrices require special care.</p>

<p>In EBGTG, the first thread to pick an item wins the race to write to that item. To implement this, we created a new boolean array, $itemIsUpdated$, that stores whether each item’s features have already been updated. When a thread selects a random item, it checks the item’s value in the array and sets a local $earlyBird$ variable to $true$ if it ‘won the race.’ It then sets the value for this item to $true$ so that later threads picking the same item set their $earlyBird$ variables to $false$. Only early bird threads then spend time on the $f$ global memory writes to $Q$ and the one global memory write to $itemBias[y]$.</p>
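<p>The ‘first one wins’ idea can be sketched on the CPU with goroutines standing in for GPU threads. This is not the cu2rec kernel: cu2rec uses a plain check-then-set on a boolean array without atomics, whereas Go treats unsynchronized concurrent writes as a data race, so this sketch swaps in an atomic compare-and-swap from <code>sync/atomic</code>. The names <code>claimItem</code> and <code>countEarlyBirds</code> are made up for illustration.</p>

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// claimItem reports whether the caller is the first to claim item y this iteration.
// CompareAndSwap flips the flag from 0 to 1 for exactly one caller.
// (Assumption: cu2rec itself does a non-atomic check-then-set instead.)
func claimItem(itemIsUpdated []int32, y int) bool {
	return atomic.CompareAndSwapInt32(&itemIsUpdated[y], 0, 1)
}

// countEarlyBirds launches one goroutine per user; picks[u] is the item user u selected.
// It returns, per item, how many goroutines won the right to write that item.
func countEarlyBirds(picks []int, numItems int) []int32 {
	itemIsUpdated := make([]int32, numItems)
	winners := make([]int32, numItems)
	var wg sync.WaitGroup
	for _, y := range picks {
		wg.Add(1)
		go func(y int) {
			defer wg.Done()
			earlyBird := claimItem(itemIsUpdated, y)
			if earlyBird {
				// Only the early bird would write the item's f factors and its bias.
				atomic.AddInt32(&winners[y], 1)
			}
		}(y)
	}
	wg.Wait()
	return winners
}

func main() {
	// Six users, three items; popular item 0 is picked three times.
	picks := []int{0, 0, 0, 1, 2, 2}
	fmt.Println(countEarlyBirds(picks, 3))
}
```

<p>However many users pick the same item, each picked item ends up with exactly one winner, which is the property EBGTG relies on.</p>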

<p>However, it should be noted that due to warp divergence, even if a single thread in a warp is selected as the early bird, all threads in the warp are blocked until the early bird thread is finished updating $QTrgt$. Warp divergence is a dangerous occurrence in GPUs when branch conditions are introduced into kernels. In a GPU, threads are bundled into a group of threads, usually 32, that is called a warp. Every thread in a warp executes each instruction of a kernel in lockstep. When a conditional is reached, the GPU has no choice but to block all threads that fail that conditional while all the threads that pass it perform all the instructions in that block of code. Then, when the conditional ends, all the threads begin again performing the remaining instructions in the kernel. Therefore, with EBGTG, if a single thread in a warp is the ‘early bird’, all other threads will be blocked until that single thread does the slow global memory writes.</p>

<p>We were worried at first that warp divergence would prevent any speedup from EBGTG versus Early Bird Loses the Gradient, but in empirical testing, we saw a consistent 12-15% speedup for the entire program.</p>

<h3 id="advanced-techniques">Advanced Techniques</h3>
<p>To optimize run-time and ensure best-in-class results on test sets, cu2rec utilizes multiple advanced techniques. Constant memory, registers, and an optimized reduction sum kernel take greater advantage of the GPU’s architecture. Our parallel lock-free updates and user index striding ensure we achieve the best results possible against test set ratings.</p>

<p><img src="https://nickgreenquist.github.io//blog/assets/cu2rec/pika.png" alt="Advanced Techniques" width="500px" class="center-image" /></p>

<h4 id="constant-memory">Constant Memory</h4>
<p>All of our kernels rely on many of the same static variables, such as hyperparameters and dimensions. These values include total iterations, number of factors, $\eta$, $\lambda_p$, $\lambda_q$, $\lambda_u$, and $\lambda_i$.</p>

<p>The constant memory is useful because from the kernel’s point of view, it never changes, and can therefore be aggressively cached. Because all of the values we store in constant memory never change during the execution of cu2rec, all of these values get cached in the beginning of execution and remain in the cache for the runtime of the program.</p>

<h4 id="aggressive-register-use">Aggressive Register Use</h4>
<p>Registers are the fastest memory available on the GPU and accesses to this memory can be over 100x faster than global memory<sup id="fnref:book:2" role="doc-noteref"><a href="#fn:book" class="footnote" rel="footnote">1</a></sup>. As such, one of our goals was to aggressively use registers to store any values that are used more than once in a kernel. We will break down which values we decided to store in registers.</p>

<p>In the SGD Kernel (algorithm <a href="#1-sgd-kernel">1</a>), the following variables are created to store values in registers: $x$ (the user id), $low$ (the first rated item id index pointer), $high$ (the last rated item id index pointer), $yi$ (random item index pointer), $y$ (the item id), $ub$ (user $x$’s bias), $ib$ (item $y$’s bias), $error$ (the error of the model on the rating of user x on item y), $earlyBird$ (if the user is first to update item’s features), $pOld$ (old value for feature matrix $P$ at row $x$ and column $i$), and $qOld$ (old value for feature matrix $Q$ at row $y$ and column $i$).</p>

<p>In the Loss Kernel (algorithm <a href="#3-loss-kernel">3</a>), the following variables are created to store values in registers: $x$ (the user id), $ub$ (user $x$’s bias), and $itemId$ (the item index for every item $x$ has rated).</p>

<p>We experimented with having less register use, thinking it might allow for more blocks to be scheduled at the same time, but that resulted in more global memory accesses (and more pressure on the caches), and was overall empirically slower.</p>

<h4 id="reduction-sum">Reduction Sum</h4>
<p>The total loss kernel, which is used to calculate the global loss after a set number of updates, uses a fixed number of threads $t_g$ per grid regardless of the number of ratings. This makes it easy to scale to a very large number of ratings ($N$). It uses the reduction sum technique, wherein each thread at each step calculates a partial sum of its previous partial sum and the next thread’s previous partial sum. This is done for $\log(N)$ steps.</p>

<p>Each thread initially calculates the sum of $N/t_g$ elements sequentially, where each element is $t_g$ apart from the previous one, allowing coalesced memory accesses. These sums are written into shared memory.</p>

<p>After this step, each block (size $t_b$) reduces its own sums into the sum at $tid=0$. This is done for $\log_2(t_b)$ steps: in the first step, the first $t_b/2$ threads each add the sums at $tid$ and $tid + t_b/2$, then the first $t_b/4$ threads do the same, and so on. This approach coalesces the memory accesses and minimizes branch divergence. We also use loop unrolling to reduce the number of if statements, and template the kernel on the block size so that unnecessary unrolled checks are removed at compile time.</p>

<p>At the end, each thread with $tid=0$ writes its own sum to the global memory, so we get a partial sum per block. These can be fed into another reduce sum kernel, but we found doing the final addition on the host was faster.</p>
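<p>The three phases described above (strided per-thread sums, a halving reduction inside each block, and a final addition on the host) can be simulated sequentially. The Go sketch below is a CPU illustration only: the loops over <code>t</code> and <code>tid</code> stand in for parallel GPU threads, the <code>shared</code> slice stands in for shared memory, and <code>reduceSum</code> is a hypothetical name.</p>

```go
package main

import "fmt"

// reduceSum mimics the three phases of the reduction on the CPU.
// tg is the total number of simulated threads and tb the block size;
// tb must be a power of two and tg a multiple of tb.
func reduceSum(losses []float64, tg, tb int) float64 {
	// Phase 1: thread t sums elements t, t+tg, t+2*tg, ... into its own slot.
	shared := make([]float64, tg)
	for t := 0; t < tg; t++ {
		for i := t; i < len(losses); i += tg {
			shared[t] += losses[i]
		}
	}

	// Phase 2: each block halves its active threads until slot 0 holds the block sum.
	blockSums := make([]float64, tg/tb)
	for b := range blockSums {
		block := shared[b*tb : (b+1)*tb]
		for stride := tb / 2; stride > 0; stride /= 2 {
			for tid := 0; tid < stride; tid++ {
				block[tid] += block[tid+stride]
			}
		}
		blockSums[b] = block[0]
	}

	// Phase 3: the host adds up the per-block partial sums.
	total := 0.0
	for _, s := range blockSums {
		total += s
	}
	return total
}

func main() {
	losses := make([]float64, 1000)
	for i := range losses {
		losses[i] = float64(i + 1) // 1 + 2 + ... + 1000 = 500500
	}
	fmt.Println(reduceSum(losses, 64, 16))
}
```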

<h4 id="parallel-lock-free-updates">Parallel Lock-Free Updates</h4>
<p>As discussed in the section explaining Early Bird Gets the Gradient, cu2rec does not lock any memory address writes while updating feature matrix weights. Earlier versions of cu2rec utilized atomic operations to update the $Q$ and $itemBias$ matrices. However, after discovering that optimal results could be achieved by simply letting a single thread ‘win’ an update, we decided to remove any atomic operations from any kernel.</p>

<h4 id="user-index-striding">User Index Striding</h4>
<p>EBGTG favors whichever thread selects an item first and sets the $itemIsUpdated$ value to $true$. Therefore, we must ensure that every thread (and therefore every user) gets a fair shot to be the ‘early bird.’ At first, with naive EBGTG, we were seeing 2-3% worse results on test data than other implementations. This gap in results increased for larger datasets, including almost 8% worse results on the Netflix Dataset when compared to best-in-class results (Xie et al.<sup id="fnref:cumfsgd" role="doc-noteref"><a href="#fn:cumfsgd" class="footnote" rel="footnote">9</a></sup>). We learned that EBGTG was scheduling lower index users first since they would always be in the first few blocks every iteration. While there is no guarantee which blocks begin running first, there are limited SMs on a GPU and therefore some blocks are queued while waiting to run. We found that later users would consistently be scheduled to higher block indexes, and thus run last.</p>

<p>In order to combat this, we decided to offset which user each thread uses to perform SGD. In the training loop, we add an offset variable every iteration and pass that to the SGD kernel (algorithm <a href="#1-sgd-kernel">1</a>). This effect can be seen in the first line where $x$ is computed. What this offset does is ensure that over time, every thread will be responsible for different sections of the user matrix.</p>
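<p>A small sketch of the offset idea, written in Go for illustration. The exact formula cu2rec uses to compute $x$ is in the SGD Kernel gist; the modulo form and the stride of 2 below are assumptions, and <code>userForThread</code> is a hypothetical helper.</p>

```go
package main

import "fmt"

// userForThread maps a global thread index to a user id, shifted by a per-iteration offset.
// The modulo wraps around so every user is still covered exactly once per iteration.
func userForThread(tid, offset, numUsers int) int {
	return (tid + offset) % numUsers
}

func main() {
	const numUsers = 5
	for iter := 0; iter < 3; iter++ {
		offset := iter * 2 // hypothetical stride of 2 users per iteration
		users := make([]int, numUsers)
		for tid := range users {
			users[tid] = userForThread(tid, offset, numUsers)
		}
		fmt.Println("iteration", iter, "thread -> user:", users)
	}
}
```

<p>Thread 0 handles a different user on each iteration, so no fixed group of users is always scheduled in the earliest blocks.</p>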

<p>By implementing this striding, we immediately began to see test set error metrics on par with other well-known implementations.</p>

<h2 id="results">Results</h2>
<p>We benchmarked our GPU code using an NVIDIA V100 GPU, and the CPU code with an Intel i7-6650U CPU.</p>

<h3 id="speedup">Speedup</h3>

<p><img src="https://nickgreenquist.github.io//blog/assets/cu2rec/ml100_speed.png" alt="ML100k Performance" width="900px" class="center-image" /></p>

<p><img src="https://nickgreenquist.github.io//blog/assets/cu2rec/ml20m_speed.png" alt="ML20M Performance" width="900px" class="center-image" /></p>

<h3 id="rmse">RMSE</h3>

<p><img src="https://nickgreenquist.github.io//blog/assets/cu2rec/ml100_error.png" alt="ML100k RMSE" width="900px" class="center-image" /></p>

<p><img src="https://nickgreenquist.github.io//blog/assets/cu2rec/ml20m_error.png" alt="ML20M RMSE" width="900px" class="center-image" /></p>

<h3 id="gpu-vs-gpu">GPU vs GPU</h3>
<p>We tested our code both with a V100 on NYU’s Prince server, and a TITAN Z in NYU’s cuda2 server. We varied problem sizes, the memory bandwidth requirements, and the number of iterations to get a healthy comparison. This is visualized in the following figure:</p>

<p><img src="https://nickgreenquist.github.io//blog/assets/cu2rec/gpu_vs_gpu.png" alt="GPU vs GPU" width="900px" class="center-image" /></p>

<p>The V100 has 5,120 CUDA cores versus the TITAN Z’s 5,760, 900GB/s of memory bandwidth versus the TITAN Z’s 672GB/s, and a higher clock speed.</p>

<p>In terms of problem size, going from the ML-100k to the ML-20M dataset increases the total amount of work, as there are more users, items, and ratings to train on. Holding all other things equal, we would expect scalable code to have higher speedup with larger problem size, and our speedup satisfies this.</p>

<p>In terms of bandwidth requirement, increasing the number of factors from 50 to 300 has a direct effect on it, because the SGD kernel needs to retrieve more data from the global memory. Holding all other things equal, we would expect a scalable code to have higher speedup with increased memory bandwidth, and our speedup satisfies this.</p>

<h2 id="conclusion">Conclusion</h2>

<p><img src="https://nickgreenquist.github.io//blog/assets/cu2rec/conclusion.jpg" alt="Conclusion" width="500px" class="center-image" /></p>

<p>GPU technology opens the door to massive acceleration for a wide variety of problems. Machine Learning has recently seen a massive boost in effectiveness from both powerful GPUs and the availability of big data to train complex models on. Recommender Systems are an interesting subset of machine learning as they benefit greatly from larger and larger datasets that allow models to uncover complex latent relationships between users and items. SGD is one of the most popular algorithms to optimize recommender system MF models. However, SGD provides a challenging problem for GPU implementations due to its inherent sequential definition.</p>

<p>This blog post introduced cu2rec, a novel parallel implementation of SGD used to solve recommender system matrix factorization. Through the use of a parallel lock-free SGD kernel and a variety of advanced CUDA programming techniques, cu2rec achieves a 10x speedup over one of the fastest sequential recommender system MF libraries while matching best-in-class error metrics. In addition to outperforming sequential implementations of SGD MF, cu2rec has also been shown to scale with better GPU hardware.</p>

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:book" role="doc-endnote">
      <p>D. B. Kirk and W. W. Hwu, <em>Programming Massively Parallel Processors: A Hands-on Approach.</em> Morgan Kaufmann, 2016. <a href="#fnref:book" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:book:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:book:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:hpc" role="doc-endnote">
      <p>A. Coates, B. Huval, T. Wang, D. Wu, B. Catanzaro, and A. Ng, “Deep learning with COTS HPC systems,” in <em>International Conference on Machine Learning</em>, 2013, pp. 1337–1345. <a href="#fnref:hpc" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:dummies" role="doc-endnote">
      <p>A. Bari, M. Chaouchi, and T. Jung, <em>Predictive analytics for dummies</em>. John Wiley &amp; Sons, 2016. <a href="#fnref:dummies" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:netflixwinner" role="doc-endnote">
      <p>Y. Koren, “The BellKor solution to the Netflix Grand Prize,” <em>Netflix prize documentation</em>, vol. 81, pp. 1–10, 2009. <a href="#fnref:netflixwinner" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:biasedsvd" role="doc-endnote">
      <p>Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” <em>Computer</em>, no. 8, pp. 30–37, 2009. <a href="#fnref:biasedsvd" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:cumfals" role="doc-endnote">
      <p>W. Tan, L. Cao, and L. L. Fong, “Faster and cheaper: Parallelizing large-scale matrix factorization on gpus,” <em>CoRR</em>, vol. abs/1603.03820, 2016. [Online]. Available: http://arxiv.org/abs/1603.03820 <a href="#fnref:cumfals" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:hogwild" role="doc-endnote">
      <p>B. Recht, C. Re, S. Wright, and F. Niu, “Hogwild: A lock-free approach to parallelizing stochastic gradient descent,” in <em>Advances in neural information processing systems</em>, 2011, pp. 693–701. <a href="#fnref:hogwild" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:hogwild:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:nomad" role="doc-endnote">
      <p>H. Yun, H. Yu, C. Hsieh, S. V. N. Vishwanathan, and I. S. Dhillon, “NOMAD: non-locking, stochastic multi-machine algorithm for asynchronous and decentralized matrix completion,” <em>CoRR</em>, vol. abs/1312.0193, 2013. [Online]. Available: http://arxiv.org/abs/1312.0193 <a href="#fnref:nomad" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:cumfsgd" role="doc-endnote">
      <p>X. Xie, W. Tan, L. L. Fong, and Y. Liang, “CuMF_SGD: Fast and scalable matrix factorization,” <em>CoRR</em>, vol. abs/1610.05838, 2016. [Online]. Available: http://arxiv.org/abs/1610.05838 <a href="#fnref:cumfsgd" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="Projects" /><summary type="html"><![CDATA[]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://nickgreenquist.github.io//blog/assets/cu2rec/need_gpu.png" /><media:content medium="image" url="https://nickgreenquist.github.io//blog/assets/cu2rec/need_gpu.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Concurrency With Go</title><link href="https://nickgreenquist.github.io//blog/miscl/2018/09/10/concurrency-go.html" rel="alternate" type="text/html" title="Concurrency With Go" /><published>2018-09-10T14:05:14+00:00</published><updated>2018-09-10T14:05:14+00:00</updated><id>https://nickgreenquist.github.io//blog/miscl/2018/09/10/concurrency-go</id><content type="html" xml:base="https://nickgreenquist.github.io//blog/miscl/2018/09/10/concurrency-go.html"><![CDATA[<style type="text/css">
    .center-image
    {
        margin: 0 auto;
        display: block;
    }
</style>

<h2 id="how-to-write-concurrent-function-calls-in-go">How to Write Concurrent Function Calls in Go</h2>
<p>Go is an amazing language. I’d know, I just started using it this weekend. What are some of the cool things you can do with it? Well, how about writing concurrent function calls trivially! Let’s take a look.</p>

<h3 id="simple-sequential-function-calls">Simple Sequential Function Calls</h3>
<p>The code below is pretty simple. It just calls a function 20 times and prints i, from 0 -&gt; 19.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">package</span> <span class="n">main</span>

<span class="k">import</span> <span class="p">(</span>
	<span class="s">"fmt"</span>
<span class="p">)</span>

<span class="k">func</span> <span class="nb">print</span><span class="p">(</span><span class="n">i</span> <span class="kt">int</span><span class="p">)</span> <span class="p">{</span>
	<span class="n">fmt</span><span class="o">.</span><span class="n">Println</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="p">}</span>

<span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
	<span class="k">for</span> <span class="n">i</span> <span class="o">:=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="m">20</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span> <span class="p">{</span>
		<span class="nb">print</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
	<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<h4 id="output">Output:</h4>
<p>0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19</p>

<h3 id="simple-concurrent-function-calls">Simple Concurrent Function Calls</h3>
<p>The code below attempts to fire off 20 concurrent function calls. We use the <code class="language-plaintext highlighter-rouge">go</code> keyword before the function call to tell Go to run it in its own goroutine, a lightweight thread managed by the Go runtime. Let’s see what happens if we run this code.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">package</span> <span class="n">main</span>

<span class="k">import</span> <span class="p">(</span>
	<span class="s">"fmt"</span>
<span class="p">)</span>

<span class="k">func</span> <span class="nb">print</span><span class="p">(</span><span class="n">i</span> <span class="kt">int</span><span class="p">)</span> <span class="p">{</span>
	<span class="n">fmt</span><span class="o">.</span><span class="n">Println</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="p">}</span>

<span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
	<span class="k">for</span> <span class="n">i</span> <span class="o">:=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="m">20</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span> <span class="p">{</span>
		<span class="k">go</span> <span class="nb">print</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
	<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<h4 id="output-1">Output:</h4>

<p>That’s right: nothing was printed! Why did this happen? Well, the <code class="language-plaintext highlighter-rouge">for</code> loop fires off 20 concurrent routines. Those routines are off running on their own. <code class="language-plaintext highlighter-rouge">main()</code> then continues running. What is after the <code class="language-plaintext highlighter-rouge">for</code> loop? Well, nothing. So <code class="language-plaintext highlighter-rouge">main()</code> returns, and when <code class="language-plaintext highlighter-rouge">main()</code> returns the whole program exits without waiting for any other routines. Those routines that split off never had a chance to even print anything! So how can we fix this? Let’s take a look at one approach using <code class="language-plaintext highlighter-rouge">channels</code>.</p>

<h3 id="concurrent-function-calls-with-channel">Concurrent Function Calls with Channel</h3>
<p>We can use a buffered <code class="language-plaintext highlighter-rouge">channel</code> in Go, which here works pretty much like a semaphore. A buffered channel is a queue that can hold <code class="language-plaintext highlighter-rouge">n</code> ‘things’. For this case, we just want to add an int simply to signal something is inside. The type doesn’t matter for what we need.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">package</span> <span class="n">main</span>

<span class="k">import</span> <span class="p">(</span>
	<span class="s">"fmt"</span>
<span class="p">)</span>

<span class="k">func</span> <span class="nb">print</span><span class="p">(</span><span class="n">i</span> <span class="kt">int</span><span class="p">,</span> <span class="n">c</span> <span class="k">chan</span> <span class="kt">int</span><span class="p">)</span> <span class="p">{</span>
	<span class="n">fmt</span><span class="o">.</span><span class="n">Println</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>

	<span class="c">// this signals to the channel this routine is done</span>
	<span class="n">c</span> <span class="o">&lt;-</span> <span class="m">1</span>
<span class="p">}</span>

<span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
	<span class="n">num_calls</span> <span class="o">:=</span> <span class="m">20</span>
	<span class="n">c</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">(</span><span class="k">chan</span> <span class="kt">int</span><span class="p">,</span> <span class="n">num_calls</span><span class="p">)</span>

	<span class="k">for</span> <span class="n">i</span> <span class="o">:=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">num_calls</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span> <span class="p">{</span>
		<span class="k">go</span> <span class="nb">print</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span>
	<span class="p">}</span>

	<span class="c">// this loop won't terminate until 20 ints have been popped out</span>
	<span class="k">for</span> <span class="n">i</span> <span class="o">:=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">num_calls</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span> <span class="p">{</span>
		<span class="o">&lt;-</span> <span class="n">c</span>
	<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<h4 id="output-2">Output:</h4>
<p>2
0
1
10
5
6
7
8
9
14
11
12
13
3
16
15
19
18
17
4</p>

<p>Now this looks concurrent!!</p>

<p>So, what happened can be boiled down to this. We set up a <code class="language-plaintext highlighter-rouge">channel</code> of size 20. We loop 20 times and start 20 goroutines. Next, <code class="language-plaintext highlighter-rouge">main()</code> enters a second <code class="language-plaintext highlighter-rouge">for</code> loop. Each iteration tries to pop something out of the channel. That <code class="language-plaintext highlighter-rouge">for</code> loop won’t finish until 20 things have come out. If the buffer is empty, Go just waits around until something is added. What is adding things into the buffer? You guessed it! Our routines! So, each time a routine is done, it adds something to the channel, letting <code class="language-plaintext highlighter-rouge">main()</code> inch closer to termination.</p>
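<p>Since the value sent on the channel is only a signal, a common variation is to send an empty struct, which carries no data at all. The sketch below is one way that might look. The helper name <code>runAll</code> is made up for this example, and it uses an unbuffered channel to show that the waiting works without a buffer too.</p>

```go
package main

import "fmt"

// runAll starts n goroutines and blocks until each one has signalled completion.
// It returns the number of signals received.
func runAll(n int, work func(i int)) int {
	done := make(chan struct{}) // unbuffered: a send waits for the matching receive
	for i := 0; i < n; i++ {
		go func(i int) {
			work(i)
			done <- struct{}{} // signal that this goroutine is finished
		}(i)
	}

	received := 0
	for i := 0; i < n; i++ {
		<-done // blocks until some goroutine sends
		received++
	}
	return received
}

func main() {
	n := runAll(20, func(i int) { fmt.Println(i) })
	fmt.Println("goroutines finished:", n)
}
```

<p>The numbers still print in whatever order the scheduler picks, but the final line always comes last because <code>main()</code> cannot get past the receive loop until all 20 signals have arrived.</p>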

<h3 id="concurrent-function-calls-with-waitgroup">Concurrent Function Calls with WaitGroup</h3>
<p>Here is another, more ‘Production Code Approved’ way of waiting for all goroutines to finish running. It involves the use of a <code class="language-plaintext highlighter-rouge">WaitGroup</code>.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">package</span> <span class="n">main</span>

<span class="k">import</span> <span class="p">(</span>
	<span class="s">"fmt"</span>
	<span class="s">"sync"</span>
<span class="p">)</span>

<span class="k">func</span> <span class="nb">print</span><span class="p">(</span><span class="n">i</span> <span class="kt">int</span><span class="p">,</span> <span class="n">wg</span> <span class="o">*</span><span class="n">sync</span><span class="o">.</span><span class="n">WaitGroup</span><span class="p">)</span> <span class="p">{</span>
	<span class="c">// defer means run this line right before exiting the function</span>
	<span class="c">// wg.Done() signals to WaitGroup that this routine is done</span>
	<span class="k">defer</span> <span class="n">wg</span><span class="o">.</span><span class="n">Done</span><span class="p">()</span>
	
	<span class="n">fmt</span><span class="o">.</span><span class="n">Println</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="p">}</span>

<span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
	<span class="n">wg</span> <span class="o">:=</span> <span class="o">&amp;</span><span class="n">sync</span><span class="o">.</span><span class="n">WaitGroup</span><span class="p">{}</span>

	<span class="k">for</span> <span class="n">i</span> <span class="o">:=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="m">20</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span> <span class="p">{</span>
		<span class="n">wg</span><span class="o">.</span><span class="n">Add</span><span class="p">(</span><span class="m">1</span><span class="p">)</span>
		<span class="k">go</span> <span class="nb">print</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">wg</span><span class="p">)</span>
	<span class="p">}</span>

	<span class="n">wg</span><span class="o">.</span><span class="n">Wait</span><span class="p">()</span>
<span class="p">}</span>
</code></pre></div></div>
<h4 id="output-3">Output:</h4>
<p>0
19
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18</p>]]></content><author><name></name></author><category term="miscl" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Building Books2Rec - A Book Recommender System</title><link href="https://nickgreenquist.github.io//blog/projects/2018/06/08/building-books2rec.html" rel="alternate" type="text/html" title="Building Books2Rec - A Book Recommender System" /><published>2018-06-08T12:05:14+00:00</published><updated>2018-06-08T12:05:14+00:00</updated><id>https://nickgreenquist.github.io//blog/projects/2018/06/08/building-books2rec</id><content type="html" xml:base="https://nickgreenquist.github.io//blog/projects/2018/06/08/building-books2rec.html"><![CDATA[<style type="text/css">
    .center-image
    {
        margin: 0 auto;
        display: block;
    }
</style>

<p><img src="https://nickgreenquist.github.io//blog/assets/Books2Rec/books2rec.png" alt="Main" width="800px" class="center-image" /></p>

<p><a href="https://books2rec.me/">Books2Rec</a> is a book recommender system that started as a project for the Big Data Science class at NYU. Using your <a href="https://www.goodreads.com/">Goodreads</a> profile, Books2Rec uses Machine Learning methods to provide you with highly personalized book recommendations. Don’t have a Goodreads profile? We’ve got you covered - just search for your favorite book.</p>

<h3 id="check-it-out-here"><a href="https://books2rec.me/">Check it out here!</a></h3>

<h2 id="table-of-contents">Table of Contents</h2>
<ul>
  <li><a href="#introduction">Introduction</a></li>
  <li><a href="#how-it-works">How it Works</a></li>
  <li><a href="#project-structure">Project Structure</a>
    <ul>
      <li><a href="#data-sources">Data Sources</a></li>
      <li><a href="#rapidminer">RapidMiner</a></li>
      <li><a href="#surprise">Surprise</a></li>
      <li><a href="#recommendation-pipeline">Recommendation Pipeline</a></li>
      <li><a href="#web-app">Web App</a></li>
      <li><a href="#tools-used">Tools Used</a></li>
    </ul>
  </li>
  <li><a href="#creators">Creators</a></li>
  <li><a href="#acknowledgements">Acknowledgements</a></li>
  <li><a href="#references">References</a></li>
</ul>

<h2 id="introduction">Introduction</h2>
<p>Recommender systems are at the forefront of the ways in which content-serving websites like Facebook, Amazon, Spotify, etc. interact with their users. It is said that 35% of Amazon.com’s revenue is generated by its recommendation engine<sup>[1]</sup>. Given this climate, it is paramount that websites aim to serve the best personalized content possible.</p>

<p>As a trio of book lovers, we looked at Goodreads, the world’s largest site for readers and book recommendations. It is owned by Amazon, which itself has a stellar recommendation engine. However, we found that their recommendations leave a lot to be desired.</p>

<p><strong>Here is an example of Goodreads recommending a book about the difficult trek to the Western frontier of the US based on my high rating of the sequel to Charlie and the Chocolate Factory. I think we can do better.</strong>
<img src="https://nickgreenquist.github.io//blog/assets/Books2Rec/goodreads-charlie.png" alt="Bad Rec" width="300px" class="center-image" />
<em>Example of an unrelated recommendation by Goodreads.</em>
<br /><br /></p>

<p><strong>Below, we use a hybrid recommender system (ratings plus item features) to provide recommendations for Goodreads users.</strong>
<img src="https://nickgreenquist.github.io//blog/assets/Books2Rec/books2rec-charlie-itemmatrix.png" alt="Good Rec" width="900px" class="center-image" />
<em>Example of our recommendations based on our hybrid model.</em>
<br /><br /></p>

<p><strong>We also provide more ‘traditional’ recommendations that only use the book’s features.</strong>
<img src="https://nickgreenquist.github.io//blog/assets/Books2Rec/books2rec-charlie.png" alt="Similar Rec" width="900px" class="center-image" />
<em>Example of our recommendations based on pure book metadata features. Notice how it picks up on all the other books from the author despite <code>author</code> not being a feature we included in our model.</em>
<br /><br /></p>

<h2 id="how-it-works">How it Works</h2>
<p>We use a hybrid recommender system to power our recommendations. Hybrid systems combine two other types of recommender systems: content-based filtering and collaborative filtering. Content-based filtering recommends items based on how similar the items themselves are. That is, if I like the first book of the Lord of the Rings, and the second book is similar to the first, it can recommend the second book to me. Collaborative filtering uses user ratings to determine user or item similarities. If users’ ratings of the first Lord of the Rings book are highly correlated with their ratings of the second, the two books are deemed to be similar.</p>

<p>Our hybrid system uses both of these approaches. Our item similarities are a combination of user ratings and features derived from books themselves.</p>

<p>Powering our recommendations is the SVD algorithm made famous by the Netflix Prize<sup>[2]</sup>. It is, without doubt, one of the most monumental algorithms in the history of recommender systems. Over time, we are aiming to improve our recommendations using the latest trends in recommender systems.</p>

<h3 id="svd-for-ratings-matrix">SVD for Ratings Matrix</h3>
<p>What makes the SVD algorithm made famous during the Netflix challenge different from standard SVD is that it does <strong>NOT</strong> assume missing values are 0<sup>[3]</sup>. Standard SVD is a perfect reconstruction of a matrix but has one flaw for our purposes: if a user has not rated a book (which is going to be most books), then SVD would model them as having given a 0 rating to every book they have not rated.</p>

<p>In order to use SVD for rating predictions, you have to update the values in the matrix to negate this effect. You can use Gradient Descent on the error function of predicted ratings to accomplish this. Once you run Gradient Descent enough times, every value in the decomposed matrix begins to better reflect the correct values for predicting missing ratings, and not for reconstructing the matrix.</p>
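<p>To make the idea concrete, here is a minimal sketch of this style of matrix factorization, written by hand rather than taken from our codebase. It runs Stochastic Gradient Descent over the observed ratings only, so unrated books never enter the error. The function names, the toy ratings, and the hyperparameter values are all assumptions chosen for illustration.</p>

```python
import random

def train_funk_svd(ratings, n_users, n_items, n_factors=2,
                   lr=0.05, reg=0.02, n_epochs=200, seed=0):
    """Learn user and item factors from observed (user, item, rating) triples only."""
    rng = random.Random(seed)
    # Small random starting values; all-zero factors would never move.
    P = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    mean = sum(r for _, _, r in ratings) / len(ratings)
    for _ in range(n_epochs):
        # Only observed ratings are visited: a missing rating is never treated as 0.
        for u, i, r in ratings:
            pred = mean + sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            err = r - pred
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mean, P, Q

def predict(model, u, i):
    """Predicted rating: global mean plus the user-item factor dot product."""
    mean, P, Q = model
    return mean + sum(pu * qi for pu, qi in zip(P[u], Q[i]))

# Toy data (made up): (user index, item index, rating on a 1-5 scale).
toy_ratings = [
    (0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5),
    (2, 2, 5), (2, 3, 4), (3, 2, 4), (3, 3, 5),
    (0, 2, 1), (2, 0, 2),
]
model = train_funk_svd(toy_ratings, n_users=4, n_items=4)
# User 1 never rated item 2; the model still produces an estimate for that pair.
estimate = predict(model, 1, 2)
```

<p>A real library such as Surprise adds per-user and per-item bias terms and shuffling on top of this, so treat the sketch as the bare idea only.</p>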

<h3 id="evaluation-metrics">Evaluation Metrics</h3>
<p>As with all Machine Learning based projects, you want to make sure what you have used is ‘better’ than other popular methods. We used RMSE to evaluate the performance of our trained Latent Factor (SVD) model. Below are the RMSE values for several algorithms we evaluated while building this project.</p>

<p>There are two widely used metrics in recommender systems that we also use. The <strong>Mean Absolute Error</strong>, otherwise known as <em>MAE</em>, is the average absolute difference between a predicted rating and the actual rating. Its close cousin, <strong>Root Mean Squared Error</strong> (otherwise known as <em>RMSE</em>), is still an average distance, but the difference between the predicted rating and the actual rating is squared before averaging (and the square root is taken at the end), meaning that it is much more costly to miss something by a large margin than to miss something by a small margin.</p>
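<p>A short worked example shows how the two metrics differ. Here MAE means the mean absolute error. The ratings below are made up: both sets of predictions have the same MAE, but the one with a single large miss has twice the RMSE.</p>

```python
import math

def mae(predicted, actual):
    """Mean absolute error: average size of the miss."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error: squares each miss before averaging."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

actual = [4.0, 3.0, 5.0, 2.0]
always_close = [3.5, 3.5, 4.5, 2.5]   # every prediction is off by 0.5
one_big_miss = [4.0, 3.0, 5.0, 4.0]   # three exact predictions, one off by 2.0

print(mae(always_close, actual), rmse(always_close, actual))   # 0.5 0.5
print(mae(one_big_miss, actual), rmse(one_big_miss, actual))   # 0.5 1.0
```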

<table>
  <thead>
    <tr>
      <th>Approach</th>
      <th>Params</th>
      <th>Data</th>
      <th>RMSE</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>User k-NN</td>
      <td>k=80</td>
      <td>Goodreads (subset 20%)</td>
      <td>0.864</td>
    </tr>
    <tr>
      <td>User k-NN</td>
      <td>NA</td>
      <td>Full (Goodreads + Amazon)</td>
      <td>0.8875662310051954</td>
    </tr>
    <tr>
      <td>Item k-NN</td>
      <td>NA</td>
      <td>Full (Goodreads + Amazon)</td>
      <td>0.8876182681047732</td>
    </tr>
    <tr>
      <td><strong>SVD</strong></td>
      <td>factors=300, epochs=100</td>
      <td>Full (Goodreads + Amazon)</td>
      <td><strong>0.842684489142339</strong></td>
    </tr>
    <tr>
      <td>SVD</td>
      <td>factors=10, epochs=50</td>
      <td>Full (Goodreads + Amazon)</td>
      <td>0.844118472532902</td>
    </tr>
    <tr>
      <td>SVD</td>
      <td>factors=1000, epochs=20</td>
      <td>Full (Goodreads + Amazon)</td>
      <td>0.8627727919676756</td>
    </tr>
    <tr>
      <td>Autoencoder</td>
      <td>X-300-200-300-X</td>
      <td>Full (Goodreads + Amazon)</td>
      <td>0.893</td>
    </tr>
  </tbody>
</table>

<p><strong>Note</strong>: Not all results from HPC grid search are shown here, only the top model from each batch (small params, large params, medium params).<br />
<strong>Note</strong>: The Autoencoder (inspired by this paper<sup>[4]</sup>) results are highly experimental and need further hyperparameter optimization.</p>

<p>Our final model uses SVD with 300 factors trained for 100 epochs. Overall, the lower factor models consistently outperformed the very high factor models. However, this middle ground (300 factors, 100 epochs) was the absolute best result from our grid search. We also subjectively liked the recommendations it gave for test users more than those of the very small factor model. This is because with only 10 factors, the model is very generalized. While this might give a small error for rating predictions, the recommendations it gave seemed to make no sense.</p>

<h3 id="why-hybrid">Why Hybrid?</h3>
<p>Why would we not just use one, hyper-optimized Latent Factor (SVD) Model instead of combining it with a Content Based model?</p>

<p>The answer is simply that a pure SVD model can lead to very nonsensical, ‘black box’ recommendations that can turn away users. A trained SVD model is simply trying to assign factor strengths in a matrix for each item in order to minimize some cost function. This cost function is simply trying to minimize the error of predicting hidden ratings in a test set. What this leads to is a very optimized model that, when finally used to make recommendations for new users, can spit out very <strong>subjectively</strong> strange recommendations.</p>

<p>For example, say there is some book A that, after being run through a trained SVD model, is most similar in terms of ratings to a book B. The issue is that book B can be completely unrelated to A by ‘traditional’ standards (what the book is about, the genre, etc). What this can lead to is a book like Lord of the Rings Return of the King ending up being most ‘similar’ to a completely unrelated book like Sisterhood of Traveling Pants (yes this happened). This is because it could just be the case that these two books happen to always be rated similarly by users and thus, the SVD model learns to always recommend these books together because it will minimize its error function. However, if you ask most fantasy readers, they would probably prefer to be recommended more fantasy books (but not just all other books by Tolkien).</p>

<p>What this leads to is trying to find a balance between exploration (using SVD to recommend books that are similar only in how they are rated by tens of thousands of users) and understandable recommendations (using Content features to recommend other fantasy books if the user has enjoyed the Lord of the Rings books). To solve this issue, we combine the trained SVD matrix with the feature matrix. By doing this, when we map a user to this matrix, the user is mapped to all the hidden concept spaces SVD has learned. All the books that the model returns are then weighted by how similar they are to the features of the books that the user has highly rated. By doing this, you will get recommendations that are not purely within the same genre that you enjoy, but also not completely oblivious to the types of books you like.</p>

<h2 id="project-structure">Project Structure</h2>

<h3 id="data-sources">Data Sources</h3>
<h4 id="goodbooks-10k">Goodbooks 10k</h4>
<p>6 million ratings from Goodreads here: <a href="https://github.com/zygmuntz/goodbooks-10k">goodbooks-10k repository</a>. Along with ratings, this data also includes great book metadata that was used for the Content Based Model.</p>

<h4 id="amazon-ratings">Amazon Ratings</h4>
<p>The Amazon ratings were kindly provided by <a href="https://snap.stanford.edu/data/web-Amazon.html">Jure Leskovec</a> and <a href="http://jmcauley.ucsd.edu/data/amazon/">Julian McAuley</a><sup>[5]</sup><sup>[6]</sup>. We used the subset of the book ratings that matched the Goodbooks 10k dataset.</p>

<h4 id="data-preprocessing">Data Preprocessing</h4>
<p>Data preprocessing is one of the (if not <em>the</em>) most significant parts of any Data Science project. The most difficult part of our data preprocessing was joining the Goodreads data and the Amazon ratings together. The Amazon ratings were keyed by an Amazon Standard Identification Number (ASIN), but not an ISBN. We mapped the ASIN to book titles, the Goodreads book ids to book titles, and did a hash-join on the two title sets to join both sets of ratings together.</p>
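<p>A hash join on titles can be sketched in a few lines of plain Python. This is an illustration, not our actual preprocessing code: the function names, the title normalization step, and the sample ASINs and book ids are all made up for the example.</p>

```python
def normalize_title(title):
    """Lowercase and collapse whitespace so near-identical titles share one key."""
    return " ".join(title.lower().split())

def hash_join_on_title(asin_to_title, goodreads_id_to_title):
    """Return {asin: goodreads_book_id} for titles present in both datasets."""
    # Build phase: index the Goodreads side by normalized title.
    title_to_book_id = {}
    for book_id, title in goodreads_id_to_title.items():
        title_to_book_id.setdefault(normalize_title(title), book_id)
    # Probe phase: look up each Amazon title in the hash table.
    matches = {}
    for asin, title in asin_to_title.items():
        book_id = title_to_book_id.get(normalize_title(title))
        if book_id is not None:
            matches[asin] = book_id
    return matches

# Hypothetical sample data; these identifiers are placeholders, not real ones.
amazon_titles = {
    "ASIN-0001": "The Two Towers",
    "ASIN-0002": "Some Book Not On Goodreads",
}
goodreads_titles = {
    101: "the two  towers",
    102: "Charlie and the Great Glass Elevator",
}
asin_to_book_id = hash_join_on_title(amazon_titles, goodreads_titles)
print(asin_to_book_id)   # {'ASIN-0001': 101}
```

<p>Joining on titles is lossy (different editions, subtitles, and punctuation all cause misses), which is part of why this step was so painful.</p>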

<p>In order to see the difference in rating distribution between the two datasets, we used visualizations. The visualizations were generated using the <code class="language-plaintext highlighter-rouge">R</code> programming language.</p>

<p><img src="https://nickgreenquist.github.io//blog/assets/Books2Rec/unnamed-chunk-7-1.png" alt="Good Rec" width="600px" class="center-image" /></p>

<p>The next step was generating the book features, which was done by constructing tf-idf vectors of the book descriptions, tags, and shelves. There were also a lot of missing images in the Goodreads dataset, which decreased the quality of our web app by a lot, and so these images were re-obtained from Goodreads.</p>
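<p>For readers who have not met tf-idf before, here is a from-scratch sketch of the textbook version (term frequency times the log of inverse document frequency). Libraries such as Scikit-learn use a smoothed variant, so the exact numbers will differ. The three tiny ‘descriptions’ are invented for the example.</p>

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Made-up book descriptions.
descriptions = [
    "fantasy ring quest",
    "fantasy dragon quest",
    "chocolate factory children",
]
vectors = tfidf_vectors(descriptions)
# "ring" appears in one description and "fantasy" in two,
# so "ring" gets the larger weight in the first vector.
print(vectors[0])
```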

<p>After these steps, the data was clean enough to be served by the web server and converted into a numerical format that could be consumed by Machine Learning algorithms.</p>

<h3 id="rapidminer">RapidMiner</h3>
<p><a href="https://rapidminer.com/">RapidMiner</a> is a Data Science platform that allows for rapid prototyping of Machine Learning algorithms. We used RapidMiner to get a ‘feel’ for our data. It was great for quickly applying models and seeing their results, but it proved inflexible, and it could not handle more than 12,000 users without running into a memory error or an array overflow. The RapidMiner models achieved an RMSE of 0.864 and an MAE of 0.685.</p>

<p><img src="https://nickgreenquist.github.io//blog/assets/Books2Rec/rapidminer.png" alt="RapidMiner Item Recommendation Process" width="600px" class="center-image" /></p>

<h3 id="surprise">Surprise</h3>
<p><a href="http://surpriselib.com/">Surprise</a> is a Python library designed to generate recommendations and evaluate recommenders. It provides a nice API and a nice pipeline for recommender systems, but we found that it was not as malleable as we wanted it to be. It proved to be quite difficult to get different sorts of recommenders to work nicely with its pipeline, but standard algorithms like SVD were a breeze.</p>

<p>We used the Surprise library to do matrix factorization on the user-item matrix. The SVD algorithm in Surprise uses Gradient Descent to minimize the error of its predicted ratings (RMSE), which is one of our end goals. This differs from regular SVD, which tries to minimize the matrix reconstruction error. The crucial difference is that <strong>Surprise does not assume that unrated items are rated as 0.</strong></p>

<p>There are multiple hyperparameters one can use for training the SVD model. We used Grid Search on the hyperparameter space in order to find the best hyperparameters, with the help of <a href="https://wikis.nyu.edu/display/NYUHPC/High+Performance+Computing+at+NYU">NYU High Performance Computing</a>.</p>

<h3 id="recommendation-pipeline">Recommendation Pipeline</h3>
<p>In order to have better control over the recommendations, we built our own recommendation pipeline. This pipeline takes as input the preprocessed ratings and book features, uses SVD to learn the item-concept matrix for both ratings and book features, combines the two results, calculates book similarities, and produces recommendations. For testing, this pipeline also includes k-fold cross validation and the calculation of error metrics.</p>
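<p>The ‘combine the two results, calculate book similarities’ steps can be sketched as follows. This is one plausible way to do it, not a copy of our pipeline: it assumes each book already has a small factor vector from the ratings SVD and another from the feature SVD, concatenates the two after normalizing, and ranks books by cosine similarity. The function names, the <code>content_weight</code> parameter, and the toy numbers are assumptions.</p>

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def hybrid_item_vectors(rating_factors, content_factors, content_weight=0.5):
    """Concatenate normalized rating factors with weighted, normalized content factors."""
    combined = {}
    for book, factors in rating_factors.items():
        ratings_part = l2_normalize(factors)
        content_part = [content_weight * x for x in l2_normalize(content_factors[book])]
        combined[book] = ratings_part + content_part
    return combined

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar(book, vectors, top_n=2):
    """Rank every other book by cosine similarity to the given book."""
    scores = [(other, cosine(vectors[book], vec))
              for other, vec in vectors.items() if other != book]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

# Toy factors (made up). By ratings alone, the two unrelated books look identical.
rating_factors = {
    "fellowship": [0.9, 0.1],
    "two_towers": [0.7, 0.3],
    "traveling_pants": [0.9, 0.1],
}
content_factors = {
    "fellowship": [1.0, 0.0],
    "two_towers": [1.0, 0.0],
    "traveling_pants": [0.0, 1.0],
}

ratings_only = hybrid_item_vectors(rating_factors, content_factors, content_weight=0.0)
hybrid = hybrid_item_vectors(rating_factors, content_factors, content_weight=0.5)
print(most_similar("fellowship", ratings_only)[0][0])   # traveling_pants
print(most_similar("fellowship", hybrid)[0][0])         # two_towers
```

<p>Raising <code>content_weight</code> pulls recommendations toward books with similar metadata, while lowering it lets the rating patterns dominate.</p>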

<h3 id="web-app">Web App</h3>
<p>Our web application is powered by <a href="http://flask.pocoo.org/">Flask</a>, the easy to use Python web framework. As mentioned above, our website is <a href="https://books2rec.me/">live</a> for you to test your recommendations with.
<img src="https://nickgreenquist.github.io//blog/assets/Books2Rec/books2rec-start.png" alt="Good Rec" width="600px" class="center-image" /></p>

<h3 id="tools-used">Tools Used</h3>
<ul>
  <li><strong>Surprise</strong>: See <a href="#surprise">Surprise</a></li>
  <li><strong>RapidMiner</strong>: See <a href="#rapidminer">RapidMiner</a></li>
  <li><strong>RStudio</strong>: We used RStudio for Data Understanding visualizations</li>
  <li><strong>Jupyter Notebook</strong>: For testing all aspects of the Project Lifecycle. Code was moved to a general Util API folder once deemed useful</li>
  <li><strong>Python</strong>: The language of choice for the project and the web app</li>
  <li><strong>Pandas</strong>: Used to store books with all their metadata and also to store the user-item ratings</li>
  <li><strong>Hadoop (on HPC Dumbo)</strong>: Used to get baseline metrics for collaborative filtering. We also precomputed the item-item similarity matrix from the large item-feature matrix on Spark, which is used as input to the content-based recommendation model in Mahout.</li>
  <li><strong>HPC (NYU Prince)</strong>: There are multiple hyperparameters one can use for training the SVD model. We used Grid Search on the hyperparameter space in order to find the best hyperparameters, with the help of NYU High Performance Computing. The code for that can be found in the HPC folder.</li>
  <li><strong>Scikit-learn</strong>: We used Scikit-learn to run our vanilla SVD on item features.</li>
  <li><strong>Scipy</strong>: Used for efficiently storing sparse matrices (ratings matrices are extremely sparse)</li>
  <li><strong>Tensorflow</strong>: We used Tensorflow to test our Autoencoder, which was used to generate representations of items similar to how SVD on the item features works. Unfortunately, there are a lot of different hyperparameters to optimize with Deep Neural Networks, and we made better use of our time by focusing on the web app than the Autoencoder.</li>
  <li><strong>Flask</strong>: See <a href="#web-app">Web App</a></li>
  <li><strong>Digital Ocean</strong>: Our web application is hosted on a DO server. We selected 1 GB of memory to keep the deployment lightweight</li>
</ul>

<h2 id="creators">Creators</h2>
<ul>
  <li><strong><a href="https://nickgreenquist.github.io/">Nick Greenquist</a></strong></li>
  <li><strong><a href="https://dorukkilitcioglu.github.io/">Doruk Kilitcioglu</a></strong></li>
  <li><strong><a href="https://panghalamit.github.io/">Amit Panghal</a></strong></li>
</ul>

<h2 id="acknowledgements">Acknowledgements</h2>
<ul>
  <li><strong><a href="https://cs.nyu.edu/~abari/">Dr. Anasse Bari</a></strong> - <em>Project Advisor</em></li>
</ul>

<h2 id="references">References</h2>
<ol>
  <li><a href="http://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-up-with-consumers">How retailers can keep up with consumers (McKinsey)</a></li>
  <li><a href="https://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf">The BellKor Solution to the Netflix Grand Prize</a></li>
  <li><a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.125.8971&amp;rep=rep1&amp;type=pdf">Generalized Hebbian Algorithm for Incremental Latent Semantic Analysis</a></li>
  <li><a href="https://arxiv.org/pdf/1606.07659.pdf">Hybrid Recommender System based on Autoencoders</a></li>
  <li><a href="https://arxiv.org/abs/1602.01585">R. He, J. McAuley. Modeling the visual evolution of fashion trends with one-class collaborative filtering. WWW, 2016</a></li>
  <li><a href="https://arxiv.org/abs/1506.04757">J. McAuley, C. Targett, J. Shi, A. van den Hengel. Image-based recommendations on styles and substitutes. SIGIR, 2015</a></li>
</ol>]]></content><author><name></name></author><category term="Projects" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">The Effective Engineer - TLDR</title><link href="https://nickgreenquist.github.io//blog/miscl/2018/05/16/effective-engineer-tldr.html" rel="alternate" type="text/html" title="The Effective Engineer - TLDR" /><published>2018-05-16T14:05:14+00:00</published><updated>2018-05-16T14:05:14+00:00</updated><id>https://nickgreenquist.github.io//blog/miscl/2018/05/16/effective-engineer-tldr</id><content type="html" xml:base="https://nickgreenquist.github.io//blog/miscl/2018/05/16/effective-engineer-tldr.html"><![CDATA[<style type="text/css">
    .center-image
    {
        margin: 0 auto;
        display: block;
    }
</style>

<p><em>A TLDR of <a href="http://www.effectiveengineer.com/">The Effective Engineer</a> by Edmond Lau</em></p>

<p>Before starting my internship this summer at <a href="https://www.bluecore.com/">Bluecore</a>, I was given this book to read by the team. Instead of just reading through it and probably forgetting most of it within a month, I wanted to do a type of <em>book report</em>. Writing down the key ideas and condensing the information into my own words helps with retaining information. I covered this strategy in my overview of <a href="https://nickgreenquist.github.io/blog/miscl/2018/04/10/Learning-How-to-Learn.html">Learning How to Learn</a>, which explains how chunking helps make things stick.</p>

<h2 id="part-1---adopt-the-right-mindsets">Part 1 - Adopt the Right Mindsets</h2>
<p>In the first part of the book, Lau focuses on how you, the individual, can increase your value creation.</p>

<h3 id="focus-on-high-leverage-activities"><strong>Focus on High-Leverage Activities</strong></h3>
<p>Here Lau debunks the myth of the startup or company where engineers work 70-90 hours a week in order to be successful. Instead of pure hours worked, Lau explains how you should instead focus only on <em>high leverage</em> activities.</p>

<h1 id="use-leverage-as-your-yardstick-for-effectiveness">Use Leverage as Your Yardstick for Effectiveness</h1>
<p>Leverage = Impact Produced / Time Invested</p>

<p>80-20 Rule: 80% of your impact will come from 20% of your time. Find what that 20% is! Hint: It’s not going through your emails or sitting in meetings.</p>

<p>1 hour each day for 20 days is only 1% of an engineer’s yearly hours worked. Yet, 1 hour each day for the first 20 days of an engineer’s job can snowball that learning into massive leverage gains later on.</p>

<h1 id="three-ways-to-increase-leverage">Three ways to Increase Leverage</h1>
<ol>
  <li>Reduce the time it takes to complete an activity</li>
  <li>Increase the output of an activity</li>
  <li>Shift focus to a higher leverage activity</li>
</ol>

<p>Here are a few tips to accomplish the above:</p>
<ol>
  <li>Condense meetings into 30 minutes instead of a full hour</li>
  <li>Prepare an agenda for meetings to streamline them</li>
  <li>Replace non-important meetings with emails</li>
  <li>Automate parts of your development or testing workflow</li>
  <li>Focus on launch critical tasks</li>
  <li>Use profiling tools to instantly know where bottlenecks are rather than spending time hunting for them</li>
</ol>

<h1 id="focus-on-leverage-points-not-just-easy-wins">Focus on Leverage Points, not Just Easy Wins</h1>
<p>People think they can maximize the Leverage equation by just focusing on a ton of ‘low-hanging’ fruit. This is wrong, as a bunch of small wins will still be eclipsed by large, critical, high impact tasks.</p>

<p>Time is like money. You want to invest it in what will give the biggest return. You wouldn’t spend your days going around collecting quarters off the street. Instead, invest in things with high payouts.</p>

<h3 id="optimize-for-learning"><strong>Optimize for Learning</strong></h3>
<p>Learning is the single most important investment you can make in yourself (except maybe health and fitness but there are other books on that). Figure out how to optimize your time so you are learning as quickly and as early as possible, because knowledge is like money: it earns compound interest.</p>

<p>A big factor in deciding to work somewhere is whether you think you will be challenged enough to always be learning.</p>

<h1 id="adopt-a-growth-mindset">Adopt a Growth Mindset</h1>
<p>Be a <a href="https://www.amazon.com/dp/B003L77XB8/ref=dp-kindle-redirect?_encoding=UTF8&amp;btkr=1">Yes Man</a> to learning opportunities. I actually did this with my life a few years ago (say yes to everything) and it really does change your life.</p>

<p>Force yourself to believe that you can always learn new things and skills, and that belief will actually get you to partake in activities that foster growth.</p>

<h1 id="invest-in-your-rate-of-learning">Invest in Your Rate of Learning</h1>
<p>Learning is like money: it is affected by compound interest.</p>

<p>Invest early and invest in high return knowledge (things that will make more future learning easier). For me, this is math as many of the things I want to do require a solid foundation in Mathematics.</p>

<h1 id="seek-work-conducive-to-learning">Seek Work Conducive to Learning</h1>
<p>We spend most of our waking hours at work. You should find ways to learn while at work.</p>

<p>Below are criteria you should be measuring when deciding where to work:</p>
<ol>
  <li>Fast Growth: is the company growing fast or just kinda floating along?</li>
  <li>Training: the company should offer extensive training opportunities (not just for new hires!)</li>
  <li>Openness: Different teams should not be isolated and closed off</li>
  <li>Pace: you should be able to get things done quickly and not get bogged down by miles of red tape</li>
  <li>People: never be the smartest person in the room</li>
  <li>Autonomy: you should have some freedom and say in what you really want to work on</li>
</ol>

<h1 id="dedicate-work-time-for-learning">Dedicate Work Time for Learning</h1>
<p>Google’s 20% Time: Engineers get one day a week to work on whatever side project they think will help the company in the long run.</p>

<p>Ten suggestions on how to learn on the job:</p>
<ol>
  <li>Study others’ code</li>
  <li>Write more code</li>
  <li>Go through internal technical documentation</li>
  <li>Master your most used programming language</li>
  <li>Ask for code reviews from the harshest critics</li>
  <li>Enroll in classes (even at a nearby university)</li>
  <li>Participate in design meetings</li>
  <li>Work on diverse projects (Interleaving from Learning How to Learn anyone??)</li>
  <li>Make sure your team has at least one engineer more senior than you so you can learn</li>
  <li>Jump into new code bases without fear</li>
</ol>

<h1 id="always-be-learning">Always be Learning</h1>
<p>You should also be learning outside of work. Even learning in other disciplines can trickle over to better work performance.</p>

<p>Here are ten ways to learn:</p>
<ol>
  <li>Learn new programming languages and frameworks</li>
  <li>Invest in high demand skills (future proofing)</li>
  <li>Read books (check out <a href="https://books2rec.me/">Books2Rec</a> for recommendations!)</li>
  <li>Join discussion groups or book clubs</li>
  <li>Attend meetups, talks, and conferences or even just watch them online</li>
  <li>Network and maintain relationships</li>
  <li>Follow bloggers that share great knowledge</li>
  <li>Write to teach. Trying to teach others is by far the best way to learn material yourself. This is my inspiration for this blog!</li>
  <li>Work on side projects</li>
  <li>Pursue what you love. Don’t waste time on TV, web surfing, or social media</li>
</ol>

<h3 id="prioritize-regularly"><strong>Prioritize Regularly</strong></h3>
<p>Success of your company or yourself requires prioritizing the things that actually matter. Prioritizing is a skill like all others. You can get better at figuring out what is important and also estimating how long things will take.</p>

<h1 id="track-to-dos">Track To-Dos</h1>
<p>I use Google Calendar and Google Tasks to track everything I have to and want to do every day and week. Thousands of apps do this, or you can even just keep a journal.</p>

<p>Learning How to Learn also showed us the importance of writing down To-Dos. Our working memory is a very limited cache, so writing things down clears out that cache for more in-the-moment things and keeps us from forgetting future things to do.</p>

<p>You should also learn to assign priority and estimated time to your to-do items. Don’t just brain dump them. Tomorrow you will have to use time to sort through it all!</p>

<h1 id="focus-on-what-directly-produces-value">Focus on what Directly Produces Value</h1>
<p>There are an infinite number of things you can do at this moment. However, you should do the things that produce the most value.</p>

<p>A good example Lau provided is saving money. Many people say you should skip the Starbucks $4 coffee every morning as this will save X amount of money a year. However, there are other, shorter (not easier) ways that will save MORE money.</p>
<ol>
  <li>Spending an hour or two researching cheaper high expense items like hotel rooms or tickets can save hundreds of dollars.</li>
  <li>Optimizing your stock and savings into higher return accounts will generate more money than saving on coffee every morning.</li>
  <li>Negotiating a higher salary can net you tens of thousands of dollars a year for a day’s worth of extra effort.</li>
</ol>

<p>Of course, saving $4 a day on coffee is still helpful but always look for bigger value items first to tackle!</p>

<h1 id="focus-on-the-important-and-non-urgent">Focus on the Important and Non-Urgent</h1>
<p>This is the famous Quadrant chart.
<img src="https://nickgreenquist.github.io//blog/assets/EffectiveEngineer/quadrant.png" alt="Priority" width="500px" class="center-image" /></p>

<p>The lesson is that Important and Non-Urgent is JUST AS HIGH PRIORITY as Urgent and Important.</p>

<h1 id="protect-your-makers-schedule">Protect Your Maker’s Schedule</h1>
<p>Engineers need long, uninterrupted blocks of time to get things done. Limit your interruptions and make it known you need blocks of ‘do not disturb’ time. This is why working from home can be very helpful.</p>

<h1 id="limit-amount-of-work-in-progress">Limit Amount of Work in Progress</h1>
<p>Our working memory can store 7 +/- 2 chunks. If you have 20 ‘In Progress’ tickets in JIRA, something is wrong. Limit this to 2 or 3.</p>

<h1 id="fight-procrastination">Fight Procrastination</h1>
<p>My <a href="https://nickgreenquist.github.io/blog/miscl/2018/04/10/Learning-How-to-Learn.html">Learning How to Learn</a> post has valuable information on this topic, much of which is repeated in this chapter of the book.</p>

<p>One new piece of advice is to form a plan of when you will do things in the future. Simply planning the future blocks of time in which you will work on something makes you 2-3x more likely to do it.</p>

<h1 id="make-a-routine">Make a Routine</h1>
<p>Use the Pomodoro Technique to split up your day into 25 minute productive chunks. Aim to maximize how many Pomodoro blocks you complete each day for high leverage work (25 minutes of answering emails does not count).</p>

<p>Split up your To-Do list into a ‘Doing’, ‘Today’, and ‘This Week’ sections and shuffle things from one to the next each day. Only assign as many things as can realistically get done per day.</p>

<h2 id="part-2---execute-execute-execute">Part 2 - Execute, Execute, Execute</h2>
<p>Part 2 explains how an engineer can actually get stuff done.</p>

<h3 id="invest-in-iteration-speed"><strong>Invest in Iteration Speed</strong></h3>
<p>Companies are starting to realize that pushing dozens of code changes every day is better than a few large code changes every week or month. Setting up a workflow that allows for this constant iteration is crucial to surviving in today’s fast paced tech scene.</p>

<h1 id="move-fast-to-learn-fast">Move Fast to Learn Fast</h1>
<p>‘Move Fast and Break Things’ - Facebook mantra. It’s better to move fast and learn from failures quickly than to move too slowly and not take enough risks. Small changes and new features are also easier to roll back when they ship one at a time than when dozens land at once.</p>

<h1 id="invest-in-time-saving-tools">Invest in Time Saving Tools</h1>
<p>One way to save time is to create tools that automate or speed up repetitive processes. Even though this slows you down at first, over the long run, it will save more time.</p>

<p>Another technique to save time is to prototype in high level languages like Python. Only use low level languages once you have the prototypes done and approved.</p>

<h1 id="shorten-your-debugging-and-validation-loops">Shorten Your Debugging and Validation Loops</h1>
<p>Create code that drops you right into the buggy pieces of code so you can work on debugging them rather than wasting time repeatedly getting back to the buggy code. The book highlights this with an example of CSS styling for specific web pages. Code up a shortcut to take you right to the page you are styling rather than clicking through from the login page every time.</p>

<h1 id="master-your-programming-environment">Master Your Programming Environment</h1>
<ol>
  <li>Get proficient with your favorite text editor or IDE</li>
  <li>Learn at least one high-level language that can be used to prototype ideas (Python)</li>
  <li>Get familiar with shell commands (Unix shell). This is VERY important.</li>
  <li>Prefer keyboard shortcuts over the mouse</li>
  <li>Automate your manual workflows. If you’ve done something the same way twice, automate it for the third time.</li>
  <li>Test ideas in an interpreted language and not in one that requires code compilation (C, C++, Java)</li>
  <li>Make running unit tests insanely easy and fast by writing Makefiles.</li>
</ol>

<h1 id="dont-ignore-non-engineering-bottlenecks">Don’t Ignore Non-Engineering Bottlenecks</h1>
<p>Here are a few bottlenecks that can occur and slow down engineering:</p>
<ol>
  <li>Waiting on other people to get things you need (e.g. Photoshop images from the design team). Solve this by communicating constantly with your PM and keeping them updated on what you need.</li>
  <li>Obtaining approval from higher-ups. This is hard to solve and should be avoided in the engineering culture.</li>
  <li>Review processes (QA). Do not wait to QA all your work at the last minute. QA should happen in real time alongside development.</li>
</ol>

<h3 id="measure-what-you-want-to-improve"><strong>Measure What You Want to Improve</strong></h3>
<p>You need to convert goals into numeric values in order to measure them, and it’s hard to find the right thing to measure. For example, Google figured out that clicks were not a good thing to measure, but the time a user spent on a result page before returning to search was a great metric.</p>

<h1 id="use-metrics-to-drive-progress">Use Metrics to Drive Progress</h1>
<p>Metrics provide the following pros:</p>
<ol>
  <li>They help you stay focused on the right things</li>
  <li>When visualized over time, they help prevent future regressions by pinpointing what changes caused them.</li>
  <li>Good metrics drive progress forward at all times (assuming the metrics are increasing).</li>
  <li>Metrics let you measure effectiveness of what you are spending time on and allow you to compare the value of doing other things.</li>
</ol>

<h1 id="pick-the-right-metric">Pick the Right Metric</h1>
<p>Metrics need to satisfy the following 3 properties:</p>
<ol>
  <li>Metrics need to maximize impact</li>
  <li>Metrics need to be actionable. They should provide info that you can make changes on to respond to. They should not be vague.</li>
  <li>Metrics need to be responsive. You need to pick a metric that will respond to a change made today so you can measure it.</li>
</ol>

<h1 id="instrument-everything-to-understand-whats-going-on">Instrument Everything to Understand What’s Going On</h1>
<p>You should be measuring EVERYTHING in your company. Not doing this will leave you scrambling to figure out why some crucial service died. Large companies can afford to build custom in-house software to do this, but there are many third-party monitoring tools that any tech company can purchase.</p>

<h1 id="internalize-useful-numbers">Internalize Useful Numbers</h1>
<p>Know your numbers! This applies to technology companies as much as it does to finance. Knowing your numbers is like knowing your health vitals. 
Here are a few numbers to always know:</p>
<ol>
  <li>Number of registered users and number of active users</li>
  <li>Amount of data stored and total data capacity</li>
  <li>Amount of data read/written every day</li>
  <li>Number of servers a single service takes up</li>
  <li>Throughput of services or endpoints</li>
  <li>Growth rate of traffic</li>
  <li>Page load time (average, per browser, etc)</li>
  <li>Traffic distribution across different pages or services</li>
  <li>Distribution across devices and OSes</li>
</ol>

<h1 id="be-skeptical-of-data-integrity">Be Skeptical of Data Integrity</h1>
<p>Statistics can lie and liars use statistics. Never blindly trust data. 
Here are a few ways to increase the trustworthiness of your data:</p>
<ol>
  <li>Log as much data as you can</li>
  <li>Build tools that assist data accuracy earlier rather than later</li>
  <li>Write integration tests to ensure data quality has not worsened</li>
  <li>Examine data soon after it’s collected rather than next week when it’s needed</li>
  <li>Use cross validation (compute the same metric using different pieces of the data)</li>
  <li>If your gut tells you a number looks off, it usually is. Investigate!</li>
</ol>
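<p>The cross-validation tip above can be sketched in a few lines of Python. This is only an illustration: the event and session records, the field name <code class="language-plaintext highlighter-rouge">user_id</code>, and the 2% tolerance are all made-up assumptions, not something from the book. The idea is simply to compute the same metric (active users) from two independent data sources and flag a disagreement.</p>

```python
def active_users_from_events(events):
    """Count distinct users seen in a (hypothetical) raw event log."""
    return len({event["user_id"] for event in events})


def active_users_from_sessions(sessions):
    """Count distinct users seen in a (hypothetical) sessions table."""
    return len({user_id for user_id, _session_id in sessions})


def metrics_agree(a, b, tolerance=0.02):
    """True when two measurements differ by at most `tolerance` (relative)."""
    return tolerance >= abs(a - b) / max(a, b, 1)


# Made-up sample data: the same three users appear in both sources.
events = [{"user_id": 1}, {"user_id": 2}, {"user_id": 2}, {"user_id": 3}]
sessions = [(1, "s1"), (2, "s2"), (3, "s3"), (3, "s4")]

from_events = active_users_from_events(events)
from_sessions = active_users_from_sessions(sessions)
if not metrics_agree(from_events, from_sessions):
    print("Active-user counts disagree, investigate before trusting either one")
```

<p>If the two numbers drift apart, at least one pipeline is dropping or double counting data, which is exactly the kind of problem you want to catch early.</p>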

<h3 id="validate-your-ideas-early-and-often"><strong>Validate Your Ideas Early and Often</strong></h3>
<p>Optimize for earlier feedback. Don’t spend a year working on a product only to have the end users hate it.</p>

<h1 id="find-low-effort-ways-to-validate-your-work">Find Low Effort Ways to Validate Your Work</h1>
<p>Create an MVP (Minimum Viable Product) and use it to gather feedback and make changes. Spend the first 10% of your time creating it. This will save massive amounts of time compared to changing features of a final product that took 50% of your time to create.</p>

<h1 id="validate-product-changes-with-ab-testing">Validate Product Changes with A/B Testing</h1>
<p>A/B testing is when you show a percentage of users a changed version of the product and the rest a control version. You only ship the change if the changed version produces better metrics. A/B testing is a critical part of how the largest tech companies work and it is not going away anytime soon. It essentially allows you to use statistics on subjective feedback, something that is traditionally not possible.</p>
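<p>To make the "use statistics" part concrete, here is a minimal sketch of one common way to compare two variants: a two-proportion z-test on conversion rates. The book does not prescribe this particular test, and the conversion counts below are invented for illustration.</p>

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: control converts 200 of 5000 users,
# the changed version converts 260 of 5000 users.
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

<p>A small p-value suggests the difference is unlikely to be noise. In practice you would also decide the sample size and the significance threshold before starting the experiment.</p>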

<h1 id="beware-the-one-person-team">Beware the One Person Team</h1>
<p>Try to avoid having a single person be responsible for an entire feature or product. However, this is sometimes not possible to avoid.
Here are some tips to keep this from blowing up your company if that person gets run over by a bus:</p>
<ol>
  <li>Be open to feedback</li>
  <li>Commit code early and often</li>
  <li>Request code reviews from the toughest reviewers</li>
  <li>Bounce ideas off a diverse range of teammates</li>
  <li>Have people review design docs before wasting time coding things up</li>
  <li>Structure projects so different teammates have responsibility for different pieces</li>
  <li>Get end-user buy-in before implementing features</li>
</ol>

<h1 id="build-feedback-loops-for-your-decisions">Build Feedback Loops for Your Decisions</h1>
<p>Rather than making an important decision and moving on, set up a feedback loop that enables you to collect data and measure how valuable your work has been up to this point.</p>

<h3 id="improve-your-project-estimation-skills"><strong>Improve Your Project Estimation Skills</strong></h3>
<p>44% of projects are delivered late, over budget, or without key requirements. 
24% are never completed at all. 
Overrun projects exceed their initial time estimates by 79% on average.</p>

<h1 id="use-accurate-estimations-to-drive-project-planning">Use Accurate Estimations to Drive Project Planning</h1>
<ol>
  <li>Decompose the project into granular tasks (not one mega ticket)</li>
  <li>Estimate how long a task will take, not how long you would like it to take</li>
  <li>Estimates are probability distributions. The actual time to complete will be under or over the mean time you have picked (assuming you perfectly picked the mean to begin with!)</li>
  <li>Always have the person responsible for the task actually make the estimate</li>
  <li>Beware the mythical man-month. Adding an engineer to a one person team does not double the speed of the project. Adding bodies does not follow a linear curve.</li>
  <li>Use historical data to fine-tune your estimates for future projects. If you always underestimate, add time to this project’s estimates.</li>
  <li>Don’t let features keep growing over time (avoid scope creep)</li>
  <li>Allow others to challenge estimates and don’t make them afraid to speak up</li>
</ol>
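<p>The point that estimates are probability distributions can be illustrated with a small Monte Carlo simulation. This is a sketch under assumptions of my own: each task gets a (best, likely, worst) estimate in days, task durations are drawn from a triangular distribution, and the three example tasks are invented.</p>

```python
import random


def simulate_project(tasks, trials=10000, seed=0):
    """Return sorted simulated project totals from (best, likely, worst) estimates."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        # random.triangular takes (low, high, mode).
        total = sum(rng.triangular(best, worst, likely) for best, likely, worst in tasks)
        totals.append(total)
    totals.sort()
    return totals


def percentile(sorted_values, fraction):
    """Pick the value at the given fraction of an already sorted list."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]


# Hypothetical tasks, in days: (best case, most likely, worst case).
tasks = [(2, 3, 8), (1, 2, 5), (3, 5, 12)]
naive_total = sum(likely for _best, likely, _worst in tasks)

totals = simulate_project(tasks)
median = percentile(totals, 0.5)
p90 = percentile(totals, 0.9)
print(f"sum of 'likely' estimates: {naive_total} days")
print(f"simulated median: {median:.1f} days, 90th percentile: {p90:.1f} days")
```

<p>Because worst cases tend to sit much further from the likely value than best cases do, the simulated median lands above the simple sum of the "most likely" numbers. That gap is one reason projects so often run late.</p>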

<h1 id="budget-for-the-unknown">Budget for the Unknown</h1>
<p>There will always be things you can’t predict that will eat up your time. Always budget extra time for these unknowns and budget more ‘unknown’ time for longer projects.</p>

<h1 id="define-specific-goals-and-milestones">Define Specific Goals and Milestones</h1>
<p>As with using metrics to measure the value of a product, you can use goals and milestones to measure the growth and value of a development project. Goals and milestones also help with not feeling burnt out.</p>

<h1 id="reduce-risk-early">Reduce Risk Early</h1>
<p>Always tackle the hardest and riskiest features first. You don’t want to complete 9/10 easy features and then realize the 10th is not doable and have to scrap the project.</p>

<h1 id="beware-project-rewrites">Beware Project Rewrites</h1>
<p>Engineers are always thinking a rewrite will be easy, quick, and much much better than what already exists. 99% of the time they are wrong.</p>

<h1 id="dont-sprint-in-the-middle-of-a-marathon">Don’t Sprint in the Middle of a Marathon</h1>
<p>Only do a sprint or crunch if the end is actually near. If you sprint in the middle of a project, you are dooming your team and will not be able to finish. 
Here are a few facts to keep in mind:</p>
<ol>
  <li>Productivity of a single hour decreases as you work more hours</li>
  <li>You are more behind schedule than you think you are. This is especially true in the early stages</li>
  <li>More hours can lead to burnout</li>
  <li>Working more hours can lead to team members resenting each other</li>
  <li>Communication will ramp up as deadlines approach and this can lead to bottlenecks and overhead</li>
  <li>Sprinting and crunch times increase technical debt</li>
</ol>

<h2 id="part-3---build-long-term-value">Part 3 - Build Long-Term Value</h2>
<p>Companies are trying to maximize profit and revenue in the long run, not just for this year. There are many tips and tricks to maximize value in the long-term.</p>

<h3 id="build-quality-with-pragmatism"><strong>Build Quality with Pragmatism</strong></h3>
<p>Software quality is always a trade-off between actually getting things done and having solid, bug-free code. Large companies like Google can afford extremely strict code standards, while that level of rigor would cripple a startup that needs to get a product up and running.</p>

<h1 id="establish-sustainable-code-review-process">Establish Sustainable Code Review Process</h1>
<p>Code reviews should be present in any tech company. Here are a few reasons why:</p>
<ol>
  <li>Code reviews catch bugs and design flaws early</li>
  <li>Knowing you have to be reviewed makes you less likely to commit quick and dirty code</li>
  <li>Allows others to see what the company’s code standards are</li>
  <li>Distributes the knowledge of the code base to others</li>
  <li>Increases long term agility because the code base won’t be riddled with bugs</li>
</ol>

<h1 id="manage-complexity-through-abstraction">Manage Complexity Through Abstraction</h1>
<p>Eventually, you will want to do things that are so advanced that you can’t expect individual engineers to recreate all the code needed. Just as we don’t all code in Assembly, we want to abstract as many things as possible. A great example is Google’s MapReduce, which together with the Google File System inspired Hadoop. This abstraction for running algorithms on distributed data allowed engineers to focus on writing algorithms rather than managing the insanely complex intricacies of distributed data.</p>

<h1 id="automate-testing">Automate Testing</h1>
<p>Automating testing allows you to keep developing without the fear that your changes are breaking existing features. Spend time setting up large scale and comprehensive testing scripts that cover most of your code and features.</p>
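<p>As a rough sketch of what a small automated test looks like, here is an example using Python’s built-in <code class="language-plaintext highlighter-rouge">unittest</code> module. The <code class="language-plaintext highlighter-rouge">apply_discount</code> function is a made-up stand-in for real product code.</p>

```python
import unittest


def apply_discount(price, percent):
    """Return `price` reduced by `percent`, rejecting out-of-range percentages."""
    if percent > 100 or 0 > percent:
        raise ValueError(f"percent must be between 0 and 100, got {percent}")
    return price * (1 - percent / 100)


class ApplyDiscountTests(unittest.TestCase):
    def test_typical_discount(self):
        self.assertAlmostEqual(apply_discount(200.0, 25), 150.0)

    def test_zero_discount_keeps_price(self):
        self.assertEqual(apply_discount(80.0, 0), 80.0)

    def test_invalid_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(80.0, 120)


# Run the suite programmatically; in a real project a test runner or
# a CI job would discover and run these on every change.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

<p>Once tests like these run on every change, a regression shows up minutes after it is introduced rather than weeks later.</p>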

<h1 id="repay-technical-debt">Repay Technical Debt</h1>
<p>Eventually technical debt needs to be dealt with. However, you can pick and choose which debts to pay off. LinkedIn, for example, spent months after going public freezing new feature work and only paying down technical debt in the code base.</p>

<h3 id="minimize-operational-burden"><strong>Minimize Operational Burden</strong></h3>
<p>Avoid tech or tools that will create headaches later on. Use stable and trusted software to build your products. Keep it simple, stupid!</p>

<h1 id="embrace-operational-simplicity">Embrace Operational Simplicity</h1>
<p>Always keep things simple. Here are a few things that happen if you don’t:</p>
<ol>
  <li>Engineering expertise gets splintered across multiple systems</li>
  <li>More complex pieces of the puzzle introduce more points of failure</li>
  <li>New engineers will take longer to get up to speed</li>
  <li>Complexity takes time away from testing, abstraction, etc</li>
</ol>

<h1 id="build-systems-to-fail-fast">Build Systems to Fail Fast</h1>
<p>Failing fast is actually preferable to staying alive after a problem has occurred. This may seem counterintuitive, but it lets you solve the real issues as they occur rather than after they have corrupted other parts of the product. Always deal with critical errors as they occur and don’t let them slip by undetected for weeks or months.</p>
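<p>Here is a minimal sketch of failing fast, assuming a service that reads its settings at startup. The config keys <code class="language-plaintext highlighter-rouge">db_url</code> and <code class="language-plaintext highlighter-rouge">timeout_seconds</code> are hypothetical. Instead of silently falling back to defaults and misbehaving later, the loader raises as soon as it sees bad input.</p>

```python
def load_config(raw):
    """Validate configuration at startup and raise immediately on bad input."""
    required = ("db_url", "timeout_seconds")
    missing = [key for key in required if key not in raw]
    if missing:
        raise KeyError(f"missing required config keys: {missing}")

    # float() raises ValueError on garbage such as "soon".
    timeout = float(raw["timeout_seconds"])
    if not timeout > 0:
        raise ValueError(f"timeout_seconds must be positive, got {timeout}")

    return {"db_url": raw["db_url"], "timeout_seconds": timeout}


# A valid (made-up) configuration loads normally.
config = load_config({"db_url": "postgres://example/db", "timeout_seconds": "2.5"})

# A broken configuration stops the program at startup with a clear message,
# rather than surfacing as a mysterious timeout hours later.
try:
    load_config({"db_url": "postgres://example/db", "timeout_seconds": "-1"})
except ValueError as error:
    print(f"refusing to start: {error}")
```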

<h1 id="relentlessly-automate-mechanical-tasks">Relentlessly Automate Mechanical Tasks</h1>
<p>Here are a few things that should be automated (versus doing manually):</p>
<ol>
  <li>Validating code or running tests</li>
  <li>ETL of data</li>
  <li>Detecting error rate spikes</li>
  <li>Deploying software to new machines</li>
  <li>Restoring database snapshots</li>
  <li>Running batch computations</li>
  <li>Restarting a web service</li>
  <li>Checking code styles (use a linter)</li>
  <li>Training machine learning models</li>
  <li>Managing user accounts and data</li>
  <li>Removing or adding servers to services</li>
</ol>

<h1 id="make-batch-process-idempotent">Make Batch Processes Idempotent</h1>
<p>Idempotent: running the same script over and over, no matter how many times, gives back the same results. 
This is similar in spirit to immutable code: re-running a job should not change your state beyond what the first run did. Processes that are not idempotent make errors harder to track down.</p>
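<p>A small sketch of the difference, assuming a batch job that computes a daily total from a list of events. The function names and data are invented for illustration. The idempotent version overwrites the result for the day, so a retry after a crash is harmless; the counter-example adds to existing state, so every re-run double counts.</p>

```python
def apply_daily_total(store, date, events):
    """Idempotent: recompute the total for `date` and overwrite it."""
    store[date] = sum(amount for event_date, amount in events if event_date == date)
    return store


def apply_daily_total_non_idempotent(store, date, events):
    """Counter-example: adds to existing state, so re-runs double count."""
    day_total = sum(amount for event_date, amount in events if event_date == date)
    store[date] = store.get(date, 0) + day_total
    return store


# Made-up events: (date, amount).
events = [("2018-04-01", 10), ("2018-04-01", 5), ("2018-04-02", 7)]

safe_store = {}
apply_daily_total(safe_store, "2018-04-01", events)
apply_daily_total(safe_store, "2018-04-01", events)  # retry changes nothing

unsafe_store = {}
apply_daily_total_non_idempotent(unsafe_store, "2018-04-01", events)
apply_daily_total_non_idempotent(unsafe_store, "2018-04-01", events)  # retry doubles the total

print(safe_store, unsafe_store)
```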

<h1 id="be-able-to-respond-and-recover-quickly">Be Able to Respond and Recover Quickly</h1>
<p>Create a Chaos Monkey: a piece of software that creates havoc in your systems by bringing down services and other pieces of infrastructure. This teaches you how to keep your products running even when individual pieces go down. Netflix popularized the idea, and many major tech companies now use similar tools to test their products. It’s better to fail in a test environment than in the real world.</p>

<h3 id="invest-in-your-teams-growth"><strong>Invest in Your Team’s Growth</strong></h3>
<p>Your individual success comes from how successful your company is and your company’s success stems from how successful all of the individual workers are. This creates a nice Game Theory environment where it’s in everyone’s interest to help each other and the company.</p>

<h1 id="make-hiriing-everyones-responsibility">Make Hiring Everyone’s Responsibility</h1>
<p>Hiring might be one of the most critical parts of your company’s long-term health. Hiring determines who works on and develops all the things that need to be done. It also determines who your teammates will be, so hiring should be everyone’s priority.</p>

<h1 id="design-a-good-onboarding-process">Design a Good Onboarding Process</h1>
<p>Onboarding is critical to getting new hires up to speed and productive. Every hour they aren’t onboarded is a loss to the company’s bottom line. 
Here are 4 goals all onboarding processes should accomplish:</p>
<ol>
  <li>Ramp up new engineers as quickly as possible</li>
  <li>Impart the team’s culture and values so everyone is on the same page</li>
  <li>Get new engineers to master the fundamentals critical to this company’s products and tools</li>
  <li>Get new engineers socially integrated with existing team members</li>
</ol>

<p>Mentorship is also a widely used and effective onboarding technique.</p>

<h1 id="share-ownership-of-code">Share Ownership of Code</h1>
<p>Sharing ownership of code helps team morale and also prevents projects from being derailed if an employee leaves or is hit by a bus. 
Here are some ways to increase code ownership:</p>
<ol>
  <li>Avoid one person teams</li>
  <li>Do code reviews</li>
  <li>Rotate tasks around the team</li>
  <li>Keep code quality high. It’s easy to own code if you can quickly understand what it does</li>
  <li>Do tech talks and show and tells</li>
  <li>DOCUMENT CODE!!</li>
  <li>Document workflows and non-obvious things to do to get things working</li>
  <li>Have mentors</li>
</ol>

<h1 id="build-collective-wisdom-through-post-mortems">Build Collective Wisdom Through Post-Mortems</h1>
<p>It’s easy to pat yourself on the back after a successful launch and to glow in the praise, but you actually learn more from dissecting what went wrong on a failed project. Set aside your ego and talk about it.</p>

<h1 id="build-a-great-engineering-culture">Build a Great Engineering Culture</h1>
<p>People want to work at places with good culture. Here is what the best of the best are looking for in a company:</p>
<ol>
  <li>High iteration speed companies. Engineers don’t want to wait a month to push one feature</li>
  <li>High value for automation</li>
  <li>Great abstractions for complicated services</li>
  <li>High code quality</li>
  <li>Respectful work environment</li>
  <li>Shared ownership of code</li>
  <li>Automated testing and solid QA</li>
  <li>Allow for experimentation time (hackathons or 20% time)</li>
  <li>Foster a high learning environment</li>
  <li>Tough hiring standards. People will want to work with others they know got through the gauntlet.</li>
</ol>]]></content><author><name></name></author><category term="miscl" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Recommender Systems Overview</title><link href="https://nickgreenquist.github.io//blog/datascience/2018/04/21/recommender-systems-overview.html" rel="alternate" type="text/html" title="Recommender Systems Overview" /><published>2018-04-21T14:05:14+00:00</published><updated>2018-04-21T14:05:14+00:00</updated><id>https://nickgreenquist.github.io//blog/datascience/2018/04/21/recommender-systems-overview</id><content type="html" xml:base="https://nickgreenquist.github.io//blog/datascience/2018/04/21/recommender-systems-overview.html"><![CDATA[<style type="text/css">
    .center-image
    {
        margin: 0 auto;
        display: block;
    }
</style>

<p><em>Below is a presentation I did for my Big Data Science class on Recommender Systems</em></p>

<object data="https://nickgreenquist.github.io/blog/assets/RecommenderSystem/RecommenderSystems.pdf" type="application/pdf" width="700px" height="700px">
    <embed src="https://nickgreenquist.github.io/blog/assets/RecommenderSystem/RecommenderSystems.pdf" />
        This browser does not support PDFs. Please download the PDF to view it: <a href="https://nickgreenquist.github.io/blog/assets/RecommenderSystem/RecommenderSystems.pdf">Download PDF</a>.
</object>]]></content><author><name></name></author><category term="Datascience" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Learning How to Learn</title><link href="https://nickgreenquist.github.io//blog/miscl/2018/04/10/Learning-How-to-Learn.html" rel="alternate" type="text/html" title="Learning How to Learn" /><published>2018-04-10T14:05:14+00:00</published><updated>2018-04-10T14:05:14+00:00</updated><id>https://nickgreenquist.github.io//blog/miscl/2018/04/10/Learning-How-to-Learn</id><content type="html" xml:base="https://nickgreenquist.github.io//blog/miscl/2018/04/10/Learning-How-to-Learn.html"><![CDATA[<style type="text/css">
    .center-image
    {
        margin: 0 auto;
        display: block;
    }
</style>

<p><em>An Overview of <a href="https://www.coursera.org/learn/learning-how-to-learn/home/welcome">Coursera’s Course on Learning</a></em></p>

<h2 id="what-is-learning">What is Learning</h2>

<h1 id="focused-versus-diffuse-thinking">Focused versus Diffuse Thinking</h1>

<p><img src="https://nickgreenquist.github.io//blog/assets/Learning/types.jpg" alt="Types" width="500px" class="center-image" /></p>

<p>Focused thinking is when we actively concentrate on the matter at hand. We are in an attentive state of mind and we often focus on a small set of information at a time. Diffuse thinking is when we let our mind wander freely. It is the type of thinking you engage in when you are ‘daydreaming.’ Both of these types of thinking are important.</p>

<p>One way to ensure you enter diffuse mode from time to time is to take breaks during focused learning. A technique to accomplish this is the Pomodoro Technique. Basically, you set a timer (for, say, 25 minutes) and perform focused learning during that time. After the 25 minutes are up, take a 5 minute break and allow your mind to wander.</p>

<p><img src="https://nickgreenquist.github.io//blog/assets/Learning/foundation.png" alt="Foundation" width="500px" class="center-image" /></p>

<p>One of the biggest traps we can fall into is trying to learn everything in a short amount of time. This is like a weight lifter trying to train his muscles for a competition the day before. It’s not going to happen. Your brain is like your muscles: it needs time to build itself up and make new connections. Approach learning like you do exercising: consistent work every day is the only way to build a solid foundation.</p>

<h1 id="procrastination-memory-and-sleep">Procrastination, Memory, and Sleep</h1>

<p><img src="https://nickgreenquist.github.io//blog/assets/Learning/procrastination.jpg" alt="Procrastination" width="350px" class="center-image" /></p>

<p>Everyone suffers from some level of procrastination. One way to combat this is to use the Pomodoro Technique. Using the timer to ensure you actually have blocks of focused learning can help tremendously in combating procrastination.</p>

<p>To stave off procrastination, you should also minimize distractions. Find a quiet place to study and turn off your phone. Also, keep in mind that procrastination is often caused more by the pain of thinking about the task than by the task itself. Often, the process of getting things done is not as scary. If you can get yourself to just start the process, you’ll push away that pain of thinking about what you still have to do.</p>

<p><img src="https://nickgreenquist.github.io//blog/assets/Learning/memory.png" alt="Memory" width="400px" class="center-image" /></p>

<p>Your memory is made up of short term and long term memory. Long term memory is like a warehouse where you can store millions of things. However, getting things back can be hard. Short term memory is like a blackboard that is always slowly fading away. In order to store things in long term memory, you need to practice, practice, and practice what you are trying to learn. Just writing it on the blackboard will not make it ‘stick.’</p>

<p>Sleep is critical to learning. We still don’t know what exactly sleep does for us, but we know exactly what lack of sleep can do. Lack of sleep literally builds up toxins in your body and many of these can damage your brain.</p>

<h2 id="chunking">Chunking</h2>

<h1 id="chunking---the-essentials">Chunking - The Essentials</h1>

<p><img src="https://nickgreenquist.github.io//blog/assets/Learning/chunking.jpg" alt="Chunking" width="700px" class="center-image" /></p>

<p>Chunking is when you break up what you want to learn into smaller pieces. You can think of this as an analogy to a puzzle. The concept you are trying to learn is the complete puzzle. To solve it, you have to learn all the individual pieces and then put them together in the right way. The part of finding the individual pieces is chunking.</p>

<p>As you learn more and more chunks, you can make each chunk bigger and bigger. Chunks can also ‘transfer.’ What this means is that learning chunks in one area can actually help you learn other things.</p>

<h1 id="seeing-the-bigger-picture">Seeing the Bigger Picture</h1>

<p><img src="https://nickgreenquist.github.io//blog/assets/Learning/bigpicture.jpg" alt="Big Picture" width="350px" class="center-image" /></p>

<p>Often it is better to understand the big picture before starting to chunk the concept. Have you ever been in a math class where you just start learning equations and random relationships while having no idea what the whole point is? If you have, you’ve fallen victim to trying to solve a puzzle without even knowing what the complete picture is supposed to be.</p>

<p>The Illusion of Competence is when you trick yourself into thinking you know a subject. Some people will read and reread the material and do simple problems over and over again, thinking they are mastering the subject. Often, they are kidding themselves. To really know a subject, you can do multiple things. Test yourself with difficult questions. Also, test yourself across multiple sections rather than just sticking to one subject at a time. This is called interleaving and has been shown to improve learning.</p>

<p>Einstellung is when you have become stuck in one way of thinking. It’s caused by a neural pattern that is ingrained in your brain. It’s hard to think about something in a different way because of this.</p>

<h2 id="procrastination-and-memory">Procrastination and Memory</h2>

<h1 id="procrastination">Procrastination</h1>

<p><img src="https://nickgreenquist.github.io//blog/assets/Learning/yolo.png" alt="Lifting Once" width="350px" class="center-image" /></p>

<p>Remember, learning is like lifting weights: you can’t do it all in one day. Just like building muscle mass, changing the structure of your brain with new knowledge is a slow and difficult journey. Because real learning takes so long, you can’t save it all for the last minute. To help with procrastination, keep a journal. Set goals and write down tasks. This can add a sense of reward to knocking things off the list. Also, do the hardest things first in the day. Whatever makes you feel most uncomfortable to think about, do that first.</p>

<h1 id="memory">Memory</h1>

<p><img src="https://nickgreenquist.github.io//blog/assets/Learning/middleout.jpg" alt="Middle Out" width="350px" class="center-image" /></p>

<p>Because it takes time and effort to move things into the long term memory warehouse, you should always start learning early. In order to do this, you should learn how to manage your procrastination.</p>

<p>As you get better and better in a subject, your short term memory becomes more efficient. This is because you can store ideas and new things in smaller chunks on the blackboard. You can think of this like compression. You are able to save the same amount of information in smaller blocks.</p>

<p>One tool to help with memorization is to group things together with meaningful connections and also to use analogy and metaphor. Remember, even the most complex models in math and science are just glorified metaphors of the universe.</p>

<h2 id="renaissance-learning-and-unlocking-your-potential">Renaissance Learning and Unlocking Your Potential</h2>

<h1 id="summary">Summary</h1>

<p>Metaphors and analogies aren’t just for art and literature. One of the best things you can do to not only remember, but more easily understand concepts in many different fields, is to create a metaphor or analogy for them. Often, the more visual, the better.</p>

<p>Try to avoid ‘genius envy’. Sure, intelligence is real and some people are smarter than others, but that doesn’t paint the whole picture. You don’t need to be a genius to work hard at mastering how to learn. And the person who puts in the hard work of learning things will always beat the smart person who does not apply themselves to anything.</p>

<p>Many people still suffer from imposter syndrome. This is when you feel inadequate and think everyone around you is so much smarter and you just haven’t been found out yet. The truth is almost everyone feels this way at times. That’s why the phrase ‘fake it till you make it’ is often good advice: almost no one is 100% confident in themselves.</p>

<p><img src="https://nickgreenquist.github.io//blog/assets/Learning/studying.png" alt="Studying" width="350px" class="center-image" /></p>

<p>When preparing for tests, study in groups. Studying with others helps you see the material in a different way, get asked questions you wouldn’t have thought of yourself, and also can help you learn even more by ‘teaching’ the material to others.</p>

<h2 id="conclusion">Conclusion</h2>

<p>If you haven’t already, take <a href="https://www.coursera.org/learn/learning-how-to-learn/home/welcome">Coursera’s Course on Learning</a>. You’d be surprised how many little things that seem obvious or too simplistic can actually make a difference in your learning. Why would you not want to master a skill that will make all future skill learning easier?</p>
    .center-image
    {
        margin: 0 auto;
        display: block;
    }
</style>

<p><strong>User k-NN Collaborative Filtering for Item Recommendations - A step by step guide in Rapid Miner</strong></p>

<p><img src="https://nickgreenquist.github.io//blog/assets/RecommenderSystem/rapidminer.png" alt="RapidMiner" width="700px" class="center-image" /></p>

<p>As part of a class at NYU, a team of three of us is building a recommendation system for books. To quickly prototype a dead simple recommender system, we put together a simple Rapid Miner workflow. You can read more about this <a href="https://dorukkilitcioglu.github.io/data-science/2018/03/01/adventures-rapidminer.html">here at Doruk Kilitcioglu’s blog</a>. Below is the step-by-step guide we used to get results from Rapid Miner for item recommendations using user-user collaborative filtering.</p>
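<p>For readers who want a feel for what user-user collaborative filtering computes before clicking through the workflow, here is a rough pure-Python sketch. It is not what RapidMiner runs internally: the cosine similarity, the weighted-average prediction, and the tiny ratings dictionary are all simplifying assumptions chosen for illustration.</p>

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dicts (missing ratings count as 0)."""
    shared = set(a).intersection(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(rating ** 2 for rating in a.values()))
    norm_b = math.sqrt(sum(rating ** 2 for rating in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, k=2, n=3):
    """Recommend up to `n` unseen items using the `k` most similar users."""
    neighbours = sorted(
        (
            (cosine_similarity(ratings[user], other_ratings), other)
            for other, other_ratings in ratings.items()
            if other != user
        ),
        reverse=True,
    )[:k]

    scores, weights = {}, {}
    for similarity, other in neighbours:
        for item, rating in ratings[other].items():
            if item in ratings[user]:
                continue  # skip items the user has already rated
            scores[item] = scores.get(item, 0.0) + similarity * rating
            weights[item] = weights.get(item, 0.0) + similarity

    # Predicted rating is the similarity-weighted average of neighbour ratings.
    predicted = {item: scores[item] / weights[item] for item in scores if weights[item] > 0}
    return sorted(predicted, key=predicted.get, reverse=True)[:n]


# Made-up ratings: user -> {book: rating on a 1 to 5 scale}.
ratings = {
    "alice": {"book_a": 5, "book_b": 4},
    "bob": {"book_a": 5, "book_b": 5, "book_c": 4},
    "carol": {"book_a": 1, "book_d": 5},
    "dave": {"book_b": 4, "book_c": 5, "book_e": 2},
}
print(recommend(ratings, "alice"))
```

<p>The RapidMiner steps below build the same kind of pipeline with operators instead of code: load ratings, mark which columns identify users and items, and let a k-NN operator find similar users.</p>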

<ol>
  <li>
    <p><strong>Download <code class="language-plaintext highlighter-rouge">ratings.csv</code> from <a href="http://fastml.com/goodbooks-10k-a-new-dataset-for-book-recommendations/">http://fastml.com/goodbooks-10k-a-new-dataset-for-book-recommendations/</a></strong></p>

    <ol>
      <li>
        <p><strong>NOTE: If you don’t have an educational license with RapidMiner, you can only load in 10k rows. Open and edit the ratings file and trim it down to 10k rows.</strong></p>
      </li>
      <li>
        <p>You can get an educational license from the RapidMiner website if you make an account and add a .edu email</p>
      </li>
    </ol>
  </li>
  <li>
    <p><strong>Download RapidMiner and install to your machine</strong></p>
  </li>
  <li>
    <p><strong>Start a New Process and make it Blank</strong></p>
  </li>
  <li>
    <p><strong>Loading the Data</strong></p>

    <ol>
      <li>
        <p>Hit <code class="language-plaintext highlighter-rouge">Add Data</code> at the top left under repository</p>
      </li>
      <li>
        <p>Click on My Computer and find ratings.csv from your local machine</p>
      </li>
      <li>
        <p>Hit all the <code class="language-plaintext highlighter-rouge">Next</code> buttons and then save the file under <code class="language-plaintext highlighter-rouge">data</code></p>
      </li>
      <li>
        <p>This might take up to a minute</p>
      </li>
      <li>
        <p>From the top left, expand Local Repository, then data, and then drag ratings.csv to the right window</p>
      </li>
    </ol>
  </li>
  <li>
    <p><strong>6 million ratings are too many for RapidMiner to process, so let’s filter them down</strong></p>

    <ol>
      <li>
        <p>Find the <code class="language-plaintext highlighter-rouge">Filter Examples</code> operator and drag to the right window</p>
      </li>
      <li>
        <p>Hook up the output of Retrieve ratings to the input of <code class="language-plaintext highlighter-rouge">Filter Examples</code></p>
      </li>
      <li>
        <p>Click on <code class="language-plaintext highlighter-rouge">Filter Examples</code> and click on the Add Filters button to the far right</p>
      </li>
      <li>
        <p>Ensure user_id is selected as the left field</p>
      </li>
      <li>
        <p>Make the filter operator (should be <code class="language-plaintext highlighter-rouge">=</code> by default) a <code class="language-plaintext highlighter-rouge">&lt;</code></p>
      </li>
      <li>
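        <p>Type in a value anywhere between <code class="language-plaintext highlighter-rouge">500</code> and <code class="language-plaintext highlighter-rouge">1000</code> (only ratings from users with a smaller <code class="language-plaintext highlighter-rouge">user_id</code> are kept)</p>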
      </li>
      <li>
        <p>Hit <code class="language-plaintext highlighter-rouge">OK</code></p>
      </li>
    </ol>

    <p><img src="https://nickgreenquist.github.io//blog/assets/RecommenderSystem/filter.png" alt="RapidMiner" width="700px" class="center-image" /></p>
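<p>For readers who want to see what this filter does outside the GUI, here is a rough Python equivalent. It is a sketch under assumptions: the function name <code class="language-plaintext highlighter-rouge">load_filtered_ratings</code> is hypothetical, and the columns are assumed to be named <code class="language-plaintext highlighter-rouge">user_id</code>, <code class="language-plaintext highlighter-rouge">book_id</code> and <code class="language-plaintext highlighter-rouge">rating</code> with integer values.</p>

```python
import csv
import io


def load_filtered_ratings(src, user_id_limit=1000):
    """Read (user_id, book_id, rating) rows, keeping user_id below the limit."""
    rows = []
    for row in csv.DictReader(src):
        user_id = int(row["user_id"])
        if user_id < user_id_limit:
            rows.append((user_id, int(row["book_id"]), int(row["rating"])))
    return rows


# Tiny in-memory stand-in for ratings.csv (made-up rows, not real data).
sample = io.StringIO(
    "user_id,book_id,rating\n1,258,5\n999,4081,4\n1000,260,5\n2500,9296,3\n"
)
print(load_filtered_ratings(sample, user_id_limit=1000))
# prints [(1, 258, 5), (999, 4081, 4)]
```

<p>Filtering on <code class="language-plaintext highlighter-rouge">user_id</code> rather than taking the first N rows keeps every rating of each retained user, which matters because the k-NN step compares users by their complete rating history.</p>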
  </li>
  <li>
    <p><strong>Set the role of the columns</strong></p>

    <ol>
      <li>
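        <p>Add the <code class="language-plaintext highlighter-rouge">Set Role</code> operator to the window and hook up the output of <code class="language-plaintext highlighter-rouge">Filter Examples</code> to its input</p>
      </li>
      <li>
        <p>Click on the <code class="language-plaintext highlighter-rouge">Set Role</code> box</p>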
      </li>
      <li>
        <p>At the far right, from the <code class="language-plaintext highlighter-rouge">attribute name</code> drop down, select <code class="language-plaintext highlighter-rouge">rating</code> and set the target role to <code class="language-plaintext highlighter-rouge">label</code></p>
      </li>
    </ol>

    <p><img src="https://nickgreenquist.github.io//blog/assets/RecommenderSystem/label.png" alt="RapidMiner" width="300px" class="center-image" /></p>

    <ol>
      <li>
        <p>Click on the <code class="language-plaintext highlighter-rouge">Edit List</code> button</p>

        <ol>
          <li>
            <p>Make the left field <code class="language-plaintext highlighter-rouge">user_id</code> and at the right field, <strong>TYPE</strong> in <code class="language-plaintext highlighter-rouge">user identification</code></p>
          </li>
          <li>
            <p>At the bottom hit <code class="language-plaintext highlighter-rouge">Add Entry</code></p>
          </li>
          <li>
            <p>Make the new left field <code class="language-plaintext highlighter-rouge">book_id</code> and at the right field, <strong>TYPE</strong> in <code class="language-plaintext highlighter-rouge">item identification</code></p>
          </li>
        </ol>
      </li>
      <li>
        <p>Hit <code class="language-plaintext highlighter-rouge">Apply</code></p>
      </li>
    </ol>

    <p><img src="https://nickgreenquist.github.io//blog/assets/RecommenderSystem/roles.png" alt="RapidMiner" width="700px" class="center-image" /></p>
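<p>The role assignment above amounts to a small column-to-role mapping. The sketch below writes it down as data and checks a CSV header against it before you load the file. The names <code class="language-plaintext highlighter-rouge">ROLE_BY_COLUMN</code> and <code class="language-plaintext highlighter-rouge">missing_role_columns</code> are made up for this example; only the role strings come from the steps above.</p>

```python
# Column -> role, using the exact role names typed into Set Role above.
ROLE_BY_COLUMN = {
    "user_id": "user identification",
    "book_id": "item identification",
    "rating": "label",
}


def missing_role_columns(header, role_by_column=ROLE_BY_COLUMN):
    """Return the expected columns that are absent from a CSV header."""
    present = {name.strip() for name in header}
    return sorted(col for col in role_by_column if col not in present)


print(missing_role_columns(["user_id", "book_id", "rating"]))  # prints []
print(missing_role_columns(["user_id", "rating"]))  # prints ['book_id']
```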
  </li>
  <li>
    <p><strong>Split data into <code class="language-plaintext highlighter-rouge">train</code> and <code class="language-plaintext highlighter-rouge">test</code></strong></p>

    <ol>
      <li>
        <p>Add the <code class="language-plaintext highlighter-rouge">Split Data</code> operator to the right window</p>
      </li>
      <li>
        <p>Hook up output of <code class="language-plaintext highlighter-rouge">Set Role</code> to <code class="language-plaintext highlighter-rouge">Split Data</code></p>
      </li>
      <li>
        <p>Click on <code class="language-plaintext highlighter-rouge">Split Data</code> and hit the <code class="language-plaintext highlighter-rouge">Edit Enumeration</code> button in the top right</p>
      </li>
      <li>
        <p>Add two entries</p>
      </li>
      <li>
        <p>Type in the first ratio as 0.8 (this is the train set)</p>
      </li>
      <li>
        <p>Type in the second ratio as 0.2 (this is the test set)</p>
      </li>
      <li>
        <p>Hit <code class="language-plaintext highlighter-rouge">OK</code></p>
      </li>
    </ol>

    <p><img src="https://nickgreenquist.github.io//blog/assets/RecommenderSystem/split.png" alt="RapidMiner" width="700px" class="center-image" /></p>
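<p>Conceptually, the 0.8 / 0.2 enumeration partitions the rating rows into two disjoint sets. Here is a minimal sketch of such a split. It shuffles the rows with a fixed seed, which is an assumption: the sampling strategy RapidMiner actually applies may differ, and the name <code class="language-plaintext highlighter-rouge">split_ratings</code> is hypothetical.</p>

```python
import random


def split_ratings(rows, train_ratio=0.8, seed=42):
    """Shuffle rows reproducibly and cut them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for ten rating rows
train, test = split_ratings(rows)
print(len(train), len(test))  # prints 8 2
```

<p>The first partition plays the role of the top <code class="language-plaintext highlighter-rouge">par</code> output (train) and the second of the second <code class="language-plaintext highlighter-rouge">par</code> output (test).</p>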
  </li>
  <li>
    <p><strong>Add Recommender System algorithm</strong></p>

    <ol>
      <li>
        <p>At the very top right, hit <code class="language-plaintext highlighter-rouge">Extensions</code> and go to the Marketplace</p>
      </li>
      <li>
        <p>Type <code class="language-plaintext highlighter-rouge">Recommender</code> in the search bar</p>
      </li>
      <li>
        <p>Select <code class="language-plaintext highlighter-rouge">Recommender Extension</code> and follow the instructions to install it</p>
      </li>
    </ol>
  </li>
  <li>
    <p><strong>Add User k-NN item recommender system</strong></p>

    <ol>
      <li>
        <p>Find the <code class="language-plaintext highlighter-rouge">Collaborative Filtering Item Recommendation/User k-NN</code> operator (will be in <code class="language-plaintext highlighter-rouge">Extensions</code> under <code class="language-plaintext highlighter-rouge">Recommenders/Item Recommendation</code>)</p>
      </li>
      <li>
        <p>Drag this to the right window</p>
      </li>
      <li>
        <p>Hook up the top output of the <code class="language-plaintext highlighter-rouge">Split Data</code> box to the input of the <code class="language-plaintext highlighter-rouge">User k-NN</code> box</p>
      </li>
    </ol>
  </li>
  <li>
    <p><strong>Apply the model to train and test</strong></p>

    <ol>
      <li>
        <p>Add <code class="language-plaintext highlighter-rouge">Apply Model (Item Recommendation)</code> operator to the right window</p>
      </li>
      <li>
        <p>Hook up the <code class="language-plaintext highlighter-rouge">Mod</code> output of the User k-NN to the input <code class="language-plaintext highlighter-rouge">Mod</code> of the <code class="language-plaintext highlighter-rouge">Apply Model</code> box</p>
      </li>
      <li>
        <p>Hook up the second <code class="language-plaintext highlighter-rouge">par</code> output of the <code class="language-plaintext highlighter-rouge">Split Data</code> box to the <code class="language-plaintext highlighter-rouge">que</code> input of the <code class="language-plaintext highlighter-rouge">Apply Model</code> box</p>
      </li>
      <li>
        <p>Drag the <code class="language-plaintext highlighter-rouge">res</code> output of the <code class="language-plaintext highlighter-rouge">Apply Model</code> box to the <code class="language-plaintext highlighter-rouge">res</code> on the very far right of the window (the final output)</p>
      </li>
    </ol>
  </li>
  <li>
    <p><strong>Hit the big blue <code class="language-plaintext highlighter-rouge">Run</code> button to view the output!</strong></p>

    <ol>
      <li>The model will recommend items (books) to users based on the books that the users most similar to them have read</li>
    </ol>
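<p>To make the idea behind user-based k-NN concrete, here is a small self-contained sketch. It is not the extension's implementation. It assumes cosine similarity over the sets of books each user has rated, and a score per book equal to the summed similarity of the neighbours who read it. The function name and the toy data are made up.</p>

```python
from collections import defaultdict
from math import sqrt


def recommend_user_knn(interactions, target_user, k=2, top_n=3):
    """Recommend unseen items from the k users most similar to target_user."""
    items_by_user = defaultdict(set)
    for user, item in interactions:
        items_by_user[user].add(item)
    seen = items_by_user[target_user]

    def cosine(a, b):
        # Cosine similarity between two sets viewed as binary vectors.
        if not a or not b:
            return 0.0
        return len(a.intersection(b)) / sqrt(len(a) * len(b))

    sims = [
        (cosine(seen, items), user)
        for user, items in items_by_user.items()
        if user != target_user
    ]
    # Keep the k most similar users, ignoring those with no overlap at all.
    neighbours = [pair for pair in sorted(sims, reverse=True)[:k] if pair[0] > 0]

    scores = defaultdict(float)
    for sim, user in neighbours:
        for item in items_by_user[user] - seen:
            scores[item] += sim
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


# Made-up (user, book) pairs for illustration only.
toy = [
    ("ann", "dune"), ("ann", "emma"),
    ("bob", "dune"), ("bob", "emma"), ("bob", "ulysses"),
    ("cat", "dune"), ("cat", "hamlet"),
    ("dan", "walden"),
]
print(recommend_user_knn(toy, "ann"))  # prints ['ulysses', 'hamlet']
```

<p>Here bob shares two books with ann and cat shares one, so bob's extra book ranks first. dan has nothing in common with ann and contributes nothing.</p>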
  </li>
  <li>
    <p><strong>Please view the image below if you are stuck</strong></p>

    <p><img src="https://nickgreenquist.github.io//blog/assets/RecommenderSystem/ItemRecommendationsApply.png" alt="Item Recommendation Apply" width="700px" class="center-image" /></p>
  </li>
  <li>
    <p><strong>View performance metrics</strong></p>

    <ol>
      <li>
        <p>Delete the <code class="language-plaintext highlighter-rouge">Apply Model</code> box</p>
      </li>
      <li>
        <p>Add the <code class="language-plaintext highlighter-rouge">Performance (Item Recommendation)</code> operator</p>
      </li>
      <li>
        <p>Hook up the <code class="language-plaintext highlighter-rouge">Mod</code> output of the <code class="language-plaintext highlighter-rouge">User k-NN</code> box to the <code class="language-plaintext highlighter-rouge">Mod</code> input of the <code class="language-plaintext highlighter-rouge">Performance</code> box</p>
      </li>
      <li>
        <p>Hook up the <code class="language-plaintext highlighter-rouge">exa</code> output of the <code class="language-plaintext highlighter-rouge">User k-NN</code> box to the <code class="language-plaintext highlighter-rouge">tra</code> input of the <code class="language-plaintext highlighter-rouge">Performance</code> box</p>
      </li>
      <li>
        <p>Hook up the second <code class="language-plaintext highlighter-rouge">par</code> output of the Split Data box to the <code class="language-plaintext highlighter-rouge">tes</code> input of the Performance box</p>
      </li>
      <li>
        <p>Hook up the <code class="language-plaintext highlighter-rouge">per</code> output of the Performance box to the <code class="language-plaintext highlighter-rouge">res</code> at the very far right of the window</p>
      </li>
    </ol>
  </li>
  <li>
    <p><strong>Hit the big blue <code class="language-plaintext highlighter-rouge">Run</code> button to view the output!</strong></p>

    <ol>
      <li>
        <p>The output will be a slew of performance metrics for the Item Recommendations</p>
      </li>
      <li>
        <p>The AUC (Area Under the Curve) can be treated as an <code class="language-plaintext highlighter-rouge">accuracy</code>-like metric: 0.5 corresponds to a random ranking and 1.0 to a perfect one</p>
      </li>
    </ol>
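<p>For intuition about AUC, it can be computed as the fraction of (relevant, irrelevant) item pairs that the model ranks in the right order, with ties counting as half. The sketch below does this for a single user. How the extension aggregates over users is not covered here, and the function name and scores are invented for the example.</p>

```python
def auc(scores, relevant):
    """Pairwise AUC: share of relevant/irrelevant pairs ranked correctly."""
    pos = [s for item, s in scores.items() if item in relevant]
    neg = [s for item, s in scores.items() if item not in relevant]
    if not pos or not neg:
        raise ValueError("need at least one relevant and one irrelevant item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct pair
    return wins / (len(pos) * len(neg))


# Invented model scores for four books; "a" and "c" are the held-out hits.
scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
print(auc(scores, {"a", "c"}))  # prints 0.75
```

<p>Three of the four pairs are ordered correctly (a over b, a over d, c over d), while c is wrongly ranked below b, hence 0.75.</p>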
  </li>
  <li>
    <p><strong>Please view the image below if you are stuck</strong></p>
  </li>
</ol>

<p><img src="https://nickgreenquist.github.io//blog/assets/RecommenderSystem/ItemRecommendationsPerformance.png" alt="Item Recommendation Performance" width="700px" class="center-image" /></p>

<p><strong>Notes and further exploration:</strong></p>

<ol>
  <li>
    <p>You can use this setup on any set of ratings as long as the input csv uses the format (user_id, item_id, rating) and you make sure to set the roles to exactly <code class="language-plaintext highlighter-rouge">user identification</code>, <code class="language-plaintext highlighter-rouge">item identification</code> and <code class="language-plaintext highlighter-rouge">label</code> as explained in the steps above</p>
  </li>
  <li>
    <p>You can predict the ratings on the test set instead of predicting good recommendations. Swap out the <code class="language-plaintext highlighter-rouge">Item Recommendation User k-NN</code> with <code class="language-plaintext highlighter-rouge">Rating Prediction User k-NN</code> if you would rather predict the ratings that users have given their books</p>
  </li>
  <li>
    <p>Play around with Item k-NN or other operators. Item k-NN finds the items that are most similar to other items in order to make recommendations. The User k-NN operator we used above instead finds the users most similar to a given user in order to recommend items</p>

    <ol>
      <li>Please read more on recommender systems and the techniques used to build them. This post is meant to be a step-by-step guide for RapidMiner and not an explanation of recommender systems</li>
    </ol>
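<p>The item-based view mentioned in this note can be sketched in a few lines: instead of comparing users by the books they share, compare books by the users they share. As before, this is an illustrative assumption (cosine similarity over sets of users, made-up data and function name), not what the Item k-NN operator does internally.</p>

```python
from collections import defaultdict
from math import sqrt


def most_similar_items(interactions, target_item, top_n=2):
    """Rank other items by cosine similarity of their sets of users."""
    users_by_item = defaultdict(set)
    for user, item in interactions:
        users_by_item[item].add(user)
    target_users = users_by_item[target_item]

    sims = []
    for item, users in users_by_item.items():
        if item == target_item:
            continue
        overlap = len(target_users.intersection(users))
        if overlap:
            sims.append((overlap / sqrt(len(target_users) * len(users)), item))
    # Highest similarity first; ties broken alphabetically for stability.
    sims.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in sims[:top_n]]


# Made-up (user, book) pairs for illustration only.
toy = [
    ("ann", "dune"), ("ann", "emma"),
    ("bob", "dune"), ("bob", "emma"), ("bob", "ulysses"),
    ("cat", "dune"), ("cat", "hamlet"),
    ("dan", "walden"),
]
print(most_similar_items(toy, "dune"))  # prints ['emma', 'hamlet']
```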
  </li>
</ol>]]></content><author><name></name></author><category term="Datascience" /><summary type="html"><![CDATA[]]></summary></entry></feed>