Learning RecSys through Papers Vol II- The How, What, and Why of In-Batch Negatives

Consider the retrieval problem in a recommendation system. One way to model the problem is to create a big classification model predicting the next item a user will click or watch. This basic approach could be extended to large catalogs via a sampled softmax loss, as discussed in our last post. Naturally, the resulting dataset will only contain positive examples (i.e., what was clicked). In this post, we explore a more optimized version of this approach for more complex models via in-batch negatives. We derive the LogQ Correction to improve the estimate of the gradient and present code in PyTorch to implement the method, building on the previous blog post.

Read More

Learning RecSys through Papers- Implementing a Candidate Generation Model

Sometimes I’m asked how to start learning about recommender systems, and there’s a few go-to papers I always mention; however, without a proper map, it could be alittle difficult for the uninitiated. So, to try to make a gentle introduction, I will walk through an implementation of the candidate generation model based on Deep Neural Networks for YouTube, which I will sometimes refer to as the “Deep Nets” paper. This paper (and its talk) is jam-packed with all sorts of interesting practical recommendation system knowledge and is the perfect starting place for people hoping to understand large-scale recommendation systems. I will implement the key components of the candidate generation model and train the model on the MovieLens dataset in PyTorch, a typical benchmark dataset in the RecSys space. There will also be a few modern flourishes and comments on new developments as applicable. I’ll conclude with outlining a natural extension to this approach to predict multiple outcomes.

Read More

Calculating Statistical Power When Your Analysis Requires the Delta Method

In this post, we explore a situation typical of a website, where a user is allowed to view the page multiple times and may click on a button of interest on any visit. We would like to perform an A/B test on the non-user level metric, the click-through rate (CTR), defined via total clicks divided by total page views. In order to do the final analysis, we would need to estimate the variance for the variable via the delta method. However, what does that mean for a priori statistical power calculations?

Read More

CUPED with Multiple Covariates and A Simpler the Delta Method Calculation

In the original CUPED paper, the authors mention that it is straightforward to generalize the method to multiple covariates. However, without understanding exactly the mathematical technique to find the CUPED estimate, it may be confusing to attempt the multiple covariates case. In this post, we will explain the thought process behind the CUPED estimate and demonstrate the analytic formula for a multiple covariate extension to CUPED. We then discuss the estimate for non-user level metrics, and we will need to use the delta method for the variance. In this case, the book keeping when using the delta method would be tedious unless you use a simplified calculation which we empirically demonstrate in the second section of the post.

Read More

Connections Between the Delta Method, OLS and CUPED, Illustrated

In the past decade, there have been many advancements to enable A/B testing at (sometimes even small!) scale. Thankfully, many A/B testing practitioners have come forward and written books and papers on the finer details. Even with the deluge blog posts on the subject– there’s one topic that does not get a lot of attention, and specifically a lot of code related to it: the connection between OLS, the delta method, and how those techniques connect to CUPED. Understanding their connection is critical when trying to apply more advanced techniques like CUPED (said “Cupid”) with session level metrics.

Read More