In the original CUPED paper, the authors note that it is straightforward to generalize the method to multiple covariates. Without understanding the mathematical technique behind the CUPED estimate, however, attempting the multiple-covariate case can be confusing. In this post, we explain the thought process behind the CUPED estimate and derive an analytic formula for the multiple-covariate extension. We then discuss the estimate for non-user-level metrics, where the variance must be computed with the delta method. In that case, the bookkeeping for the delta method is tedious unless you use a simplified calculation, which we demonstrate empirically in the second section of the post.
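As a point of reference for the single-covariate case the post builds on, here is a minimal NumPy sketch of the classic CUPED adjustment, y_cv = y − θ(x − x̄) with θ = cov(x, y) / var(x), on simulated data (the variable names and simulation are illustrative, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated experiment: a pre-experiment covariate x correlated with the metric y.
n = 10_000
x = rng.normal(10.0, 2.0, n)     # pre-period metric (covariate)
y = x + rng.normal(0.0, 1.0, n)  # experiment-period metric

# Single-covariate CUPED: theta = cov(x, y) / var(x),
# adjusted metric y_cv = y - theta * (x - mean(x)).
theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
y_cv = y - theta * (x - x.mean())

# The adjustment leaves the mean unchanged but shrinks the variance.
print(y.var(ddof=1), y_cv.var(ddof=1))
```

The mean of the adjusted metric equals the mean of the original metric, so treatment-effect estimates are unchanged while their variance drops in proportion to the squared correlation between x and y.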
In the past decade, there have been many advancements enabling A/B testing at (sometimes even small!) scale. Thankfully, many A/B testing practitioners have come forward and written books and papers on the finer details. Even with the deluge of blog posts on the subject, one topic does not get much attention, and in particular little published code: the connection between OLS, the delta method, and CUPED. Understanding this connection is critical when applying more advanced techniques like CUPED (pronounced "Cupid") to session-level metrics.
Using clustering algorithms on Fantasy Football players is a popular technique; the New York Times even published a piece using Gaussian Mixture Models to group players into so-called "tiers". One drawback is that the approach relies on (average) point projections but ignores factors external to the game that are harder to measure and/or model. In this post, we explore a modification to the GMM approach that takes these immeasurables into account through expert rankings.
The current state of the art in image classification is thanks to residual layers. Introduced by Microsoft Research, a residual layer adds the output of the activation function to the input of the layer. This seemingly minor change has led to a rethinking of how neural network layers are designed. In this post, we discuss the mathematics behind why residual networks work so well and explore other additions one can make to achieve near-SOTA performance with a small memory footprint, or even create arbitrarily deep neural networks.
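The core idea of a residual layer can be sketched in a few lines. This is a minimal NumPy illustration of the skip connection, output = x + F(x) (the weight shapes and names here are illustrative assumptions, not from the post):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """Minimal residual block: the input x is added back to the
    transformed output, i.e. output = x + F(x)."""
    h = relu(x @ W1)      # inner transformation F with an activation
    return x + h @ W2     # skip connection: add the layer's input

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(4, d))            # batch of 4 inputs
W1 = rng.normal(scale=0.1, size=(d, d))
W2 = rng.normal(scale=0.1, size=(d, d))

out = residual_block(x, W1, W2)
print(out.shape)  # same shape as the input: (4, 8)
```

Because the identity path passes gradients through unchanged, a block whose weights are near zero behaves like the identity function, which is one intuition for why very deep stacks of such blocks remain trainable.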
Earlier this month, I gave an introductory talk at Data Philly on deep reinforcement learning. The talk followed the Nature paper on teaching neural networks to play Atari games by Google DeepMind and was intended as a crash course on deep reinforcement learning for the uninitiated. Get the slides below!