A Tour of Gotchas When Implementing Deep Q-Networks with Keras and OpenAI Gym

Since the Google DeepMind paper, there has been a surge of attention on training models to play video games. You, the data scientist/engineer/enthusiast, may not work in reinforcement learning, but you are probably interested in teaching neural networks to play video games. Who isn’t? With that in mind, here’s a list of nuances that should jumpstart your own implementation.

Read More

Dealing with Trends. Combine a Random Walk with a Tree-Based Model to Predict Time Series Data

A standard assumption underlying most machine learning models is that the model will be applied to the same population during training and testing (and production). This poses an interesting issue with time series data, as the underlying process can change over time, causing the production population to look different from the original training data. In this post, we will explore the concept of a data model, a theoretical tool from statistics, and in particular the idea of a random walk, to handle this situation and improve your modeling toolkit.
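The core trick can be sketched in a few lines. Tree-based models cannot extrapolate beyond the range of their training targets, so a trending series drifts out from under them; treating the series as a random walk and modeling the step-to-step differences sidesteps that. The snippet below is a minimal illustration under made-up assumptions: a synthetic linear-trend series, a toy `% 7` "day-of-week" feature, and a random forest standing in for whatever tree-based model you prefer.

```python
# Minimal sketch: random-walk baseline + tree-based model on differences.
# Instead of predicting raw levels (which a tree cannot extrapolate),
# predict the first differences and add them back to the last observation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic trending series: linear trend plus noise (illustrative only).
t = np.arange(200, dtype=float)
y = 0.5 * t + rng.normal(scale=1.0, size=t.size)

# Model the first differences rather than the levels.
dy = np.diff(y)
X = t[1:].reshape(-1, 1) % 7  # hypothetical day-of-week style feature
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, dy)

# One-step forecast: last observed level plus the predicted difference.
next_feature = np.array([[t[-1] + 1]]) % 7
forecast = y[-1] + model.predict(next_feature)[0]
```

Because the tree only ever predicts differences it has seen, the random-walk anchor `y[-1]` carries the trend forward even as the level leaves the training range.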

Read More

(My Opinion of) Best Practices for a Data Scientist in Industry

Most of what you need to know when you start as a data scientist can be found online these days. Besides containing my own opinions, this post aims to detail some important points that are either not always spelled out in detail or are glossed over. I also include practical advice for avoiding typical mistakes a new data scientist can make. Of course, it’s tailored more towards Python users (as I am one), but that doesn’t mean there’s nothing here for other users.

Read More

Leveraging Factorization Machines for Wide Sparse Data and Supervised Visualization

Taking cues from other disciplines is a great way to expand your data science toolbox. Factorization machines (and matrix factorization methods more generally) are particularly successful models for recommender systems, having led to high-scoring results in the Netflix Prize and many Kaggle competitions. In this post, we will apply feature hashing combined with factorization machines to a static text classification problem to excellent effect. We’ll walk through an example and explain the mathematical background of factorization machines in an easy-to-understand, simplified fashion. Finally, we will demonstrate how to use factorization machines for supervised visualization, in the same vein as latent semantic analysis.
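To make the pairing concrete, here is a minimal sketch of the two pieces: scikit-learn’s `FeatureHasher` turns raw tokens into a fixed-width sparse vector, and a factorization machine scores it with a bias, a linear term, and pairwise interactions through k-dimensional latent factors. The weights below are random placeholders (a real model would learn them), and the pairwise term uses the standard O(k·n) identity rather than the naive O(n²) double sum.

```python
# Sketch: feature hashing + a factorization machine's prediction equation.
# Pairwise term via the O(k*n) identity:
#   sum_{i<j} <v_i, v_j> x_i x_j
#     = 0.5 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2 ]
import numpy as np
from sklearn.feature_extraction import FeatureHasher

# Hash token counts into a fixed-width vector (toy example).
hasher = FeatureHasher(n_features=16, input_type="string")
x = hasher.transform([["good", "movie", "good"]]).toarray()[0]

rng = np.random.default_rng(0)
n, k = x.size, 4                          # n hashed features, k latent factors
w0 = 0.0                                  # global bias
w = rng.normal(size=n) * 0.01             # linear weights (random placeholders)
V = rng.normal(size=(n, k)) * 0.01        # latent factor matrix (placeholder)

def fm_predict(x, w0, w, V):
    linear = w0 + w @ x
    pairwise = 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))
    return linear + pairwise

score = fm_predict(x, w0, w, V)
```

The hashing step keeps the feature space bounded no matter how wide the vocabulary gets, which is exactly why the combination suits wide, sparse text data.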

Read More