Using Ordinary Differential Equations To Design State of the Art Residual-Style Layers

The current state of the art in image classification is built on residual layers. Introduced by Microsoft Research, the residual layer adds the output of the activation function to the input of the layer. This seemingly minor change has led to a rethinking of how neural network layers are designed. In this post, we will discuss the mathematics behind why residual networks work so well and explore other additions one can make to achieve near-SOTA performance with a small memory footprint, or even create arbitrarily deep neural networks.
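Concretely, a residual block computes y = x + f(x) instead of y = f(x). Here is a minimal numpy sketch of that idea (the single ReLU layer and the weight matrix are hypothetical stand-ins for whatever f the network learns):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W):
    # The layer's transformed output f(x) is added back to its input,
    # so the block computes x + f(x) rather than f(x) alone.
    return x + relu(x @ W)

x = np.ones(3)
W = np.zeros((3, 3))  # with zero weights, f(x) = 0 and the block is the identity
y = residual_block(x, W)
```

One consequence worth noticing: when the weights are (near) zero, the block defaults to the identity map, which is part of why very deep stacks of these blocks remain trainable.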

Read More

Learning About Deep Reinforcement Learning (Slides)

Earlier this month, I gave an introductory talk at Data Philly on deep reinforcement learning. The talk followed the Nature paper on teaching neural networks to play Atari games by Google DeepMind and was intended as a crash course on deep reinforcement learning for the uninitiated. Get the slides below!

Read More

Understanding Attention in Neural Networks Mathematically

Attention has gotten plenty of attention lately, after yielding state of the art results in multiple fields of research. From image captioning and language translation to interactive question answering, Attention has quickly become a key tool to which researchers must attend. Some have taken notice and even postulate that attention is all you need. But what is Attention anyway? Should you pay attention to Attention? Attention enables the model to focus on important pieces of the feature space. In this post, we explain how the Attention mechanism works mathematically and then implement the equations using Keras. We conclude by discussing how to “see” the Attention mechanism at work by identifying important words for a classification task.
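The core computation is small: score each hidden state, softmax the scores into weights, and take the weighted sum. A numpy sketch of one common variant (the learned scoring vector `w` is a hypothetical placeholder; the post's Keras version parameterizes this differently):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention(H, w):
    # H: (timesteps, features) hidden states; w: (features,) learned scorer.
    scores = H @ w           # one relevance score per timestep
    alpha = softmax(scores)  # normalize scores into weights summing to 1
    context = alpha @ H      # context vector: weighted sum of hidden states
    return context, alpha

H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = np.array([1.0, 0.0])
context, alpha = attention(H, w)
```

The weights `alpha` are exactly what lets you “see” Attention at work: for text classification, plotting them over the input tokens highlights which words the model attended to.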

Read More

Adversarial Dreaming with TensorFlow and Keras

Everyone has heard of the feats of Google’s “dreaming” neural network. Today, we’re going to define a special loss function so that we can dream adversarially: that is, we will dream in a way that fools the InceptionV3 image classifier into classifying an image of a dreamy cat as a coffeepot.
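The underlying trick is gradient ascent on the input: instead of updating weights to fit data, we update the image to maximize the classifier’s probability for a target class. A toy numpy stand-in for the idea (the two-class linear “classifier” `W` is hypothetical; the post does this with InceptionV3 in TensorFlow/Keras):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy linear "classifier" standing in for InceptionV3 (hypothetical weights).
W = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def target_prob(x, target):
    return softmax(W @ x)[target]

def adversarial_step(x, target, lr=0.5):
    # Gradient of log p(target) w.r.t. the INPUT for a linear softmax model:
    # d/dx log p_t = W[t] - sum_k p_k W[k]
    p = softmax(W @ x)
    grad = W[target] - p @ W
    return x + lr * grad  # ascend: nudge the "image" toward the target class

x = np.array([1.0, 0.0])  # starts confidently in class 0 ("cat")
for _ in range(50):
    x = adversarial_step(x, target=1)  # push it toward class 1 ("coffeepot")
```

The same loop with a deep network just swaps the hand-derived gradient for one computed by automatic differentiation through the frozen classifier.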

Read More

Hogwild!? Implementing Async SGD in Python

Hogwild! is an asynchronous stochastic gradient descent algorithm that uses “lock-free” gradient updates. For a machine learning model, this means the weights are updated by multiple processes at the same time, with the possibility of updates overwriting each other. In this post, we will use the multiprocessing library to implement Hogwild! in Python for training a linear regression model.
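A minimal sketch of the scheme: several processes share one weight vector and each runs plain SGD on it without acquiring a lock, so their updates freely interleave. The data here is a synthetic noiseless regression problem; function names and sizes are illustrative, not the post’s actual code.

```python
import numpy as np
from multiprocessing import Process, Array

def _worker(w_shared, X, y, lr, steps):
    # View the shared buffer as a numpy array; every in-place update below
    # is "lock-free": other workers may read or overwrite it concurrently.
    w = np.frombuffer(w_shared.get_obj())
    rng = np.random.default_rng()
    for _ in range(steps):
        i = rng.integers(len(y))
        grad = (X[i] @ w - y[i]) * X[i]  # squared-error gradient, one sample
        w -= lr * grad                   # unsynchronized shared update

def train_hogwild(n_procs=2, lr=0.05, steps=400):
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = X @ np.array([2.0, -3.0])        # true weights the workers should find
    w_shared = Array("d", 2)             # shared memory, initialized to zeros
    procs = [Process(target=_worker, args=(w_shared, X, y, lr, steps))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return np.frombuffer(w_shared.get_obj()).copy()

if __name__ == "__main__":
    print(train_hogwild())
```

Note that `multiprocessing.Array` ships with a lock; Hogwild!’s point is that for sparse gradient updates you can skip it (as above) and still converge.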

Read More