One of the more interesting mental models of machine learning I’ve come across in the last month or so is the “five tribes of artificial intelligence” model popularized in “The Master Algorithm” by Pedro Domingos. To summarize in a phrase, the master algorithm is that approach which can uncover all possible insight from data, and Prof. Domingos hypothesises that each of the five tribes has its own candidate “master algorithm”. One of these “tribes” is the connectionists, whose master algorithm is backpropagation, which is central to the training of neural networks.
A Connectionist Tour Guide
In a sense, the deep neural network has become synonymous with artificial intelligence today. There are numerous other algorithms which could lend a sense of intelligence to machines, whether by communicating in natural language as a conversationalist (starting from rudimentary bots like ELIZA through Pootwattle and Smedley (of U Chicago fame), to modern chatbots), or by learning to distinguish between different kinds of faces, or to identify specific emotions. The deep neural network has been applied successfully to numerous such real-world problems, and therefore stands out as promising on this account. For the other tribes, we don’t yet have algorithms such as “advanced induction inference machines”, or “higher dimensional kernel machines”, whatever these may indicate (really or apocryphally). So it behooves us to pay attention to stories such as this one, which discuss the “unreasonable effectiveness” of neural networks.
Perhaps the fact that a key AI researcher of our time has taken time off from self-driving cars to create a course like this, is telling!
— R Explorations (@rexplorations) August 13, 2017
There’s definitely a skills gap in the advanced machine learning and artificial intelligence space. Businesses are as yet unable to see value beyond the hype. Unsurprisingly, the skills gap has to be addressed at the very root – the fundamentals, where the ability to model problems, computationally solve them, and build systems out of such solutions intersect. Andrew Ng has, also unsurprisingly, taken a stab at the deep learning space, if his “AI is the new electricity” talk is anything to go by.
Over the last few weeks, I’ve had the opportunity to spend some time on Andrew Ng’s Deep Learning course from DeepLearning.ai. For me, this is like a tour guide to the world of the connectionists. The reality is that neural networks don’t work like the human brain apart from superficial similarities – as Ng himself explains in the course – but the term has stuck, since the motivations of early pioneers who also knew some neuroscience led to the moniker.
The Coursera certification is organized into five different courses, and the first of these lays the mathematical and programmatic foundation for implementing neural networks. This first course, titled Neural Networks and Deep Learning, has well-orchestrated exercises within Coursera’s integrated Jupyter notebook interface, and you can use the algorithm on your own data to evaluate its performance. I’m currently some way through the second course, having finished the first one, and I have to say that the videos, programming exercises and other course aspects create a true learning feedback loop, which is effective in teaching the basics really well. I’m very impressed with the way the course has been put together and made accessible to those with a little bit of machine learning knowledge who are starting out on neural networks and deep learning.
In the section below, I’ll outline my key learnings from the first course in the certification. I hope that you take the course if you are an ML and AI enthusiast or young professional (or even an experienced one) interested in working on deep learning.
- The course introduced the most fundamental ideas of neural networks at the very start, with extensive coverage of how to implement a logistic regression model for classifying data. This initial discussion was built up rather nicely into a discussion on deep learning.
- As an intermediate course, it assumes some amount of knowledge of linear algebra and differential calculus. As someone who works with machine learning models, I was able to grasp the intuitions with one repetition. If it has been a while since you worked through linear algebra and differential calculus (or thought through equations, at the very least), expect to take a while to find your feet.
- Some of the intuitions around gradient descent, the values of derivatives, and so on, were introduced very handily – and were reinforced through the exercises.
- The importance of vectorization and its central use in numpy (which is used extensively, nay, almost exclusively, throughout the course) was well brought out. Numpy is a powerful library and, surprisingly, received its first funding only in 2017 after being useful for the development of numerous algorithms and tools. Some of its quirks, such as rank-1 arrays of shape (n,), which are neither row nor column vectors, were especially interesting and useful to learn about. Although this isn’t a numpy tutorial by any stretch, the library is referenced extensively.
- During weeks 2 and 3, the logistic regression algorithm is taught in a different context – it is likened to neurons in a deep net, and the details of activation functions are discussed. This, to me, was the meat of the course.
- In weeks 2 and 3, a consistent methodology and notation were followed for the discussion and implementation of forward and backward propagation, two of the key mechanisms in any neural network. This was done entirely within numpy, and these are great hands-on lessons. Stochastic gradient descent was also explained and implemented.
- Finally, in week 4, deep neural networks were handled, and parametrization of the neural network topology was introduced. Ideas related to this, such as hyperparameter optimization, were also discussed. Additionally, in both videos and assignments, Andrew Ng provided practical advice on how to get the dimensions right for the weight matrices and bias vectors. Without this and the consistent notation, a lot of the programming implementations of DNNs could potentially get very hairy, so I personally felt that this was very well handled.
- A cat classifier deep neural network in Week 4 – because who doesn’t like cats?
- Right through the course, there are optional video lectures, and interviews with well known researchers. One of them is with Geoff Hinton, and it was definitely instructive.
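To make a few of the points above concrete, here is a minimal numpy sketch of my own (it is not the course's code). It shows the shape (n,) quirk, a vectorized forward and backward pass for logistic regression viewed as a single neuron, and the usual shape convention for the parameters of a deeper network. The function names (`propagate`, `train`, `initialize_parameters`), the toy data, and the choice of plain batch gradient descent are all my assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    """One vectorized forward/backward pass for logistic regression.

    Assumed shapes: X is (n_x, m), Y is (1, m), w is (n_x, 1), b is a scalar.
    """
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)                      # forward pass, shape (1, m)
    eps = 1e-12                                   # guards against log(0)
    cost = -np.mean(Y * np.log(A + eps) + (1 - Y) * np.log(1 - A + eps))
    dZ = A - Y                                    # backward pass
    dw = (X @ dZ.T) / m                           # shape (n_x, 1), same as w
    db = float(np.sum(dZ)) / m
    return cost, dw, db

def train(X, Y, learning_rate=0.1, num_iterations=1000):
    """Plain batch gradient descent on the logistic regression cost."""
    w = np.zeros((X.shape[0], 1))
    b = 0.0
    costs = []
    for _ in range(num_iterations):
        cost, dw, db = propagate(w, b, X, Y)
        w = w - learning_rate * dw
        b = b - learning_rate * db
        costs.append(cost)
    return w, b, costs

def initialize_parameters(layer_dims, rng):
    """Layer l gets W of shape (n[l], n[l-1]) and b of shape (n[l], 1)."""
    params = {}
    for l in range(1, len(layer_dims)):
        params[f"W{l}"] = rng.normal(size=(layer_dims[l], layer_dims[l - 1])) * 0.01
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params

# The (n,) quirk: a rank-1 array is neither a row nor a column vector.
a = np.arange(3.0)          # shape (3,)
col = a.reshape(3, 1)       # explicit column vector, shape (3, 1)
print(a.shape, a.T.shape)   # transposing a rank-1 array changes nothing
print((col @ col.T).shape)  # an outer product needs the explicit shapes

# Toy, linearly separable data (made up for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 200))
Y = (X[0:1] + X[1:2] > 0).astype(float)
w, b, costs = train(X, Y)
accuracy = np.mean((sigmoid(w.T @ X + b) > 0.5) == Y)
print(f"first cost {costs[0]:.3f}, last cost {costs[-1]:.3f}, accuracy {accuracy:.2f}")

params = initialize_parameters([2, 4, 1], rng)
print({name: value.shape for name, value in params.items()})
```

Keeping every vector as an explicit (n, 1) or (1, n) array, and checking shapes as you go, is the habit that makes the deeper networks in week 4 manageable.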
I’m about half-way through the second course, on Improving Deep Neural Networks, and my experience there has been similar to the first course. The content derives directly from the content of the first course, and therefore, going in sequence from the first to the second definitely has its advantages. If you were to start the second course of the specialization first, expect to spend some time to find your feet. So far, I only wish there had been better explanations of ideas like dropout and L2 regularization, especially given the tricky quizzes in Week 1. This is a 3-week course, and I wish an additional week, or a few more videos had been spent initially, explaining and firming up ideas around regularization. Additionally, the exploding/vanishing gradient problems could be better illustrated with videos and so on, although I felt the course generally does a good job of explaining the essentials of these ideas.
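For readers who, like me, wanted a firmer handle on these two regularization ideas, here is a small numpy sketch of how they are commonly defined. It is my own illustration under stated assumptions, not code from the course: the helper names (`l2_penalty`, `l2_gradient_term`, `inverted_dropout`) are hypothetical, and the dropout shown is the "inverted" variant, which rescales the surviving activations during training.

```python
import numpy as np

def l2_penalty(weights, lambd, m):
    """L2 term added to the cost: (lambd / (2 * m)) * sum of squared weights."""
    return (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)

def l2_gradient_term(W, lambd, m):
    """Extra term added to dW when the cost includes the L2 penalty."""
    return (lambd / m) * W

def inverted_dropout(A, keep_prob, rng):
    """Zero each activation with probability 1 - keep_prob, then rescale.

    Dividing by keep_prob keeps the expected value of the activations
    unchanged, so nothing needs rescaling at prediction time.
    """
    mask = rng.random(A.shape) < keep_prob
    return (A * mask) / keep_prob, mask

# Toy values, made up for this sketch.
W = np.ones((2, 3))
penalty = l2_penalty([W], lambd=0.5, m=5)
grad_term = l2_gradient_term(W, lambd=0.5, m=5)

rng = np.random.default_rng(0)
A = np.ones((100, 100))
A_dropped, mask = inverted_dropout(A, keep_prob=0.8, rng=rng)
print(penalty, grad_term[0, 0], A_dropped.mean())
```

The L2 penalty shrinks weights toward zero on every update, while dropout randomly silences units so the network cannot lean too heavily on any single one.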
To conclude, I’d recommend this certificate for those in the analytics, data science or machine learning space, who are a bit hands on, can grasp linear algebra and calculus, and can work with Python. You’ll find that since this is an “intermediate” specialization, neophytes will require multiple viewings of the videos to become conversant in the ideas and concepts. This still shouldn’t deter those who want to audit the course or learn the concepts therein for a deeper understanding to back up their direct experience in machine learning.