Deep Learning
Roadmap to Deep Learning for Visual Recognition
- Before jumping to learning-based techniques for vision, it is recommended to go through some fundamental concepts of Computer Vision. From our experience, Introduction to Computer Vision is a good starting point; it covers concepts such as images as functions, filtering, and convolution. You can follow this course up to Lesson 4: Linearity and Convolution before moving to DL. This builds a strong mathematical foundation about images and makes DL, and CNNs in particular, much easier to grasp later.
- Once you get the hang of computer vision concepts, it is advisable to go through some linear algebra as well, including eigendecomposition, singular value decomposition, and matrix and vector norms. For other important concepts and further references, see the textbook Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville.
- Python will be used in most of the courses. If you don't know Python, go through some tutorials first, like this one by Corey Schafer.
- Now you are ready to start the Deep Learning Specialization by Andrew Ng. The foundation built through these courses is essential, so take proper notes during the course.
- Code your artificial neural networks and convolutional neural networks from scratch using NumPy. These may be covered within the course itself, so be sure to complete the course assignments with proper understanding.
- You should now pick up a framework to work with. Most members of IvLabs prefer PyTorch. If you are inclined towards industry, TensorFlow is the way to go, whereas for research, you should go for PyTorch.
- You can skip the Sequence Models course of the Deep Learning Specialization and do it later if you are interested in Natural Language Processing (NLP) or signal processing.
- Keep yourself updated with the latest research papers. Reddit and Twitter are highly recommended for this purpose.
- Ask for support from seniors who have already worked in these fields.
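As a rough illustration of the filtering, convolution, and linear algebra ideas listed above, here is a minimal NumPy sketch. It is not taken from any of the courses; the helper name `convolve2d` and the toy image are assumptions made for this example. It applies a box blur and a Sobel kernel to a small step-edge image, then uses the singular value decomposition and the Frobenius norm to show that the Sobel kernel has rank 1.

```python
import numpy as np


def convolve2d(image, kernel):
    """Valid-mode 2D convolution of a grayscale image (hypothetical helper)."""
    # Flip the kernel in both axes: this is what separates convolution
    # from cross-correlation.
    kernel = np.flipud(np.fliplr(kernel))
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the patch under the kernel window.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Toy image (an assumption for this sketch): a vertical step edge.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

box = np.ones((3, 3)) / 9.0  # averaging (blur) filter
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)  # horizontal-gradient filter

blurred = convolve2d(image, box)    # smooths the step into a ramp
edges = convolve2d(image, sobel_x)  # non-zero only around the edge

# Linear algebra view of the same kernel: only one singular value is
# non-zero, so the kernel has rank 1, and that singular value equals
# the Frobenius norm of the matrix.
singular_values = np.linalg.svd(sobel_x, compute_uv=False)
frobenius = np.linalg.norm(sobel_x)  # Frobenius norm is the default for matrices
```

The explicit loops are slow but keep the definition visible; in practice a library routine would be used instead.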
NOTE for further studies:
- If one’s focus is to use deep learning for object detection, segmentation (feature-level understanding), etc., then understanding Image Processing alone is sufficient. Digital Image Processing by Rafael C. Gonzalez is a standard reference.
- If one is focusing on deep learning for Computer Vision/Perception applications, then Image Processing and the complete course on Introduction to Computer Vision are prerequisites. Perception includes concepts such as 3D perspective, stereo, optical flow, object tracking, and visual recognition, all of which are very important. In simple words, Image Processing is kinematics and Computer Vision is dynamics. The book Computer Vision: Algorithms and Applications by Richard Szeliski is a very good reference and focuses more on mathematical approaches.
- If one’s focus goes further into Visual Recognition in the direction of Image-to-Image Translation or Image Synthesis, one can learn about Generative Adversarial Networks, which train a generator network against a discriminator network in order to generate new images. The course on Generative Adversarial Networks offered by DeepLearning.AI requires basic knowledge and skills of Deep Learning with PyTorch.
General Courses
In these courses, you will learn the foundations of Deep Learning, understand how to build neural networks, and learn how to undertake Deep Learning projects systematically.
Deep Learning
- Deep Learning Specialization by Andrew Ng
- Deep Learning - IIT-Madras CS7015
- Deep Learning - Stanford CS230
- Fast.ai
Machine Learning
Vision
Course Reviews
Deep Learning Specialization
The Deep Learning Specialization, presented by Andrew Ng, is a collection of five courses, each offering a good overview of the concepts and implementations required for almost all kinds of basic deep learning. The contents and goals of each course are explained below.
- Neural Networks and Deep Learning aims at explaining the basics of neural networks and deep learning, starting from very basic problems like linear and logistic regression. It helps one understand and actually implement neural networks and basic optimization algorithms from scratch, mainly in NumPy. Using an analogy, this course actually teaches you to bake from scratch including how to use an oven.
- Improving Deep Neural Networks focuses more on improving the performance of neural networks by employing efficient optimization and normalization algorithms. If the first course is a course that teaches you baking, then this one teaches you how and where to use which ingredient to get the best results. However, the last few assignments are based on TensorFlow.
- Structuring Machine Learning Projects is a course that helps one develop an intuition as to which architectures and optimization algorithms would work best for a given machine learning problem. It also covers the basics of transfer learning and multi-task learning. It is oriented more towards industrial, result-oriented practices. So, this is a course that teaches you to industrialize your bakery.
- Convolutional Neural Networks brings to the table the most fundamental and powerful tools used in almost all deep learning problems that involve image processing, or at times even signal processing. The course covers all the basics and is almost a must for anyone planning to work with images and deep learning. The assignments are in NumPy and TensorFlow. So this course teaches you how to bake a cake, because who doesn’t like cake?
- Sequence Models is a course that explains recurrent neural networks and other architectures and techniques used in applying deep learning to sequential data. It focuses more on Natural Language Processing and audio signal processing, which is a completely different and rather nascent domain of deep learning. The assignments use NumPy. This is like learning to bake pizzas.
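To give a feel for the "from scratch, mainly in NumPy" style of the first course, here is a minimal sketch of logistic regression trained with batch gradient descent. It is not the course's assignment code; the function names, the synthetic two-cluster dataset, the learning rate, and the step count are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, mapping real-valued scores to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic_regression(X, y, lr=0.1, steps=1000):
    """Fit weights and bias by batch gradient descent on the cross-entropy loss.

    Hypothetical helper for this sketch; lr and steps are arbitrary defaults.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)               # forward pass: predicted probabilities
        grad_w = X.T @ (p - y) / n_samples   # gradient of the mean loss w.r.t. w
        grad_b = np.mean(p - y)              # gradient of the mean loss w.r.t. b
        w -= lr * grad_w                     # gradient descent update
        b -= lr * grad_b
    return w, b


# Synthetic data (an assumption): two Gaussian clusters in 2D.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-2.0, 1.0, size=(50, 2)),  # class 0
    rng.normal(2.0, 1.0, size=(50, 2)),   # class 1
])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = train_logistic_regression(X, y)
accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(f"training accuracy: {accuracy:.2f}")
```

A single-layer model like this is the simplest case of the forward pass, gradient computation, and parameter update that the later courses build on with deeper networks.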
A healthy approach, with a good mix of results and a fairly decent understanding, would be to complete the first two courses first, then study the fourth one along with a bit of Image Processing and Computer Vision. Follow this with a small from-scratch project in NumPy, and then an introductory project such as The Digit Classifier using either PyTorch or TensorFlow.
The fifth course can be a bit heavy if done before implementing a project or two using Image Processing and Deep Learning, even though it deals with a very different paradigm. The third course, however, holds little potential with respect to research-oriented learning.
Books
- Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville
- Digital Image Processing by Rafael C. Gonzalez
- Computer Vision: Algorithms and Applications by Richard Szeliski