resources

Resources on various topics being worked on at IvLabs

Computer Vision

Extracting information from images to help robots “see” and “perceive” as humans do, and to understand their environment using input from their cameras. The repository focuses on the traditional aspects of computer vision, particularly multiview geometry.

Prerequisites

Linear Algebra

Linear Algebra by Gilbert Strang, MIT OCW - The course provides a formal introduction to linear algebra. Go through the course patiently because some concepts might seem daunting initially.
The Essence of Linear Algebra - It provides the motivation and intuition for some of the concepts.

Digital Image Processing

Refer to any one of the following courses:

Image and Video Processing by Guillermo Sapiro, Duke University - This course is also available on Coursera. It builds the foundation for concepts that are used alongside computer vision techniques, such as histogram representations, filters and denoising, thresholding and contours, followed by some of their applications.
Digital Image Processing by Prof. Prabir Kumar Biswas, IIT Kharagpur - A comprehensive course on image processing, that deals with all the major topics. Familiarity with linear algebra, probability theory and college calculus is expected.
Introduction to Digital Image Processing by Rich Radke, Rensselaer Polytechnic Institute - This is another comprehensive course, which also covers special topics such as digital watermarking and digital image forensics.
The above courses follow the textbook Digital Image Processing by Rafael C. Gonzalez. One can choose to follow just this book, as it is a standard textbook for digital image processing.
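
The histogram and thresholding ideas mentioned in the courses above can be illustrated with a minimal sketch. It uses plain Python lists so that it runs without any libraries; the tiny `image` and the threshold value 128 are made-up values for illustration, and in practice one would more likely use NumPy or OpenCV for the same operations.

```python
def histogram(image, levels=256):
    """Count how many pixels take each intensity value."""
    counts = [0] * levels
    for row in image:
        for pixel in row:
            counts[pixel] += 1
    return counts


def threshold(image, t):
    """Binary threshold: pixels strictly above t become 255, the rest 0."""
    return [[255 if pixel > t else 0 for pixel in row] for row in image]


# Hypothetical 3x4 grayscale image with 8-bit intensities.
image = [
    [10, 10, 200, 200],
    [10, 50, 200, 220],
    [50, 50, 220, 220],
]

hist = histogram(image)         # hist[v] = number of pixels with value v
binary = threshold(image, 128)  # separates the dark and bright regions
```

The histogram of this image has two clearly separated groups of intensities, which is why a single global threshold between them is enough to split the image into two regions.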

Implementation

OpenCV will help you implement some of these concepts. Its documentation not only explains the concepts but also shows how they can be implemented in Python or C++.

Basic Roadmap

Introduction to Computer Vision

To learn the basic concepts used in computer vision, Introduction to Computer Vision by Aaron Bobick, Georgia Tech is a good place to start.
1. The course first gives a refresher on basic image processing techniques, then introduces camera geometry: the pinhole model, intrinsic and extrinsic camera parameters, and the analysis of point correspondences between two views of the same scene.
2. The course then introduces the concepts of features and scale invariance, and presents a widely used method, the Scale Invariant Feature Transform (SIFT). One can complete the whole course to get an idea of the different computer vision and image processing techniques.
3. This builds the foundation for computer vision and covers the digital image processing concepts it depends on. After watching all the lectures, you should be ready to take on a substantial project in this field. Proficiency in either MATLAB/Octave or Python is recommended. If you opt for Python, check out NumPy (helpful, but not necessary) and OpenCV for Python (read up to and including Image Processing in OpenCV), the libraries that will help you implement many of the concepts. After the first 20% of the course (though we encourage everyone to watch it till the end), you will be able to implement a virtual drawing pad, which requires some basic but powerful image processing techniques. Once you reach the concept of features, you will be able to implement more powerful algorithms for projects such as stitching two images into a panorama, calibrating a camera to estimate its focal length using just the camera feed, and removing distortion from the camera output.
Another course covering the basic concepts of computer vision is Computer Vision by Mubarak Shah, University of Central Florida. It also has concise lectures on some well-known techniques such as the Kanade-Lucas-Tomasi (KLT) tracker, Structure from Motion and Stereo.
For feature detection and matching techniques, have a look at the OpenCV documentation. It starts with an introduction to the concept of features and then discusses various techniques for obtaining feature points.
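
The pinhole model with intrinsic and extrinsic parameters, mentioned in the roadmap above, can be sketched in a few lines. This is only an illustration under assumed values: the intrinsic matrix `K` (focal length 800 px, principal point (320, 240), zero skew), the identity rotation `R`, the zero translation `t` and the helper name `project_point` are all hypothetical and not taken from any of the courses. It uses plain Python lists to stay dependency-free; with NumPy the same computation is a pair of matrix products.

```python
def project_point(K, R, t, X):
    """Project a 3D world point X to pixel coordinates using x ~ K (R X + t)."""
    # World frame -> camera frame (extrinsic parameters R, t).
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Camera frame -> homogeneous image coordinates (intrinsic matrix K).
    x = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    # Perspective division gives the pixel coordinates (u, v).
    return x[0] / x[2], x[1] / x[2]


# Assumed intrinsics: focal length 800 px, principal point (320, 240), no skew.
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
# Assumed extrinsics: camera at the world origin, aligned with the world axes.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

u, v = project_point(K, R, t, [0.5, -0.25, 2.0])
centre = project_point(K, R, t, [0.0, 0.0, 5.0])
```

A point on the optical axis projects to the principal point regardless of its depth, which is a quick sanity check for any implementation of this model.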

Books

Computer Vision: Algorithms and Applications, Richard Szeliski - This is a standard textbook for Computer Vision. It is recommended to have a strong mathematical base to properly understand some of the sections.
Multiple View Geometry in Computer Vision, Richard Hartley and Andrew Zisserman - This book is specific to multiple view geometry. One can refer to it for more advanced topics.
Computer Vision - A Modern Approach, David Forsyth and Jean Ponce - Another book that also covers image processing and specific computer vision problems such as tracking, stereo and structure from motion.
Computer Vision for Visual Effects - Apart from standard topics, it covers special topics such as image matting, motion capture and 3D data acquisition. Applications are centred around visual effects, which are used frequently in the movie and television industry.

Other Lecture Series

For more advanced concepts, one can refer selectively to the following lecture series.

Lecture Notes

Publications

This section consists of a list of publications, survey papers and tutorials for some concepts that are usually covered in lectures and books.