I have done some work on image classification and came across a similar method for classifying flowers that might be of interest to you. It was quite successful on a very large number of flower species in various settings. I think you are on the right track with object classification, and refining certain parts could really improve the flexibility of the algorithm.
In that particular paper, circular patches represented with SIFT descriptors are used as 'codewords' and clustered to build a vocabulary/dictionary. Images are then represented using that vocabulary. The fact that the patches are circular rather than square makes them rotationally invariant. Furthermore, SIFT descriptors are one of the state-of-the-art methods of describing shape and are scale-invariant. This allows a more descriptive vocabulary to be built.
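To make the vocabulary step concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the 2-D tuples are toy stand-ins for real 128-dimensional SIFT descriptors, the function names (`build_vocabulary`, `encode`) are hypothetical, and the evenly spaced initialization is a simplifying assumption. It only shows the idea of clustering descriptors into codewords and describing each image as a histogram over them.

```python
from collections import Counter

def nearest(centers, d):
    """Index of the codeword closest to descriptor d (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(centers[i], d)))

def build_vocabulary(descriptors, k, iters=20):
    """Plain k-means; the k cluster centers act as the codewords."""
    # Assumption: naive initialization from evenly spaced descriptors.
    step = len(descriptors) // k
    centers = [list(descriptors[i * step]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(centers, d)].append(d)
        for i, g in enumerate(groups):
            if g:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def encode(image_descriptors, centers):
    """Normalized histogram of codeword occurrences for one image."""
    counts = Counter(nearest(centers, d) for d in image_descriptors)
    n = len(image_descriptors)
    return [counts[i] / n for i in range(len(centers))]

# Toy stand-ins for SIFT descriptors extracted from two images.
image_a = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1)]
image_b = [(5.1, 5.0), (4.9, 5.0), (0.0, 0.0)]

vocab = build_vocabulary(image_a + image_b, k=2)
hist_a = encode(image_a, vocab)
hist_b = encode(image_b, vocab)
print(hist_a, hist_b)
```

Each image ends up as a fixed-length vector regardless of how many patches it contained, which is what lets any standard classifier work on it afterwards.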
At this stage, you basically have a set of images described by their most prominent shape features in the form of feature vectors. You can then use any algorithm to classify those. A good choice is SVM, as it weighs the features based on how much they affect the decision boundary between classes. Thus, if you investigate your top support vectors, you could find not necessarily the common characteristics in a particular class, but rather the characteristics that differentiate that class from all the other classes. I can see that the DictionaryLearning algorithm also tries to optimize the features used, so that could be a good choice too.
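As a rough illustration of how an SVM picks out the differentiating features, the sketch below trains a linear SVM with simple subgradient descent on the hinge loss. This is a stand-in for a library SVM, not a recommendation to hand-roll one; the histograms, the hyperparameters, and the name `train_linear_svm` are all made up for the example. Note that reading importance directly off the weights only works this simply for a linear kernel.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the L2-regularized hinge loss (labels are +1/-1)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Example is inside the margin: move the boundary away from it.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Only the regularizer contributes: shrink the weights slightly.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical codeword histograms: codeword 0 is typical of class +1,
# codeword 2 of class -1, and codeword 1 is equally common in both.
X = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.1, 0.3, 0.6], [0.1, 0.4, 0.5]]
y = [1, 1, -1, -1]

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
print(w, predictions)
```

The codeword shared by both classes ends up with a much smaller weight than the two that separate them, even though it is a prominent feature of every image. That is the sense in which the classifier highlights what distinguishes a class rather than what is merely common within it.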
This method could improve results in difficult cases where there is large intra-class variance, as you described. The complex descriptors would make the algorithm more robust and accurate in general.