Monday, January 08, 2007

Learning from a Visual Folksonomy: Automatically Annotating Images from Flickr

Recently, a large visual dataset has emerged from a web-based photo service called Flickr, which uses the organizational power of folksonomy to label a tremendous amount of visual data. Flickr users upload snapshots from their digital cameras to the web, and if the photos are marked as public, the community annotates them with descriptive tags. Can this large collective labeling effort be used to train a computer to annotate images? Which concepts can we train a computer to identify visually?

This project uses a simple crawler to download photos from Flickr that are labeled with a given tag, then extracts color and texture features from these images so they can be used to train a classifier such as a Support Vector Machine (SVM). Because the whole process of downloading images, extracting features, training, and testing is automated, we can apply the system to many different tags and see which tags correspond to identifiable visual features. We have found that the system performs relatively well when annotating each image with a single label, chosen from a small vocabulary, for concepts that have distinctive color and texture features. (The full paper can be found here.)
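The feature-extraction and training steps described above can be sketched in plain Python. This is not the project's code, only an illustration under stated assumptions: the crawler is omitted and images are assumed to be already decoded into rows of (r, g, b) tuples. The specific features (a 64-bin joint RGB histogram for color, mean absolute intensity differences between neighboring pixels as a crude texture cue), the Pegasos-style linear SVM trainer, and the one-vs-rest scheme for picking a single tag are illustrative stand-ins. All function names, and the synthetic "sunset/forest/ocean" images used in place of downloaded photos, are hypothetical.

```python
import random

# Assumed image format (hypothetical): a non-empty list of rows, each row a
# list of (r, g, b) tuples with integer channels in 0..255. Real Flickr
# photos would first have to be decoded by an imaging library.


def color_histogram(image, bins_per_channel=4):
    """Joint RGB histogram with bins_per_channel**3 cells, normalized to sum to 1."""
    hist = [0.0] * bins_per_channel ** 3
    n = 0
    for row in image:
        for r, g, b in row:
            # Map each 0..255 channel value to a bin index in [0, bins_per_channel).
            ri, gi, bi = (c * bins_per_channel // 256 for c in (r, g, b))
            hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1.0
            n += 1
    return [h / n for h in hist] if n else hist


def _mean(values):
    return sum(values) / len(values) if values else 0.0


def texture_features(image):
    """Crude texture cue: mean absolute horizontal and vertical differences
    of grayscale intensity (scaled to 0..1) between neighboring pixels."""
    gray = [[(r + g + b) / 765.0 for r, g, b in row] for row in image]
    h, w = len(gray), len(gray[0])
    dx = [abs(gray[y][x + 1] - gray[y][x]) for y in range(h) for x in range(w - 1)]
    dy = [abs(gray[y + 1][x] - gray[y][x]) for y in range(h - 1) for x in range(w)]
    return [_mean(dx), _mean(dy)]


def extract_features(image):
    # Color histogram + texture cues + a constant 1.0 that acts as a bias term.
    return color_histogram(image) + texture_features(image) + [1.0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Linear SVM (hinge loss + L2 penalty lam) trained by Pegasos-style
    stochastic subgradient descent. Labels in y must be +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1  # hinge loss active at current w?
            w = [(1.0 - eta * lam) * wj for wj in w]  # subgradient step on L2 term
            if violated:  # subgradient step on the hinge term
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def train_one_vs_rest(features, tags, **svm_args):
    """One binary SVM per tag: images with that tag (+1) vs. all others (-1)."""
    models = {}
    for tag in sorted(set(tags)):
        labels = [1 if t == tag else -1 for t in tags]
        models[tag] = train_linear_svm(features, labels, **svm_args)
    return models


def annotate(models, features):
    """Return the single tag whose classifier gives the highest score."""
    return max(models, key=lambda tag: dot(models[tag], features))


def _clamp(c):
    return max(0, min(255, c))


def synthetic_image(base_rgb, rng, size=8, noise=20):
    """Stand-in for a downloaded photo: a noisy patch around one base color."""
    return [[tuple(_clamp(c + rng.randint(-noise, noise)) for c in base_rgb)
             for _ in range(size)] for _ in range(size)]


if __name__ == "__main__":
    rng = random.Random(1)
    bases = {"sunset": (230, 120, 40), "forest": (40, 150, 50), "ocean": (30, 80, 200)}
    train = [(tag, synthetic_image(rgb, rng)) for tag, rgb in bases.items() for _ in range(10)]
    models = train_one_vs_rest([extract_features(im) for _, im in train],
                               [tag for tag, _ in train])
    print(annotate(models, extract_features(synthetic_image((35, 85, 195), rng))))
```

Training one binary classifier per tag and taking the highest-scoring one is a simple way to assign a single label from a small vocabulary. A real system would more likely call an established SVM library than use this hand-written trainer, and would need far richer features than this toy histogram to handle concepts without distinctive colors or textures.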