Tuesday, January 03, 2006

Visualizing Folksonomies using Machine Learning Algorithms

This paper, written for my Advanced Machine Learning class, investigates using Semidefinite Embedding (SDE) to visualize data collected from a folksonomy. The del.icio.us social bookmarking service is a perfect example of a folksonomy: a community of users labels websites with descriptive tags. Each tag can be represented as a point in a high-dimensional space whose coordinates are the frequencies with which each user of the system applied that tag. We are motivated by the following question: can we find a simple low-dimensional structure for these tags that captures the significant relationships inherent in the data? In this paper we explore Semidefinite Embedding, an algorithm for non-linear dimensionality reduction, and its application to visualizing folksonomic systems. We focus on the effects of specifying different levels of connectivity for the data and on the heuristics that can be used to find the best parameters for the algorithm. (Full paper found here)
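For reference, SDE as it is usually stated learns a kernel (Gram) matrix K over the n input points x_1, ..., x_n (here, the tag vectors) by solving a semidefinite program. In the formulation below, η_ij = 1 when points i and j are connected in the k-nearest-neighbor graph, so k plays the role of the "connectivity" setting mentioned above; treating k as exactly the parameter the paper varies is an assumption.

```latex
\begin{aligned}
\max_{K}\quad & \operatorname{tr}(K) \\
\text{subject to}\quad & K \succeq 0, \\
& \textstyle\sum_{i,j} K_{ij} = 0, \\
& K_{ii} - 2K_{ij} + K_{jj} = \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2
  \quad \text{for all } (i,j) \text{ with } \eta_{ij} = 1.
\end{aligned}
```

The low-dimensional coordinates are then read off the top eigenvectors of the learned K, each scaled by the square root of its eigenvalue. Larger k adds more distance constraints, which makes the embedding more rigid but less able to "unfold" non-linear structure.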

Monday, January 02, 2006

CUtunes Update

CUtunes is looking better after another semester of work. Here is the updated documentation, and some screenshots. Notable new features are user profile pages, Flash-based visualizations of your musical neighborhood, and intelligent playlist creation in iTunes, which lets the user say "make me a playlist that is like this list of musical artists and CUtunes users."
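The post does not say how CUtunes builds these playlists. As a hypothetical sketch only, assume every artist and every user can be described as a sparse vector over one shared set of features (for example, genre or tag weights). A playlist seeded with a mix of artists and users could then rank candidate artists by their average cosine similarity to the seeds. The names `cosine` and `rank_artists` and the feature representation are illustrative assumptions, not CUtunes code.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {feature: weight} dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_artists(seed_vectors, artist_vectors, exclude=()):
    """Rank candidate artists by mean cosine similarity to the seed vectors.

    seed_vectors may mix artist vectors and user-profile vectors, provided
    both use the same (hypothetical) feature space.
    """
    if not seed_vectors:
        raise ValueError("need at least one seed")
    scores = {
        artist: sum(cosine(vec, seed) for seed in seed_vectors) / len(seed_vectors)
        for artist, vec in artist_vectors.items()
        if artist not in exclude
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda artist: (-scores[artist], artist))

artists = {
    "A": {"rock": 1.0, "indie": 1.0},
    "B": {"rock": 1.0},
    "C": {"jazz": 1.0},
}
user_profile = {"rock": 1.0, "indie": 2.0}  # a hypothetical CUtunes user's tastes
print(rank_artists([artists["A"], user_profile], artists, exclude={"A"}))  # ['B', 'C']
```

Songs by the top-ranked artists would then be the candidates added to the iTunes playlist; averaging over seeds keeps one strongly weighted seed from dominating the result.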

Sunday, January 01, 2006

Utilizing Folksonomy: Similarity Metadata from the Del.icio.us System

Traditionally, metadata is thought of simply as keywords that describe some content. While the primary aim of folksonomic systems like the Del.icio.us bookmarking tool is to produce these keywords, they also produce a richer set of metadata. Because the keywords are contributed by many different individuals and then aggregated, useful information comes not only from the keyword itself but also from knowing who labeled the content with that keyword. This idea can be broadened into a general framework for producing a new layer of metadata: similarity between concepts. By analyzing how users apply tags, how tags are applied to links, and how users pick content, we should be able to calculate the "distance" between tags, users, and content. This "distance" metric could then be used to build a more powerful tool for browsing content, allowing the user to specify a query made up of keywords, content, or even other users. Furthermore, this metadata can be condensed into a lower-dimensional space and visualized to give better insight into the relationships between the concepts themselves. (Full paper found here)
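One simple way to compute such a "distance", assumed here for illustration rather than taken from the paper, is to treat each tag as a vector of per-user usage counts and compare tags by cosine distance. The hypothetical helper `profile_vectors` builds these vectors from (user, url, tag) triples; changing its index arguments produces user or link vectors instead, so the same distance applies to all three kinds of concept.

```python
import math
from collections import defaultdict

def profile_vectors(bookmarks, item_index, feature_index):
    """Build sparse count vectors from (user, url, tag) triples.

    item_index selects what is being described (0=user, 1=url, 2=tag);
    feature_index selects the dimensions it is described over.
    """
    vectors = defaultdict(lambda: defaultdict(int))
    for triple in bookmarks:
        vectors[triple[item_index]][triple[feature_index]] += 1
    # Convert to plain dicts so later lookups of missing keys don't insert entries.
    return {item: dict(counts) for item, counts in vectors.items()}

def cosine_distance(a, b):
    """1 - cosine similarity; for count vectors, 0 = same direction, 1 = no overlap."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

bookmarks = [
    ("alice", "u1", "python"),
    ("alice", "u2", "programming"),
    ("bob", "u3", "python"),
    ("bob", "u3", "programming"),
    ("carol", "u4", "recipes"),
]
tags = profile_vectors(bookmarks, item_index=2, feature_index=0)  # tags over users
print(cosine_distance(tags["python"], tags["programming"]))  # close to 0
print(cosine_distance(tags["python"], tags["recipes"]))      # 1.0
```

A matrix of these pairwise distances is the kind of input a dimensionality-reduction step would then condense for visualization.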

Building A Better Folksonomy

We live in an age flooded with information. New technologies are making many large, unstructured sets of information available. As this information becomes more accessible, it becomes more difficult to navigate without a guide. Now that a typical user can carry around 10,000 songs in his pocket, choosing which song to listen to becomes increasingly difficult. Now that a typical user can access 13 billion websites, how does a person know which sites are relevant to him?

The solution to this problem lies in building new web-based technologies that aid in the formation of folksonomies. Folksonomy is commonly defined as a large group of people spontaneously cooperating to organize information into categories [24]. Many websites today, such as Wikipedia, Flickr, Technorati, Del.icio.us, Yahoo!, and others, take advantage of the organizational power of folksonomies. All of these sites employ a simple tagging mechanism, in which users attribute words or phrases to content. When these tags are aggregated, new metadata for that content is created.
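To make the aggregation step concrete, here is a minimal sketch that collapses (user, url, tag) triples into per-URL tag counts. It is an assumption, not how any of the sites named above actually work: each user is counted at most once per tag, and tags are lowercased as a hypothetical normalization step.

```python
from collections import Counter, defaultdict

def aggregate_tags(taggings):
    """Collapse (user, url, tag) triples into {url: Counter(tag -> number of users)}."""
    per_url = defaultdict(Counter)
    seen = set()
    for user, url, tag in taggings:
        tag = tag.strip().lower()  # hypothetical normalization
        if (user, url, tag) in seen:
            continue  # a repeated tag from the same user adds no new signal
        seen.add((user, url, tag))
        per_url[url][tag] += 1
    return per_url

taggings = [
    ("alice", "example.com", "Python"),
    ("alice", "example.com", "python"),
    ("bob", "example.com", "python"),
    ("bob", "example.com", "tutorial"),
]
summary = aggregate_tags(taggings)
print(summary["example.com"].most_common(1))  # [('python', 2)]
```

The resulting counts are the aggregated metadata described above: the more users agree on a tag, the higher it ranks for that piece of content.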

Tagging offers powerful possibilities for information retrieval by using collective social intelligence to organize information instead of relying on one person’s description or categorization. However, tagging only begins to approximate an ideal folksonomy. By simplifying the ways in which we collect metadata from the user, coupling this information collection more strongly with a social framework, and providing more powerful tools for categorization, we should be able to greatly improve systems for retrieving relevant information. (Full paper found here)