com.cutunes.server
Class Analyzer

java.lang.Object
  extended by com.cutunes.server.Analyzer

public class Analyzer
extends java.lang.Object

Handles most of the analysis and data archiving. Performs the following tasks:

  • Downloads current data from the database
  • Archives this data in a sparse matrix format
  • Formats the data and writes it to a file to be read by MATLAB
  • Calculates a similarity matrix of users
  • Calculates a distance matrix of artists
  • Calculates recommendations for each user
  • Loads all of this analysis back into the sql database
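
The task list above can be sketched as a call sequence. This is a minimal sketch, assuming the tasks run in the order listed; the actual order used by main is not documented here. The AnalysisSteps interface and the PipelineSketch class are hypothetical stand-ins that mirror the documented method signatures so the example compiles without a database.

```java
import java.util.Vector;

// Hypothetical interface mirroring the documented Analyzer signatures.
interface AnalysisSteps {
    String collectData();
    void formatDataForMatlab(String directory, String outputDirectory);
    Vector<Object> calculateAndLoadUserCompatibility(String directory);
    void calculateAndLoadArtistDist(String directory);
    void calculateAndLoadRecs(Vector<Object> userSimInfo);
}

final class PipelineSketch {
    // Runs the steps in the order of the task list (an assumption).
    static void run(AnalysisSteps analyzer, String matlabDirectory) {
        // Download the current data and archive it; returns the archive directory.
        String directory = analyzer.collectData();
        // Write the archived data in a form that can be read externally.
        analyzer.formatDataForMatlab(directory, matlabDirectory);
        // User similarity is computed first because the recs step consumes it.
        Vector<Object> userSimInfo = analyzer.calculateAndLoadUserCompatibility(directory);
        analyzer.calculateAndLoadArtistDist(directory);
        analyzer.calculateAndLoadRecs(userSimInfo);
    }
}
```

Passing the steps in through an interface is only a convenience for the sketch; with the real class the same calls would be made on an Analyzer instance.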

    Author:
    blake

    Field Summary
     java.sql.Connection dbConnect
               
    static int SIMILARITY_CUT_OFF
              A value between 1 and 100; can be used to limit complexity or improve results
     
    Constructor Summary
    Analyzer()
              Constructor. Connects to the database
     
    Method Summary
     void calculateAndLoadArtistDist(java.lang.String directory)
              Calculate Artist Distance data from the data in a directory and load that data into the sql database
     void calculateAndLoadRecs(java.util.Vector userSimInfo)
              Calculate and load recs given user similarity info
     java.util.Vector calculateAndLoadUserCompatibility(java.lang.String directory)
              Calculate User Compatibility data from the data in a directory and load that data into the sql database
     void clearRecsForUser(int userID1, java.lang.String type, java.lang.String playCountType)
              Clear recs for current user
     java.lang.String collectData()
              Collect data from the database, and archive it to a file
     void connectToDatbase()
              Connects to CUtunes Database
     void formatDataForMatlab(java.lang.String directory, java.lang.String outputDirectory)
              Format data to be read into MATLAB
     java.util.Vector getAllItems(java.lang.String type, java.lang.String play_count_type)
              Get a list of items (either songs, albums, or artists). The number of items returned is limited by the global parameter NUM_ITEMS
     void getAndLoadReccomendaton(int userID1, java.util.Vector userIDs, int userIndex1, java.lang.String type, java.lang.String playCountType, int[][] similarity)
              Gets recommendations for the current user and loads them into the db
     java.util.Vector getArtistDistanceInfo(java.lang.String directory, java.lang.String play_count_type, int limit)
              Calculate artist compatibility info
     java.util.Vector getItemInfo(java.lang.String directory, java.lang.String type, java.lang.String play_count_type)
              Get item info from archived data, items can be songs, albums, or artists
     void getNNewItems(int userID2, int n, java.util.Hashtable myItems, java.util.Hashtable newItems, int simRank, java.lang.String type, java.lang.String playCountType)
              Gets N new songs/albums/artists from another user. This method should probably be replaced by a more efficient sql query. Scores for recs are weighted by similarity with that user.
     int getTotalPlays(int userID, java.lang.String playCountType)
              Get total play counts for a user
     java.util.Vector getUserCompatibilityInfo(java.lang.String directory, java.lang.String type, java.lang.String play_count_type)
              Calculate user compatibility info
     java.util.Vector getUserInfo(java.lang.String directory)
              Get user info from an archived set of data
     java.util.Vector getUserItems(java.lang.String type, java.lang.String play_count_type, int user_id)
              Get a list of items for a given user, items can be either songs, albums, or artists
     java.util.Vector getUsers()
              Get a list of user_ids and user info to print to a text file
     double KLdistanceMetric(double p, double q)
              Distance metric for comparing two users of an item
     java.util.Hashtable loadHashtable(int userID1, java.lang.String type, java.lang.String playCountType)
              This method loads a hashtable with all of the user's songs. It is used to identify songs in common and quickly retrieve play counts.
     void loadRecIntoDB(int userID1, java.lang.String s, int r, java.lang.String type, java.lang.String playCountType)
              Load a recommendation into the db for the current user
    static void main(java.lang.String[] args)
              The main method
     void printData(jmp.SparseColumnMatrix theMat)
              Debug method for printing the first 20 items in the matrix parameter theMat
     void printList(java.util.Vector v, java.lang.String filename)
              Print a vector of strings to a file
     jmp.SparseColumnMatrix readMatrixDataFile(java.lang.String filename)
              Read a matrix from a file
     double userDistanceMetric(double p, double q)
              Distance metric for comparing two items of a user (one dimension)
     void writeDistanceSQL(java.lang.String filename, int[][] dists, java.util.Vector names)
              Write distance information to an sql file
     void writeMatrixDataFile(jmp.SparseColumnMatrix mat, java.lang.String filename)
              Write a matrix to a file
     
    Methods inherited from class java.lang.Object
    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
     

    Field Detail

    SIMILARITY_CUT_OFF

    public static final int SIMILARITY_CUT_OFF
    A value between 1 and 100; can be used to limit complexity or improve results

    See Also:
    Constant Field Values

    dbConnect

    public java.sql.Connection dbConnect
    Constructor Detail

    Analyzer

    public Analyzer()
    Constructor. Connects to the database

    Method Detail

    main

    public static void main(java.lang.String[] args)
    The main method

    Parameters:
    args -

    calculateAndLoadArtistDist

    public void calculateAndLoadArtistDist(java.lang.String directory)
    Calculate Artist Distance data from the data in a directory and load that data into the sql database

    Parameters:
    directory -

    formatDataForMatlab

    public void formatDataForMatlab(java.lang.String directory,
                                    java.lang.String outputDirectory)
    Format data to be read into MATLAB

    Parameters:
    directory - directory of the data
    outputDirectory - output directory

    calculateAndLoadUserCompatibility

    public java.util.Vector calculateAndLoadUserCompatibility(java.lang.String directory)
    Calculate User Compatibility data from the data in a directory and load that data into the sql database

    Parameters:
    directory -
    Returns:
    A vector containing the following items:
  • list of user ids
  • distance matrix all time
  • distance matrix week

    getUserCompatibilityInfo

    public java.util.Vector getUserCompatibilityInfo(java.lang.String directory,
                                                     java.lang.String type,
                                                     java.lang.String play_count_type)
    Calculate user compatibility info

    Parameters:
    directory - directory of the archive
    type - "song", "album", or "artist"
    play_count_type - "play_count", "play_count_week"
    Returns:
    A vector containing the following items:
  • distMat -- a 2d double array with compatibility info for each pair of users
  • userIDs -- a vector of userIDs
  • fullNames -- a vector of names suitable for display

    getArtistDistanceInfo

    public java.util.Vector getArtistDistanceInfo(java.lang.String directory,
                                                  java.lang.String play_count_type,
                                                  int limit)
    Calculate artist compatibility info

    Parameters:
    directory - directory of the archive
    play_count_type - "play_count", "play_count_week"
    Returns:
    a vector containing the following items:
  • distMat -- a 2d double array with compatibility info for each pair of users
  • compatItems -- a string array with the 5 dimensions of compatibility for each user
  • userNames -- a vector of usernames

    userDistanceMetric

    public double userDistanceMetric(double p,
                                     double q)
    Distance metric for comparing two items of a user (one dimension)

    Parameters:
    p -
    q -
    Returns:
    The distance

    KLdistanceMetric

    public double KLdistanceMetric(double p,
                                   double q)
    Distance metric for comparing two users of an item

    Parameters:
    p -
    q -
    Returns:
    The distance
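
    The method name suggests the Kullback-Leibler divergence, but the formula is not given here. The following is a minimal sketch, assuming the method computes a single term p * ln(p / q) for one item; the class KlTermSketch, its method names, and the handling of zero values are assumptions, not the documented implementation.

```java
// Hypothetical sketch of a per-item Kullback-Leibler term.
final class KlTermSketch {
    // One term of the divergence: p * ln(p / q), for non-negative p and q.
    static double klTerm(double p, double q) {
        if (p < 0.0 || q < 0.0) {
            throw new IllegalArgumentException("p and q must be non-negative");
        }
        if (p == 0.0) {
            return 0.0; // 0 * ln(0 / q) is taken as 0 by convention
        }
        if (q == 0.0) {
            return Double.POSITIVE_INFINITY; // p > 0 where q has no mass
        }
        return p * Math.log(p / q);
    }

    // Sum of the per-item terms over two distributions of equal length.
    static double klDivergence(double[] p, double[] q) {
        if (p.length != q.length) {
            throw new IllegalArgumentException("p and q must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            sum += klTerm(p[i], q[i]);
        }
        return sum;
    }
}
```

    Note that this quantity is not symmetric in p and q, so if a symmetric distance is needed the two directions would have to be combined, for example by summing them.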

    writeDistanceSQL

    public void writeDistanceSQL(java.lang.String filename,
                                 int[][] dists,
                                 java.util.Vector names)
    Write distance information to an sql file

    Parameters:
    dists - 2d array containing distance information
    names - Vector of names
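
    As an illustration of what the generated sql might look like, here is a minimal sketch, assuming one INSERT statement per ordered pair of names. The table name distance, the column names name1, name2, and dist, and the class DistanceSqlSketch are hypothetical; the real schema and file layout are not documented here. The sketch only builds the statements and leaves writing them to the file named by filename to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: turn a square distance matrix into SQL INSERT statements.
final class DistanceSqlSketch {
    static List<String> toInsertStatements(int[][] dists, List<String> names) {
        if (dists.length != names.size()) {
            throw new IllegalArgumentException("one row per name is required");
        }
        List<String> statements = new ArrayList<>();
        for (int i = 0; i < dists.length; i++) {
            if (dists[i].length != names.size()) {
                throw new IllegalArgumentException("matrix must be square");
            }
            for (int j = 0; j < dists[i].length; j++) {
                if (i == j) {
                    continue; // skip the distance of a name to itself
                }
                statements.add("INSERT INTO distance (name1, name2, dist) VALUES ('"
                        + escape(names.get(i)) + "', '"
                        + escape(names.get(j)) + "', "
                        + dists[i][j] + ");");
            }
        }
        return statements;
    }

    // Doubles single quotes so names are safe inside SQL string literals.
    static String escape(String s) {
        return s.replace("'", "''");
    }
}
```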

    getItemInfo

    public java.util.Vector getItemInfo(java.lang.String directory,
                                        java.lang.String type,
                                        java.lang.String play_count_type)
    Get item info from archived data, items can be songs, albums, or artists

    Parameters:
    directory - directory of the archive
    type - "song", "album", or "artist"
    play_count_type - "play_count", "play_count_week"
    Returns:
    A vector containing the following items:
  • A vector of item names
  • A vector of total plays for each item

    getUserInfo

    public java.util.Vector getUserInfo(java.lang.String directory)
    Get user info from an archived set of data

    Parameters:
    directory - the directory of the archive
    Returns:
    A vector containing the following items:
  • userNames -- a vector of usernames
  • totPlays -- total plays for each user
  • topPlaysWeek -- total plays (week) for each user
  • userIDs -- user ids for each user
  • fullnames -- display names for each user

    readMatrixDataFile

    public jmp.SparseColumnMatrix readMatrixDataFile(java.lang.String filename)
    Read a matrix from a file

    Parameters:
    filename -
    Returns:

    collectData

    public java.lang.String collectData()
    Collect data from the database, and archive it to a file

    Returns:
    the path to the directory where the data was collected

    writeMatrixDataFile

    public void writeMatrixDataFile(jmp.SparseColumnMatrix mat,
                                    java.lang.String filename)
    Write a matrix to a file

    Parameters:
    mat -
    filename -

    printList

    public void printList(java.util.Vector v,
                          java.lang.String filename)
    Print a vector of strings to a file

    Parameters:
    v -
    filename -

    calculateAndLoadRecs

    public void calculateAndLoadRecs(java.util.Vector userSimInfo)
    Calculate and load recs given user similarity info

    Parameters:
    userSimInfo - A vector containing the following items:
  • A vector of userIDs
  • a 2d int array containing all time compatibility info
  • a 2d int array containing this week compatibility info

    getAndLoadReccomendaton

    public void getAndLoadReccomendaton(int userID1,
                                        java.util.Vector userIDs,
                                        int userIndex1,
                                        java.lang.String type,
                                        java.lang.String playCountType,
                                        int[][] similarity)
    Gets recommendations for the current user and loads them into the db

    Parameters:
    userID1 - the userID
    userIDs - all user IDs
    userIndex1 - the index of userID1 in the similarity table
    type - song, album, or artist
    playCountType - all time, or week
    similarity - 2d array containing similarity info

    clearRecsForUser

    public void clearRecsForUser(int userID1,
                                 java.lang.String type,
                                 java.lang.String playCountType)
    Clear recs for current user

    Parameters:
    type - name (song name), album, artist
    playCountType - play_count or play_count_week

    loadRecIntoDB

    public void loadRecIntoDB(int userID1,
                              java.lang.String s,
                              int r,
                              java.lang.String type,
                              java.lang.String playCountType)
    Load a recommendation into the db for the current user

    Parameters:
    s - the recommendation to load
    r - the value for how strong the recommendation is
    type - name (song name), album, artist
    playCountType - play_count or play_count_week

    getNNewItems

    public void getNNewItems(int userID2,
                             int n,
                             java.util.Hashtable myItems,
                             java.util.Hashtable newItems,
                             int simRank,
                             java.lang.String type,
                             java.lang.String playCountType)
    Gets N new songs/albums/artists from another user. This method should probably be replaced by a more efficient sql query. Scores for recs are weighted by similarity with that user.

    Parameters:
    userID2 - The other user to get recs from
    n - number of items to collect
    myItems - hashtable of the current user's music
    newItems - hashtable of new items
    simRank - similarity with other user
    type - name (song name), album, artist
    playCountType - play_count or play_count_week
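
    The weighting described above can be sketched as follows. This is a minimal sketch, assuming a score is the other user's play count multiplied by simRank and that candidates are taken in descending play-count order; both are assumptions, as is the otherItems map, which stands in for the database lookup keyed by userID2. The class NewItemsSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of collecting up to n similarity-weighted new items.
final class NewItemsSketch {
    static void addNewItems(Map<String, Integer> otherItems, int n,
                            Hashtable<String, Integer> myItems,
                            Hashtable<String, Integer> newItems, int simRank) {
        // Order the other user's items by play count (highest first), then by name.
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(otherItems.entrySet());
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        int taken = 0;
        for (Map.Entry<String, Integer> entry : sorted) {
            if (taken == n) {
                break; // collected the requested number of items
            }
            if (myItems.containsKey(entry.getKey())) {
                continue; // the current user already has this item
            }
            // Weight the score by similarity; add to any score from earlier users.
            newItems.merge(entry.getKey(), entry.getValue() * simRank, Integer::sum);
            taken++;
        }
    }
}
```

    Accumulating into newItems with merge lets the same hashtable be reused across several similar users, so an item suggested by more than one of them ends up with a higher total score.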

    loadHashtable

    public java.util.Hashtable loadHashtable(int userID1,
                                             java.lang.String type,
                                             java.lang.String playCountType)
    This method loads a hashtable with all of the user's songs. It is used to identify songs in common and quickly retrieve play counts.

    Parameters:
    userID1 -
    type - name (song name), album, artist
    playCountType - play_count or play_count_week
    Returns:
    userHashTable
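
    To illustrate why a hashtable suits this purpose, here is a minimal sketch, assuming the table maps an item name to its play count. The class UserHashtableSketch and its methods are hypothetical, and the parallel lists stand in for the database query.

```java
import java.util.Hashtable;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: item name -> play count, with a lookup of shared items.
final class UserHashtableSketch {
    // Builds the table from parallel lists of item names and play counts.
    static Hashtable<String, Integer> build(List<String> items, List<Integer> playCounts) {
        if (items.size() != playCounts.size()) {
            throw new IllegalArgumentException("items and playCounts must align");
        }
        Hashtable<String, Integer> table = new Hashtable<>();
        for (int i = 0; i < items.size(); i++) {
            table.put(items.get(i), playCounts.get(i));
        }
        return table;
    }

    // Returns the item names present in both tables, in sorted order.
    static Set<String> inCommon(Hashtable<String, Integer> a, Hashtable<String, Integer> b) {
        Set<String> common = new TreeSet<>(a.keySet()); // copy, so a is not modified
        common.retainAll(b.keySet());
        return common;
    }
}
```

    Each membership test and play-count lookup is a constant-time hash lookup on average, which avoids rescanning a user's full item list for every comparison.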

    getUserItems

    public java.util.Vector getUserItems(java.lang.String type,
                                         java.lang.String play_count_type,
                                         int user_id)
    Get a list of items for a given user, items can be either songs, albums, or artists

    Parameters:
    type - type of item to return, can be "song", "album", or "artist"
    play_count_type - either "play_count" or "play_count_week"
    user_id -
    Returns:
    A vector containing the following items:
  • items: a vector of items as strings
  • playCounts: a vector of playCounts as integers

    getAllItems

    public java.util.Vector getAllItems(java.lang.String type,
                                        java.lang.String play_count_type)
    Get a list of items (either songs, albums, or artists). The number of items returned is limited by the global parameter NUM_ITEMS

    Parameters:
    type - type of item to return, can be "song", "album", or "artist"
    play_count_type - either "play_count" or "play_count_week"
    Returns:
    A vector containing the following items:
  • forFile -- vector of strings to be printed to the data file, consisting of item names and total play count value
  • items -- a hashtable of items

    getUsers

    public java.util.Vector getUsers()
    Get a list of user_ids and user info to print to a text file

    Returns:
    A vector containing the following items:
  • ids -- a list of user ids
  • strings to print to file -- strings for printing to the data file, consisting of names and total playcounts for week and all time

    printData

    public void printData(jmp.SparseColumnMatrix theMat)
    Debug method for printing the first 20 items in the matrix parameter theMat

    Parameters:
    theMat -

    getTotalPlays

    public int getTotalPlays(int userID,
                             java.lang.String playCountType)
    Get total play counts for a user

    Parameters:
    userID -
    playCountType -
    Returns:
    totalPlays

    connectToDatbase

    public void connectToDatbase()
    Connects to CUtunes Database