a collection of projects and ideas
I am currently a data scientist at Foursquare applying machine learning algorithms to large spatiotemporal datasets. I recently completed my PhD with advisor Tony Jebara at Columbia University.
Research Interests: visualization, dimensionality reduction, spatiotemporal data, networks, spectral optimizations, semidefinite programming, data mining, graph algorithms, large datasets
Machine Learning
Publications
Learning to Rank for Spatiotemporal Search
Blake Shaw, Jon Shea, Siddhartha Sinha, Andrew Hogue
Web Search and Data Mining Proceedings WSDM 2013.
Download: Paper (PDF) | BibTex
In this article we consider the problem of mapping a noisy estimate of a user's current location to a semantically meaningful point of interest, such as a home, restaurant, or store. We propose a novel spatial search algorithm that infers a user's location by combining aggregate signals mined from billions of Foursquare check-ins with real-time contextual information.
Learning a Distance Metric from a Network
Blake Shaw, Bert Huang, Tony Jebara
Neural Information Processing Systems, NIPS, December 2011.
Download: Paper (PDF) | Supplemental (PDF) | Poster (PDF) | BibTex | Code
Many real-world networks are described by both connectivity information and features for every node. To better model and understand these networks, we present structure preserving metric learning (SPML), an algorithm for learning a Mahalanobis distance metric from a network such that the learned distances are tied to the inherent connectivity structure of the network.
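As a small illustration of the kind of distance SPML learns, the sketch below evaluates a squared Mahalanobis distance (x - y)^T M (x - y) for a fixed matrix M. It is not the SPML code linked above: the function name `mahalanobis_sq` and the two example matrices are hypothetical, and the matrix is chosen by hand rather than learned from a network.

```python
def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y).

    M is assumed to be a symmetric positive semidefinite matrix given as a
    list of rows. Here it is fixed by hand; a metric learning method would
    instead fit M to data.
    """
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    # Matrix-vector product M d.
    Md = [sum(M[i][j] * d[j] for j in range(n)) for i in range(n)]
    # Inner product d . (M d).
    return sum(d[i] * Md[i] for i in range(n))


# Hypothetical example matrices: the identity gives squared Euclidean
# distance, while `stretched` weights the first feature four times as much.
identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[4.0, 0.0], [0.0, 1.0]]

x, y = [0.0, 0.0], [1.0, 2.0]
print(mahalanobis_sq(x, y, identity))   # 5.0
print(mahalanobis_sq(x, y, stretched))  # 8.0
```

Changing M changes which nodes count as close, which is the lever a learned metric uses to make distances agree with the observed connectivity.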
Structure Preserving Embedding
Blake Shaw, Tony Jebara
International Conference on Machine Learning, ICML, June 2009.
Best Paper Award Winner
Download: Paper (PDF) | Poster (PDF) | Slides (PDF) | BibTex
View Talk: videolectures.net | MP4
Structure Preserving Embedding (SPE) is an algorithm for embedding graphs in Euclidean space such that the embedding is low-dimensional and preserves the global topological properties of the input graph. Topology is preserved if a connectivity algorithm, such as k-nearest neighbors, can easily recover the edges of the input graph from only the coordinates of the nodes after embedding.
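The notion of topology preservation described above can be checked on a toy example. The sketch below is not the SPE algorithm itself: it only tests whether k-nearest neighbors, run on the embedded coordinates alone, recovers the edges of the input graph. The helper names `knn_edges` and `edge_recovery`, the use of a single fixed k for every node, and the Jaccard score are all simplifying assumptions made for illustration.

```python
import math


def knn_edges(coords, k):
    """Undirected edges linking each node to its k nearest neighbors."""
    edges = set()
    for i, p in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(p, coords[j]),
        )
        for j in others[:k]:
            edges.add(frozenset((i, j)))
    return edges


def edge_recovery(coords, true_edges, k):
    """Jaccard overlap between the true edges and the k-NN edges."""
    truth = {frozenset(e) for e in true_edges}
    recovered = knn_edges(coords, k)
    return len(truth & recovered) / len(truth | recovered)


# A 4-cycle: 0-1-2-3-0. Every node has degree 2, so k = 2 is a fair choice.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Embedding that places the nodes around a unit square in cycle order.
good = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
# Same points, but nodes 1 and 2 have swapped positions.
bad = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]

print(edge_recovery(good, cycle, k=2))  # 1.0: every edge is recovered
print(edge_recovery(bad, cycle, k=2))   # below 1.0: topology is not preserved
```

Both embeddings use the same four points in the plane; only the assignment of nodes to points differs, which is why coordinates alone can reveal whether the graph structure survived.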
Minimum Volume Embedding
Blake Shaw, Tony Jebara
Artificial Intelligence and Statistics, AISTATS, March 2007.
Download: Paper (PDF) | Poster | BibTex | Code
Minimum Volume Embedding (MVE) is an algorithm for non-linear dimensionality reduction that uses semidefinite programming (SDP) and matrix factorization to find a low-dimensional embedding that preserves local distances between points while representing the dataset in many fewer dimensions.
Workshop Papers
Recommending Interesting Events in Real-time with Foursquare Check-ins
Max Sklar, Blake Shaw, Andrew Hogue
ACM Conference on Recommender Systems RecSys 2012.
Download: Paper (PDF)
Learning a Degree-Augmented Distance Metric from a Network
Bert Huang, Blake Shaw, Tony Jebara
Beyond Mahalanobis: Supervised Large-Scale Learning of Similarity. NIPS 2011 workshop.
Download: Paper (PDF)
Visualizing Social Networks with Structure Preserving Embedding
Blake Shaw, Tony Jebara
Interdisciplinary Workshop on Information and Decision in Social Networks 2011
Download: Paper (PDF) | Poster (PDF)
Network Prediction with Degree Distributional Metric Learning
Bert Huang, Blake Shaw, Tony Jebara
Interdisciplinary Workshop on Information and Decision in Social Networks 2011
Download: Paper (PDF) | Poster (PDF)
Dimensionality Reduction, Clustering, and PlaceRank Applied to Spatiotemporal Flow Data
Blake Shaw, Tony Jebara
New York Academy of Sciences - Machine Learning Symposium 2009.
Download: Paper (PDF) | Poster (PDF)
Visualizing Graphs with Structure Preserving Embedding
Blake Shaw, Tony Jebara
Analyzing Graphs: Theory and Applications. NIPS Workshop. December 2008.
Download: Paper (PDF)
Graph Embedding with Global Structure Preserving Constraints
Blake Shaw, Tony Jebara
New York Academy of Sciences - Machine Learning Symposium, October 2008.
Download: Paper (PDF) | Poster (PDF)
Minimum Volume Embedding (NYAS)
New York Academy of Sciences - Machine Learning Symposium 2007.
Download: Paper (PDF)
Optimizing Eigengaps and Spectral Functions using Iterated SDP
Tony Jebara, Blake Shaw, Andrew Howard
Learning Workshop 2007.
Download: Paper (PDF)
B-matching for Embedding
Tony Jebara, Blake Shaw, Vlad Schogolev
Snowbird Machine Learning Conference, April 2006.
Download: Paper (PDF)
Blog Posts and Talks at Foursquare
Big Data and the Big Apple: Understanding New York City using Millions of Check-ins
DataGotham -- September 2012
Machine Learning with Large Networks of People and Places
Blog post | Slides | Video of Talk
Foursquare is now aware of over 1.5 billion check-ins from 15 million people at 30 million different places all over the world. Each check-in can be thought of as an edge in a vast network connecting people to each other and to the places that they care about most. Graph-based machine learning algorithms are critical not only for making sense of the networks that emerge from patterns of human mobility but also for creating useful data-driven products that help people better navigate the real world. In this talk, we will examine two networks that we have observed at Foursquare, the Social Graph and the Place Graph, and then discuss various machine learning and big data techniques for better understanding these networks and for using them to build a novel recommendation engine we call Explore.
A Hackday Project: What neighborhood is the ‘East Village’ of San Francisco?
Have you ever wondered what the equivalent of your neighborhood is in another city? How would you find the Times Square of Tokyo? The Beverly Hills of Dallas? Or the East Village of San Francisco? For a hackday project this January, we mapped our 1,500,000,000 check-ins to 140,000 neighborhoods all over the world to better understand and compare the different places we live, work, and play. Here is a brief account of our hack.
Projects at Sense Networks
CabSense - The Smartest Way to Hail a Cab in NY
MacroSense - Relevant Recommendation, Personalization and Discovery from Mobile Location Data
CitySense - Live San Francisco Nightlife Activity
Teaching
Programming Languages (Matlab)
w3101 section 1 - Spring 2008
Patents
12/134,634 (pending) - System and Method of Performing Location Analytics
12/241,266 (pending) - Event Identification in Sensor Analytics
12/241,227 (pending) - Comparing Spatial-Temporal Trails in Location Analytics