Comments about ‘U. researchers meeting with NSA on data mining work’


Published: Monday, July 26 2010 10:00 p.m. MDT

Comments
cgriesemer

For the past five semesters I've been working with Dr. Richard Wellman of the Math department at Westminster College on an undergraduate research project focused on machine learning and the Netflix dataset. The article is short on details, but custom computer programs shouldn't generally struggle with 5,000 data points; our fairly standard program can calculate and sort 100 million distances in 200-dimensional space in about half an hour on a typical workstation. I'm very interested to see Venkatasubramanian's work, but the concept of reducing the dimensionality of data isn't new. Indeed, the favored approach for the Netflix data is matrix factorization by gradient descent on the error function. This compresses a matrix with ~8.5 billion entries into ~50 million entries, with the added bonus that predictions come from combining the factorized user and movie features. None of this takes more than a few hours to compute. There are quite a few other ways to compress data (SVD, PCA, FFT), and it will be interesting to see another.
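
To make the matrix factorization idea concrete, here is a minimal sketch in plain Python. It is not the commenter's program: the function names, the toy ratings, the learning rate, and the regularization term are all assumptions chosen for illustration. It fits user and movie feature vectors by stochastic gradient descent on the squared prediction error, then predicts a rating as the dot product of the two vectors.

```python
import math
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and movie i's feature vectors."""
    return sum(a * b for a, b in zip(P[u], Q[i]))


def rmse(ratings, P, Q):
    """Root-mean-square error over the observed (user, movie, rating) triples."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))


def init_factors(n_users, n_items, k, seed=0):
    """Small random starting features (hypothetical initialization scheme)."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    return P, Q


def factorize(ratings, P, Q, lr=0.01, reg=0.02, epochs=2000):
    """Descend the gradient of the regularized squared error, one rating at a time."""
    k = len(P[0])
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Toy data (made up): (user index, movie index, rating on a 1-5 scale).
RATINGS = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 1, 1), (3, 2, 5),
]

if __name__ == "__main__":
    P, Q = init_factors(n_users=4, n_items=4, k=2)
    print("RMSE before training:", rmse(RATINGS, P, Q))
    factorize(RATINGS, P, Q)
    print("RMSE after training: ", rmse(RATINGS, P, Q))
    print("Prediction for an unrated pair (user 1, movie 1):", predict(P, Q, 1, 1))
```

The compression comes from storing only the two factor matrices, which hold (users + movies) × k numbers instead of users × movies. The Netflix Prize data had 480,189 users and 17,770 movies, so the full matrix has about 8.5 billion cells; with k = 100 features the factors would hold about 50 million numbers, which is consistent with the figures above, although the comment does not state the k that was actually used.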
