Mining Massive Data Sets

Learn Mining Massive Data Sets with this free online course from Stanford University.

Course description – Mining Massive Data Sets

We introduce the participant to modern distributed file systems and MapReduce, including what distinguishes good MapReduce algorithms from good algorithms in general. The rest of the course is devoted to algorithms for extracting models and information from large datasets. Participants will learn how Google’s PageRank algorithm models the importance of Web pages, along with some of the many extensions that have been used for a variety of purposes.
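To give a flavor of the PageRank idea mentioned above, here is a minimal single-machine sketch in Python. It is not taken from the course materials: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the toy link graph are all assumptions made for illustration. It repeatedly passes each page's score along its outgoing links, so a page linked to by important pages becomes important itself.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance by power iteration.

    `links` maps each page to the list of pages it links to.
    `damping` and `iterations` are illustrative defaults, not course values.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Split this page's damped score evenly among its out-links.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Dead end: spread the score evenly over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical three-page Web: A links to B and C, B links to C, C links to A.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
toy_rank = pagerank(toy_links)
for page, score in sorted(toy_rank.items()):
    print(page, round(score, 3))
```

In this toy graph, page C ends up with the highest score because both other pages link to it. At Web scale the same update would be expressed as a distributed matrix-vector multiplication rather than a loop over an in-memory dictionary.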

We’ll cover locality-sensitive hashing, a bit of magic that allows you to find similar items in a set of items so large you cannot possibly compare each pair. When data is stored as a very large, sparse matrix, dimensionality reduction is often a good way to model the data, but standard approaches do not scale well; we’ll talk about efficient approaches. Many other large-scale algorithms are covered as well, as outlined in the course syllabus.
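As a rough illustration of how locality-sensitive hashing avoids comparing every pair, the sketch below uses one common variant: MinHash signatures combined with banding. It is an assumption-laden example rather than the course's own code. The helper names (`minhash_signature`, `estimate_jaccard`, `candidate_pairs`), the use of SHA-1 as a stand-in hash family, and the choice of 64 hashes split into 16 bands of 4 rows are all hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def _hash(seed, item):
    # Stand-in for a family of hash functions: one SHA-1 based hash per seed.
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def minhash_signature(items, num_hashes=64):
    """Summarize a non-empty set as the minimum hash value under each function."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching positions approximates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def candidate_pairs(signatures, bands=16, rows=4):
    """Return pairs whose signatures agree on every row of at least one band."""
    buckets = defaultdict(set)
    for name, sig in signatures.items():
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(name)
    pairs = set()
    for names in buckets.values():
        for x, y in combinations(sorted(names), 2):
            pairs.add((x, y))
    return pairs


# Hypothetical toy data: sets of words standing in for documents.
docs = {
    "d1": {"mining", "massive", "data", "sets", "course"},
    "d2": {"mining", "massive", "data", "sets", "class"},
    "d3": {"cooking", "pasta", "with", "tomato", "sauce"},
}
sigs = {name: minhash_signature(words) for name, words in docs.items()}
print(estimate_jaccard(sigs["d1"], sigs["d2"]))
print(candidate_pairs(sigs))
```

Only items that land in the same bucket for some band are compared further, so most dissimilar pairs are never examined. Using more rows per band makes the filter stricter, while using more bands makes it more permissive.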

What you’ll learn

  • MapReduce systems and algorithms
  • Locality-sensitive hashing
  • Algorithms for data streams
  • PageRank and Web-link analysis
  • Frequent itemset analysis
  • Clustering
  • Computational advertising
  • Recommendation systems
  • Social-network graphs
  • Dimensionality reduction
  • Machine learning algorithms
