Monday, July 14, 2014
ICML 2014 Highlights 2: On Deep Learning and Language Modeling
Previously: Highlights #1 - On ML Fundamentals
Friday, July 11, 2014
ICML 2014 Highlights 1: On Machine Learning Fundamentals
Abstract
At a high level, Deep Learning (DL) is still hot and keeps eating Machine Learning. The conference's attendance split roughly in half: one half was there for Deep Learning, the other half for *Shallow* Learning :). Interestingly, the conference took place in Beijing for the first time, and more than 50% of the attendees either study or work there (and most of that local population are students), so the attendance distribution could be biased. In the following, I'll highlight what I've learned and observed at the conference. Here's the outline:
- ML Fundamentals (this post, see below)
- Deep Learning and Language Modeling
- Optimization, Distributed Optimization, and Distributed ML
- Kernel Methods
- Auto ML
- Other Topics
Tuesday, July 1, 2014
On the imminent decline of MapReduce
Google recently announced at Google I/O 2014 that they are retiring MapReduce (MR) in favor of a new system called Cloud Dataflow. Well, the article's author perhaps dramatized it a bit when quoting Urs Hölzle's words:
We don’t really use MapReduce anymore.
You can watch the keynote here for better context. My guess is that no one is writing new MapReduce jobs anymore, but Google will keep running legacy MR jobs for years until they are all replaced or obsolete.
Regardless of what has happened at Google, I'd like to point out that MR should have been ditched long ago.
Someone at Cloudera (the company that used to make money on the hype of Hadoop MapReduce) already partially explained why in this blog post: The Elephant was a Trojan Horse: On the Death of Map-Reduce at Google. Some quotes to remember are:
- Indeed, it’s a bit of a surprise to me that it lasted this long.
- and the real contribution from Google in this area was arguably GFS, not Map-Reduce.
Also note that Mahout, the ML library for Hadoop, recently said goodbye to MapReduce.
25 April 2014 - Goodbye MapReduce
The Mahout community decided to move its codebase onto modern data processing systems that offer a richer programming model and more efficient execution than Hadoop MapReduce. Mahout will therefore reject new MapReduce algorithm implementations from now on. We will however keep our widely used MapReduce algorithms in the codebase and maintain them.
We are building our future implementations on top of a DSL for linear algebraic operations which has been developed over the last months. Programs written in this DSL are automatically optimized and executed in parallel on Apache Spark.
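The announcement doesn't show the DSL itself, but to give a flavor of the general direction (high-level linear algebra that a backend executes in parallel on Spark), here is a minimal sketch using Spark MLlib's distributed RowMatrix rather than Mahout's own DSL; the class and method names below are Spark's, and the tiny matrix is made-up example data.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

object GramianSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("gramian-sketch").setMaster("local[*]"))

    // A tiny 3 x 2 matrix A, one dense row vector per RDD element.
    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 2.0),
      Vectors.dense(3.0, 4.0),
      Vectors.dense(5.0, 6.0)
    ))

    // computeGramianMatrix() returns A^T * A as a small local matrix,
    // with the row-wise work distributed across the cluster by Spark.
    val A = new RowMatrix(rows)
    println(A.computeGramianMatrix())

    sc.stop()
  }
}
```

The point is the programming model: you state the linear-algebra operation you want, and the framework decides how to parallelize it, instead of hand-coding mappers and reducers.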
[*] Unfortunately, lots of companies, including my employer, are still chasing the Hadoop game. Microsoft announced HDInsight, a.k.a. Hadoop on Azure, less than a year ago.
[**] For virtually everything that MR can do, Spark can do it equally well and in most cases better. Also note that while Spark is generally fantastic, it is not necessarily the right distributed framework for every ML problem.
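As a concrete (if well-worn) illustration of that last footnote, the canonical MapReduce example, word count, comes down to a handful of Spark transformations. This is just a sketch: the input path and app name are placeholders, and it runs locally.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("wordcount-sketch").setMaster("local[*]"))

    // The classic MapReduce job: flatMap plays the role of the mapper,
    // reduceByKey the role of the combiner + reducer.
    val counts = sc.textFile("input.txt")   // placeholder path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)

    sc.stop()
  }
}
```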