nick has a blog

Exploring disco


At uni this year I am doing a directed study on distributing machine learning algorithms using MapReduce, with the intention of running them on the uni cluster. MapReduce is an interesting approach to distribution: it has two operations, map and reduce. The map operation is where all the processing happens for each item of data, and the reduce operation combines the results of the map operations. Disco is a MapReduce framework that lets you write jobs in Python, which happens to be my language of the moment. It took a while to get set up on my servers, but several Debian installs and a few VM clones later, it's working!
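To make the two operations concrete, here is a minimal single-process sketch of the classic word-count example in plain Python. It does not use Disco's API: `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical names, and the in-memory grouping step stands in for the shuffle that a real framework would perform across the cluster.

```python
from collections import defaultdict
from itertools import chain


def map_fn(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    # Reduce: combine all the counts emitted for the same word.
    return key, sum(values)


def run_mapreduce(inputs, mapper, reducer):
    # Map phase: each input item is processed independently, which is
    # what allows a framework to spread this step across machines.
    mapped = chain.from_iterable(mapper(item) for item in inputs)
    # Shuffle phase (hypothetical in-memory stand-in): group values by key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce phase: turn each key's list of values into a single result.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


lines = ["the cat sat", "the dog sat down"]
print(run_mapreduce(lines, map_fn, reduce_fn))
# {'cat': 1, 'dog': 1, 'down': 1, 'sat': 2, 'the': 2}
```

Because `map_fn` only ever sees one line at a time, the map calls can run anywhere; only the reduce step needs all the values for a given key brought together.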

Most of the MapReduce implementations of machine learning algorithms I found were some form of numerical model, such as Bayesian classifiers or neural networks. I thought I would start somewhere a bit different and try to build a rule learner, and I am thinking of starting with a basic one like CN2. CN2 is a fairly simple rule learner: it searches for rules with good coverage, scoring candidates with an entropy or Laplace function, and keeps repeating the search until it finds something acceptable. I think it is going to involve chaining multiple map and reduce steps, with each iteration of the chain refining the rules. Time to start coding...
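Here is a rough sketch of what one link in that chain might look like, in plain Python rather than Disco. The map step emits a count for every (candidate rule, class) pair an example matches, the reduce step sums those into a class distribution per rule, and the candidates are then scored with the Laplace estimate (n_c + 1) / (n + k), where n is the number of covered examples, n_c the number in the majority class, and k the number of classes. The toy dataset and all function names are hypothetical, and this is a simplified greedy step: real CN2 keeps a beam of rules and applies a significance test, which are left out here.

```python
import math
from collections import defaultdict

# Hypothetical toy dataset: each example is (attributes, class label).
EXAMPLES = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rain", "windy": "yes"}, "stay"),
    ({"outlook": "rain", "windy": "no"}, "play"),
    ({"outlook": "overcast", "windy": "no"}, "play"),
]
CLASSES = sorted({label for _, label in EXAMPLES})


def covers(rule, attrs):
    # A rule is a tuple of (attribute, value) tests; all of them must hold.
    return all(attrs.get(a) == v for a, v in rule)


def map_example(example, candidates):
    # Map: for one example, emit ((rule, class), 1) for each rule covering it.
    attrs, label = example
    for rule in candidates:
        if covers(rule, attrs):
            yield (rule, label), 1


def reduce_counts(pairs):
    # Reduce: sum the emitted counts into a class distribution per rule.
    dist = defaultdict(lambda: defaultdict(int))
    for (rule, label), n in pairs:
        dist[rule][label] += n
    return dist


def laplace(counts, k=len(CLASSES)):
    # Laplace accuracy of predicting the majority class: (n_c + 1) / (n + k).
    n = sum(counts.values())
    return (max(counts.values()) + 1) / (n + k)


def entropy(counts):
    # Entropy (bits) of the class distribution among covered examples; lower is purer.
    n = sum(counts.values())
    return sum(-(c / n) * math.log2(c / n) for c in counts.values() if c)


def specialise(rule, examples):
    # Candidate refinements: add one test on an attribute the rule does not use yet.
    used = {a for a, _ in rule}
    tests = {(a, v) for attrs, _ in examples for a, v in attrs.items() if a not in used}
    return [tuple(sorted(rule + (t,))) for t in sorted(tests)]


def refine_once(rule, examples):
    # One map/reduce round: score every refinement and keep the best by Laplace.
    candidates = specialise(rule, examples)
    pairs = (kv for ex in examples for kv in map_example(ex, candidates))
    dist = reduce_counts(pairs)
    return max(dist.items(), key=lambda item: laplace(item[1]))


best_rule, counts = refine_once((), EXAMPLES)
print(best_rule, dict(counts), round(laplace(counts), 3), round(entropy(counts), 3))
# (('windy', 'no'),) {'play': 3} 0.8 0.0
```

Feeding the winning rule back into `refine_once` would be the next link in the chain. In a full CN2-style learner, once a rule is accepted the examples it covers are removed and the search starts again for the next rule.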

Written by Nick

January 10th, 2009 at 11:30 pm

Posted in University
