<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>

<channel>
	<title>nick has a blog</title>
	<atom:link href="http://nickjenkin.com/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://nickjenkin.com/blog</link>
	<description>Just another WordPress weblog</description>
	<pubDate>Mon, 15 Mar 2010 07:40:02 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Graduated, moved to Australia</title>
		<link>http://nickjenkin.com/blog/2010/01/graduated-moved-to-australia/</link>
		<comments>http://nickjenkin.com/blog/2010/01/graduated-moved-to-australia/#comments</comments>
		<pubDate>Tue, 05 Jan 2010 09:31:37 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://nickjenkin.com/blog/?p=101</guid>
		<description><![CDATA[So, it&#8217;s been a busy few months. In July, I completed all my course work, the final of which was a report on machine learning using Hadoop/MapReduce. The result was a nice simple method for distributing machine learning algorithms over a Hadoop cluster without modification. The report is available for download. In August, I packed [...]]]></description>
			<content:encoded><![CDATA[<p>So, it&#8217;s been a busy few months. In July, I completed all my course work, the final of which was a report on machine learning using Hadoop/MapReduce. The result was a nice simple method for distributing machine learning algorithms over a Hadoop cluster without modification. The report is available for <a href="http://nickjenkin.com/390/mlhadoop-report.pdf">download</a>. In August, I packed up and moved to Sydney, Australia where I am working for <a href="http://www.thenile.com.au">The Nile</a> as a senior systems engineer. In October, I flew back to NZ and officially graduated, with a Bachelor in Computing and Mathematical science. What next, I don&#8217;t know! But I hope this year is just as exciting as last!</p>
]]></content:encoded>
			<wfw:commentRss>http://nickjenkin.com/blog/2010/01/graduated-moved-to-australia/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Naïve Bayes in Hadoop</title>
		<link>http://nickjenkin.com/blog/2009/04/naive-bayes-in-hadoop/</link>
		<comments>http://nickjenkin.com/blog/2009/04/naive-bayes-in-hadoop/#comments</comments>
		<pubDate>Sun, 19 Apr 2009 08:44:31 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
		
		<category><![CDATA[Machine Learning]]></category>

		<category><![CDATA[University]]></category>

		<guid isPermaLink="false">http://nickjenkin.com/blog/?p=85</guid>
		<description><![CDATA[Naïve Bayes is a probabilistic data mining classifier which fits nicely into the MapReduce model and gives pretty good predictive performance for its simplicity. The Hadoop implementation uses a single map/reduce operation to calculate the mean and standard deviation of each attribute/class combination, as well as the global class distribution of the training dataset.
Some basic pseudo [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/Naive_Bayesian_classification">Naïve Bayes</a> is a probabilistic data mining classifier which fits nicely into the MapReduce model and gives pretty good predictive performance for its simplicity. The Hadoop implementation uses a single map/reduce operation to calculate the mean and standard deviation of each attribute/class combination, as well as the global class distribution of the training dataset.</p>
<p>Some basic pseudo code:</p>
<p>instance = single row of training set<br />
instance.class = class/target of row<br />
instance.attributes = list of attributes</p>

<div class="wp_syntax"><div class="code"><pre class="python python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #008000;">map</span><span style="color: black;">&#40;</span>key, instance<span style="color: black;">&#41;</span>:
	i = 0
	<span style="color: #ff7700;font-weight:bold;">for</span> attribute <span style="color: #ff7700;font-weight:bold;">in</span> instance.<span style="color: black;">attributes</span>:
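		collect<span style="color: black;">&#40;</span>instance.<span style="color: #ff7700;font-weight:bold;">class</span> + <span style="color: #483d8b;">&quot;_&quot;</span> + <span style="color: #008000;">str</span><span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span>, attribute<span style="color: black;">&#41;</span>
		i += 1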
&nbsp;
	collect<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;target_&quot;</span> + instance.<span style="color: #ff7700;font-weight:bold;">class</span>, <span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span> <span style="color: #808080; font-style: italic;"># class distribution</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #008000;">reduce</span><span style="color: black;">&#40;</span>key, values<span style="color: black;">&#41;</span>:
	<span style="color: #ff7700;font-weight:bold;">if</span> key.<span style="color: black;">startsWith</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;target_&quot;</span><span style="color: black;">&#41;</span>: <span style="color: #808080; font-style: italic;"># reduce class dist keys</span>
		<span style="color: #008000;">sum</span> = 0
		<span style="color: #ff7700;font-weight:bold;">for</span> v <span style="color: #ff7700;font-weight:bold;">in</span> values:
			<span style="color: #008000;">sum</span> += v
		collect<span style="color: black;">&#40;</span>key, <span style="color: #008000;">sum</span><span style="color: black;">&#41;</span>
&nbsp;
	<span style="color: #ff7700;font-weight:bold;">else</span>: <span style="color: #808080; font-style: italic;"># reduce attribute/class keys</span>
		<span style="color: #008000;">sum</span>=0
		sumSq = 0
		count = 0
		<span style="color: #ff7700;font-weight:bold;">for</span> v <span style="color: #ff7700;font-weight:bold;">in</span> values:
			<span style="color: #008000;">sum</span> += v
			sumSq += v<span style="color: #66cc66;">*</span>v
			count += 1
&nbsp;
		mean = <span style="color: #008000;">sum</span>/count
		collect<span style="color: black;">&#40;</span>key + <span style="color: #483d8b;">&quot;_mean&quot;</span>, mean<span style="color: black;">&#41;</span>
		collect<span style="color: black;">&#40;</span>key + <span style="color: #483d8b;">&quot;_stddev&quot;</span>, sqrt<span style="color: black;">&#40;</span><span style="color: #008000;">abs</span><span style="color: black;">&#40;</span>sumSq - mean <span style="color: #66cc66;">*</span> <span style="color: #008000;">sum</span><span style="color: black;">&#41;</span> / count<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>This will produce a file of means, standard deviations and a class distribution, which you can then load into a model (such as the one in <a href="http://www.cs.waikato.ac.nz/ml/weka/" target="_blank">Weka</a>: weka.classifiers.bayes.NaiveBayes.distributionForInstance). This doesn&#8217;t support discrete attributes yet, only numeric/real ones. Working on it.</p>
<p>I was able to process a ~5GB file of ~200k rows/2000 attributes per row in 4 minutes on 30 nodes.</p>
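<p>As a rough illustration, the pseudocode above can be run locally in plain Python, with a dictionary standing in for the shuffle that Hadoop performs between map and reduce. The names map_instance, reduce_key and run_job, and the toy rows, are assumptions made for this sketch and are not taken from the Hadoop implementation.</p>

```python
from collections import defaultdict
from math import sqrt


def map_instance(label, attributes):
    """Emit one (key, value) pair per attribute plus one for the class count."""
    for i, value in enumerate(attributes):
        yield "{}_{}".format(label, i), value
    yield "target_{}".format(label), 1  # class distribution


def reduce_key(key, values):
    """Sum class counts, or emit mean and standard deviation for an attribute."""
    if key.startswith("target_"):
        yield key, sum(values)
        return
    count = len(values)
    total = sum(values)
    sum_sq = sum(v * v for v in values)
    mean = total / count
    yield key + "_mean", mean
    # Population standard deviation, same formula as the pseudocode.
    yield key + "_stddev", sqrt(abs(sum_sq - mean * total) / count)


def run_job(rows):
    """Group mapper output by key (the shuffle), then reduce each group."""
    grouped = defaultdict(list)
    for label, attributes in rows:
        for key, value in map_instance(label, attributes):
            grouped[key].append(value)
    model = {}
    for key, values in grouped.items():
        model.update(reduce_key(key, values))
    return model


# Hypothetical toy training set: (class, numeric attributes)
rows = [("yes", [1.0, 10.0]), ("yes", [3.0, 14.0]), ("no", [5.0, 2.0])]
model = run_job(rows)
```

<p>The resulting dictionary holds the same kinds of keys the reducer writes out: one count per class, plus a mean and a standard deviation per attribute/class combination.</p>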
]]></content:encoded>
			<wfw:commentRss>http://nickjenkin.com/blog/2009/04/naive-bayes-in-hadoop/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Machine Learning with Hadoop</title>
		<link>http://nickjenkin.com/blog/2009/03/machine-learning-with-hadoop/</link>
		<comments>http://nickjenkin.com/blog/2009/03/machine-learning-with-hadoop/#comments</comments>
		<pubDate>Wed, 25 Mar 2009 07:32:18 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
		
		<category><![CDATA[Machine Learning]]></category>

		<category><![CDATA[University]]></category>

		<guid isPermaLink="false">http://nickjenkin.com/blog/?p=75</guid>
		<description><![CDATA[At University this year I am working on a machine learning framework using Hadoop (MapReduce), with the intent of running it on the University cluster. Initially I started playing with Disco, but it was a bit tedious to set up on two nodes, let alone 100, so Hadoop it is. So far progress has been good, as I [...]]]></description>
			<content:encoded><![CDATA[<p>At University this year I am working on a machine learning framework using Hadoop (MapReduce), with the intent of running it on the University cluster. Initially I started playing with Disco, but it was a bit tedious to set up on two nodes, let alone 100, so <a href="http://hadoop.apache.org/" target="_blank">Hadoop</a> it is. So far progress has been good, as I have a working prototype that can take generic classifiers and evaluate them (e.g. the basic functionality of <a href="http://www.cs.waikato.ac.nz/ml/weka/" target="_blank">Weka</a>). The MapReduce model was a bit unusual at first, but once you understand the basics it is insanely easy to use, which is always a bonus.</p>
<p><img class="aligncenter size-full wp-image-77" title="mlhadoop" src="http://nickjenkin.com/blog/wp-content/uploads/2009/03/mlhadoop.png" alt="mlhadoop" width="382" height="160" /></p>
<p>The framework itself has been kept fairly basic. The map function of a classifier is given an Instance of data (a single row of a training dataset), which lets the classifier query attribute types and values. The reducer output of a classifier produces a Model file, which the evaluator then uses to evaluate the classifier on a test dataset. This was a little hacky: because reducers only produce key/value pairs, the Model files have to be highly customised to each classifier (as such, a classifier must implement a model parser). That is a little extra coding, but for the extra effort you get massive scalability, which is always good to have. I have tried to give classifiers lots of flexibility in terms of how they operate. Many algorithms are going to require multiple MapReduce jobs, so a classifier is able to create new tasks as required. This sort of functionality would allow for meta classifiers like Bagging to be implemented as well. I am still pondering adding cross validation support, but given that cross validation is generally used to compensate for smaller datasets, it probably isn&#8217;t necessary.</p>
<p>Initial testing looks good. On a small setup I have at home (two dual Xeon 3GHz/6GB RAM servers) I was able to process a 2GB dataset in three minutes, using the extremely basic zero rule classifier (which is functionally similar to a word count, so not too intensive). The first real classifier I am going to implement is <a href="http://en.wikipedia.org/wiki/Naive_Bayes_classifier" target="_blank">Naive Bayes</a>. It seems a fairly popular choice in the literature for MapReduce applications, probably because it is so simple. In addition to this I have decided I will never ever buy servers that aren&#8217;t going into a data centre. Noise is not productive!</p>
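<p>To show why the zero rule classifier resembles a word count, here is a minimal local sketch in plain Python. It is not the framework code: zero_rule_map, zero_rule_reduce, train and evaluate are hypothetical names, and the rows are made up. The mapper ignores the attributes and emits the class with a count of one, the reducer adds the counts up, and the model is simply the most frequent class.</p>

```python
from collections import Counter


def zero_rule_map(instance):
    """Emit (class, 1) for one training row; attributes are ignored."""
    *_attributes, label = instance
    yield label, 1


def zero_rule_reduce(pairs):
    """Add up the counts per class, as a word count would per word."""
    counts = Counter()
    for label, n in pairs:
        counts[label] += n
    return counts


def train(rows):
    """Run map then reduce and return the model: the majority class."""
    pairs = (pair for row in rows for pair in zero_rule_map(row))
    counts = zero_rule_reduce(pairs)
    return counts.most_common(1)[0][0]


def evaluate(majority, test_rows):
    """Fraction of test rows whose class equals the majority class."""
    hits = sum(1 for row in test_rows if row[-1] == majority)
    return hits / len(test_rows)


# Hypothetical rows: attributes followed by the class label
training = [(5.1, "a"), (4.9, "a"), (6.3, "b")]
majority = train(training)
accuracy = evaluate(majority, [(5.0, "a"), (6.0, "b")])
```

<p>Because the model is a single value, it also sidesteps the model parsing problem described above, which makes it a convenient first classifier for testing the plumbing.</p>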
]]></content:encoded>
			<wfw:commentRss>http://nickjenkin.com/blog/2009/03/machine-learning-with-hadoop/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Stripping JFIF Comments from JPEG Images</title>
		<link>http://nickjenkin.com/blog/2009/01/stripping-jfif-comments-from-jpeg-images/</link>
		<comments>http://nickjenkin.com/blog/2009/01/stripping-jfif-comments-from-jpeg-images/#comments</comments>
		<pubDate>Wed, 28 Jan 2009 05:29:39 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
		
		<category><![CDATA[Code]]></category>

		<guid isPermaLink="false">http://nickjenkin.com/blog/?p=49</guid>
		<description><![CDATA[Most JPEG images include a JFIF comment, which usually describes the creator of the image (not to be confused with Exif data). This data is usually in the range of 50 to 100 bytes, so not much, but it might make a difference if you have a 56k modem and a lot of images. Because [...]]]></description>
			<content:encoded><![CDATA[<p>Most JPEG images include a JFIF comment, which usually describes the creator of the image (not to be confused with Exif data). This data is usually in the range of 50 to 100 bytes, so not much, but it might make a difference if you have a 56k modem and a lot of images. Because this data is effectively useless it is safe to remove it from images.</p>
<p>Here is a Python script that will do it:</p>

<div class="wp_syntax"><div class="code"><pre class="python python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">binascii</span> <span style="color: #ff7700;font-weight:bold;">import</span> b2a_hex
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">sys</span>
&nbsp;
img = <span style="color: #008000;">open</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">sys</span>.<span style="color: black;">argv</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>, <span style="color: #483d8b;">'rb'</span><span style="color: black;">&#41;</span>
&nbsp;
result = <span style="color: #483d8b;">&quot;&quot;</span>
lastb = <span style="color: #483d8b;">&quot;&quot;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">while</span> <span style="color: #008000;">True</span>:
&nbsp;
	b = img.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span>
	<span style="color: #ff7700;font-weight:bold;">if</span> b == <span style="color: #483d8b;">&quot;&quot;</span>:
	    result += lastb
	    <span style="color: #ff7700;font-weight:bold;">break</span>
&nbsp;
	<span style="color: #ff7700;font-weight:bold;">if</span> lastb + b == <span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\x</span>FF<span style="color: #000099; font-weight: bold;">\x</span>FE&quot;</span>:
	    img.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: #008000;">int</span><span style="color: black;">&#40;</span>b2a_hex<span style="color: black;">&#40;</span>img.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>, <span style="color: #ff4500;">16</span><span style="color: black;">&#41;</span> - <span style="color: #ff4500;">2</span><span style="color: black;">&#41;</span>
	    result += img.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
	    <span style="color: #ff7700;font-weight:bold;">break</span>
	<span style="color: #ff7700;font-weight:bold;">else</span>:
	    result += lastb
	    lastb = b
&nbsp;
img.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
out = <span style="color: #008000;">open</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">sys</span>.<span style="color: black;">argv</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span>,<span style="color: #483d8b;">'wb'</span><span style="color: black;">&#41;</span>
out.<span style="color: black;">write</span><span style="color: black;">&#40;</span>result<span style="color: black;">&#41;</span>
out.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>The script basically searches for the JFIF comment header (0xFF 0xFE) and strips the data out. Expect unexpected consequences if the images being processed do not contain JFIF comments!</p>
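<p>A more cautious variant, sketched here for Python 3, walks the file segment by segment instead of scanning for the two marker bytes, so images without a comment pass through unchanged. It assumes the header consists only of length-prefixed segments with no fill bytes between them, which is a simplification; strip_comments and the sample bytes are made up for illustration.</p>

```python
import struct

SOI = b"\xff\xd8"   # start of image
COM = 0xFE          # comment segment marker
SOS = 0xDA          # start of scan; compressed data follows


def strip_comments(data):
    """Return JPEG bytes with every comment segment before the scan removed."""
    if data[:2] != SOI:
        raise ValueError("not a JPEG stream")
    out = bytearray(SOI)
    pos = 2
    while len(data) - pos >= 4:
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset {}".format(pos))
        marker = data[pos + 1]
        if marker == SOS:
            break  # copy the scan data and everything after it untouched
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker != COM:
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    out += data[pos:]
    return bytes(out)


# Hypothetical minimal stream: SOI, one APP0 segment, one comment, SOS, data, EOI
sample = (SOI + b"\xff\xe0\x00\x04AB" + b"\xff\xfe\x00\x07hello"
          + b"\xff\xda\x00\x02" + b"\x01\x02\xff\xd9")
cleaned = strip_comments(sample)
```

<p>For real files, read the input with open(path, 'rb').read() and write the returned bytes to a new file, as the script above does.</p>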
]]></content:encoded>
			<wfw:commentRss>http://nickjenkin.com/blog/2009/01/stripping-jfif-comments-from-jpeg-images/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Roasting Coffee!</title>
		<link>http://nickjenkin.com/blog/2009/01/roasting-coffee/</link>
		<comments>http://nickjenkin.com/blog/2009/01/roasting-coffee/#comments</comments>
		<pubDate>Wed, 14 Jan 2009 05:54:38 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
		
		<category><![CDATA[Coffee]]></category>

		<guid isPermaLink="false">http://nickjenkin.com/blog/?p=27</guid>
		<description><![CDATA[Aside from computing, my other hobby is coffee! I have a Hottop coffee roaster that allows me to, well, roast coffee. I&#8217;ve had it for a while now and have built up quite a collection of different varieties of green beans (I lost count how many, but it fills up a good portion of the coffee room). Coffee roasting [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-medium wp-image-34" title="warming" src="http://nickjenkin.com/blog/wp-content/uploads/2009/01/warming-300x225.jpg" alt="warming" width="216" height="162" />Aside from computing, my other hobby is coffee! I have a <a href="http://www.hottopusa.com/" target="_blank">Hottop</a> coffee roaster that allows me to, well, roast coffee. I&#8217;ve had it for a while now and have built up quite a collection of different varieties of green beans (I lost count how many, but it fills up a good portion of the coffee room). Coffee roasting is actually a fairly boring thing to do: it takes about 30 minutes all up, and for the most part you do nothing. The interesting parts are blending and watching the beans eject out of the roaster. If you are into <a href="http://www.gourmetcoffeeshop.net/roast-profile-coffee.htm" target="_blank">profiling</a>, that can be interesting too, but the Hottop automates some of the process.</p>
<p>Most people who roast coffee take great care in the science of blending; I don&#8217;t. I prefer to pick some random things and give it a go. I do occasionally get some undrinkable cups, but I get some awesome ones too. That said, I do have some no-go blends, and I try to aim for a full-bodied coffee with minimal acidity.</p>
<p>The roasting process on the Hottop is fairly straightforward: let it pre-heat to 75°C, pour in the beans (which you blended while it was pre-heating) and wait about 18 minutes. Once the temperature of the roaster gets to about 210°C things start to happen fast. There are three stages of cracks: coffee roasted just past the first crack is what you usually get in a cafe, the second crack is a darker roast, and the third crack is what you get at Starbucks (charcoal). At the end the Hottop ejects the beans out of the roaster onto a rotating cooling tray to cool down. Then you start drinking the next day!</p>
<p><img class="aligncenter size-medium wp-image-31" title="spin" src="http://nickjenkin.com/blog/wp-content/uploads/2009/01/spin-300x225.jpg" alt="spin" width="300" height="225" /></p>
]]></content:encoded>
			<wfw:commentRss>http://nickjenkin.com/blog/2009/01/roasting-coffee/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Ferrit</title>
		<link>http://nickjenkin.com/blog/2009/01/ferrit/</link>
		<comments>http://nickjenkin.com/blog/2009/01/ferrit/#comments</comments>
		<pubDate>Tue, 13 Jan 2009 06:00:41 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
		
		<category><![CDATA[Work]]></category>

		<guid isPermaLink="false">http://nickjenkin.com/blog/?p=18</guid>
		<description><![CDATA[In an interesting, yet probably inevitable move, Telecom ditched Ferrit.co.nz. The unfortunate part is they had recently really started picking up their game, over the past six months the stability had significantly improved (at least if a reduction in office wtf&#8217;s is anything to go by) and they were improving the usability and general functionality (although they had a [...]]]></description>
			<content:encoded><![CDATA[<p>In an interesting, yet probably inevitable move, Telecom ditched <a href="http://www.ferrit.co.nz" target="_blank">Ferrit.co.nz</a>. The unfortunate part is they had recently really started picking up their game: over the past six months the stability had significantly improved (at least if a reduction in office wtf&#8217;s is anything to go by) and they were improving the usability and general functionality (although they had a way to go). I guess the &#8220;recession&#8221; just came at the wrong time. At least <a href="http://www.barty.co.nz/?p=145" target="_blank">one retailer</a> seems to be happy about it, although I am not sure why, as Ferrit had the older demographic with its more traditional advertising methods. Anyway, good luck to the Ferrit staffers, they were awesome!</p>
]]></content:encoded>
			<wfw:commentRss>http://nickjenkin.com/blog/2009/01/ferrit/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Parasitic JavaScript</title>
		<link>http://nickjenkin.com/blog/2009/01/parasitic-javascript/</link>
		<comments>http://nickjenkin.com/blog/2009/01/parasitic-javascript/#comments</comments>
		<pubDate>Sun, 11 Jan 2009 04:06:06 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
		
		<category><![CDATA[Parasitic JavaScript]]></category>

		<category><![CDATA[University]]></category>

		<guid isPermaLink="false">http://nickjenkin.com/blog/?p=10</guid>
		<description><![CDATA[I thought I would make a post about my recently completed honours project: Parasitic JavaScript. The idea came to me during a lecture about parasitic computing, or using computers to perform complex calculations without the owner&#8217;s knowledge. The first parasitic computing implementation used TCP packets to calculate 3-SAT solutions. Obviously sending millions of TCP packets all over the [...]]]></description>
			<content:encoded><![CDATA[<p>I thought I would make a post about my recently completed honours project: Parasitic JavaScript. The idea came to me during a lecture about parasitic computing, or using computers to perform complex calculations without the owner&#8217;s knowledge. The first parasitic computing implementation used TCP packets to calculate <a href="http://en.wikipedia.org/wiki/Boolean_satisfiability_problem" target="_blank">3-SAT</a> solutions. Obviously sending millions of TCP packets all over the place to calculate a + b = c is not very efficient, but it worked. Several days after thinking of the idea, I had a working prototype, and a few weeks later I changed my honours project. Using JavaScript has several benefits, the major one being that almost every web browser has it. It can also request remote webpages without the user intervening (aka AJAX). Using a work unit distribution model I was able to get web browsers to perform several distributed tasks: ray tracing, data mining (perceptron), and finding n-queens solutions using a genetic algorithm.</p>
<p>One of the issues I had was that on slower computers (e.g. my old iBook @ 1.33GHz/1GB RAM) the computer would become a bit sluggish. So I started experimenting with ways of estimating the performance of the computer, as JavaScript provides no methods for this. I found that JavaScript&#8217;s setTimeout method would be less accurate when the computer was under high load. So I developed a basic algorithm that monitors the accuracy of setTimeout and adjusts the speed at which Parasitic JavaScript executes. This made Parasitic JavaScript play nicely with older computers, while maximizing the execution speed on newer ones.</p>
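<p>The post does not spell the algorithm out, so the following is only a guess at its general shape, written in Python with time.sleep standing in for setTimeout. The function names, the tolerance and the doubling/halving policy are all assumptions of this sketch: measure how late a timer fires, lengthen the pause between work units when it fires late, and shorten it again when it is punctual.</p>

```python
import time


def timer_lateness(interval=0.01):
    """Sleep for `interval` seconds and return how late the wake-up was."""
    start = time.monotonic()
    time.sleep(interval)
    return max(0.0, time.monotonic() - start - interval)


def next_pause(pause, lateness, tolerance=0.005, step=2.0,
               shortest=0.001, longest=1.0):
    """Back off when the timer fires late, speed up when it is punctual."""
    if lateness > tolerance:
        pause *= step      # machine looks busy: leave more idle time
    else:
        pause /= step      # machine looks idle: run work units closer together
    return min(longest, max(shortest, pause))


def run(work_units, pause=0.01):
    """Run callables one by one, re-tuning the pause after each."""
    for unit in work_units:
        unit()
        pause = next_pause(pause, timer_lateness())
        time.sleep(pause)
    return pause
```

<p>Calling run with a list of small callables, for example run([lambda: sum(range(1000))] * 5), returns the pause it settled on, which gives a crude reading of how loaded the machine appeared to be.</p>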
<p>You can <a href="http://nickjenkin.com/520/pjs-final.pdf">download the full report here</a>.</p>
<p>Recently I have been playing with <a href="http://labs.google.com/papers/mapreduce.html" target="_blank">MapReduce</a>, and it got me thinking about how it could be implemented as Parasitic JavaScript. It would be interesting because many different tasks fit the MapReduce model, and it would certainly reduce the amount of work required by the developer to get a task working in a distributed manner. Maybe this can be my next project!</p>
]]></content:encoded>
			<wfw:commentRss>http://nickjenkin.com/blog/2009/01/parasitic-javascript/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Exploring disco</title>
		<link>http://nickjenkin.com/blog/2009/01/exploring-disco/</link>
		<comments>http://nickjenkin.com/blog/2009/01/exploring-disco/#comments</comments>
		<pubDate>Sat, 10 Jan 2009 10:30:57 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
		
		<category><![CDATA[University]]></category>

		<guid isPermaLink="false">http://nickjenkin.com/blog/?p=5</guid>
		<description><![CDATA[At uni this year I am doing a directed study on distributing machine learning algorithms using MapReduce, with the intention of running it on the uni cluster. MapReduce is an interesting approach to distribution: it has two operations, map and reduce. The map operation is where all the processing happens for a particular item of [...]]]></description>
			<content:encoded><![CDATA[<p>At uni this year I am doing a directed study on distributing machine learning algorithms using MapReduce, with the intention of running it on the uni cluster. MapReduce is an interesting approach to distribution: it has two operations, map and reduce. The map operation is where all the processing happens for a particular item of data, and the reduce function combines the results of the map operations. <a href="http://discoproject.org" target="_blank">Disco</a> is a MapReduce framework that allows you to use Python, which happens to be my language of the moment. It took a bit to get set up on my servers, but several Debian installs and a few VM clones later it&#8217;s working!</p>
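<p>The two operations can be sketched in a few lines of plain Python. This deliberately does not use the Disco API; map_line, reduce_word and map_reduce are hypothetical names, and sorting stands in for the grouping a real framework performs between the two phases.</p>

```python
from itertools import groupby
from operator import itemgetter


def map_line(line):
    """Map: process one item of data, emitting (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_word(word, counts):
    """Reduce: combine every value emitted for one key."""
    return word, sum(counts)


def map_reduce(lines):
    """Run the map phase, group the pairs by key, then reduce each group."""
    pairs = sorted(pair for line in lines for pair in map_line(line))
    # Sorting brings equal keys together so groupby sees each key once.
    return dict(
        reduce_word(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


counts = map_reduce(["map then reduce", "reduce again"])
```

<p>Each call to map_line only sees one line and each call to reduce_word only sees one key, which is what lets a framework spread the calls across many machines.</p>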
<p>The majority of MapReduce implementations of machine learning algorithms I found were some form of numerical function like Bayes or neural networks. I thought I would start with something a bit different and try to create a rule learner. I am thinking of starting with a basic one like <a href="http://www.springerlink.com/index/K756V3PK43631764.pdf">CN2</a>. CN2 is a fairly simple rule learner that searches for rules with good coverage using an entropy or Laplace function, repeating the search until it finds something acceptable. I think it is going to involve using multiple map and reduce chains, with each iteration of the chain refining the rules. Time to start coding..</p>
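<p>As a small illustration of the scoring step, the sketch below evaluates one candidate rule on a toy dataset. It is not a CN2 implementation: the search over rules is left out, and the rule representation (a dictionary of attribute tests), the function names and the data are assumptions made for this example. Lower entropy means the rule covers a purer set of examples, and the Laplace estimate is computed as (majority count + 1) / (covered count + number of classes).</p>

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy in bits of the class labels a rule covers (lower is purer)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


def laplace(labels, n_classes):
    """Laplace accuracy estimate for predicting the majority covered class."""
    majority = Counter(labels).most_common(1)[0][1]
    return (majority + 1) / (len(labels) + n_classes)


def covered(rule, rows):
    """Class labels of rows matching every attribute test in the rule."""
    return [label for attrs, label in rows
            if all(attrs.get(k) == v for k, v in rule.items())]


# Hypothetical toy data: (attributes, class)
rows = [
    ({"outlook": "sunny", "windy": False}, "play"),
    ({"outlook": "sunny", "windy": True}, "stay"),
    ({"outlook": "rainy", "windy": True}, "stay"),
    ({"outlook": "rainy", "windy": False}, "stay"),
]
labels = covered({"windy": True}, rows)
rule_entropy = entropy(labels)
rule_laplace = laplace(labels, n_classes=2)
```

<p>In a MapReduce setting the covered function is the natural map step (each row reports which candidate rules it matches) and the counting inside entropy and laplace is the reduce step.</p>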
]]></content:encoded>
			<wfw:commentRss>http://nickjenkin.com/blog/2009/01/exploring-disco/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Time to start blogging again..</title>
		<link>http://nickjenkin.com/blog/2009/01/time-to-start-blogging-again/</link>
		<comments>http://nickjenkin.com/blog/2009/01/time-to-start-blogging-again/#comments</comments>
		<pubDate>Sat, 10 Jan 2009 09:56:11 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://nickjenkin.com/blog/?p=3</guid>
		<description><![CDATA[This is about the fifth install of WordPress&#8230;maybe I will use it this time!
]]></description>
			<content:encoded><![CDATA[<p>This is about the fifth install of WordPress&#8230;maybe I will use it this time!</p>
]]></content:encoded>
			<wfw:commentRss>http://nickjenkin.com/blog/2009/01/time-to-start-blogging-again/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
