nick has a blog

Archive for January, 2009

Stripping JFIF Comments from JPEG Images

Many JPEG images include a comment segment (often called a JFIF comment), which usually describes the creator of the image (not to be confused with Exif data). This data is usually in the range of 50 to 100 bytes, so not much, but it might make a difference if you have a 56k modem and a lot of images. Because this data is effectively useless, it is safe to remove it from images.

Here is a Python script that will do it:

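import sys

COM_MARKER = b"\xFF\xFE"  # JPEG comment (COM) segment marker

result = b""
lastb = b""

with open(sys.argv[1], 'rb') as img:
    while True:
        b = img.read(1)
        if b == b"":
            # Reached the end of the file without finding a comment
            result += lastb
            break

        if lastb + b == COM_MARKER:
            # The two-byte big-endian length counts the length bytes themselves
            length = int.from_bytes(img.read(2), 'big')
            img.read(length - 2)   # skip over the comment text
            result += img.read()   # copy the rest of the file unchanged
            break
        else:
            result += lastb
            lastb = b

with open(sys.argv[2], 'wb') as out:
    out.write(result)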

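The script searches for the first JPEG comment marker (0xFF 0xFE), skips that segment, and copies everything else through unchanged. Expect unexpected consequences if the images being processed do not contain JFIF comments: the same two bytes may turn up inside other data, in which case the script will cut out the wrong bytes!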
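One way to avoid those consequences is to walk the file segment by segment instead of scanning for the raw byte pair. The following is only a sketch under some assumptions: the function name `strip_comments` is made up for illustration, the input is assumed to be a well-formed JPEG, and every marker before the start-of-scan marker (0xFF 0xDA) is assumed to carry a two-byte length. Everything from start-of-scan onwards is copied through untouched.

```python
def strip_comments(data):
    """Return a copy of the JPEG bytes in `data` with comment segments removed."""
    if data[:2] != b"\xFF\xD8":
        raise ValueError("not a JPEG: missing start-of-image marker")

    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: compressed image data follows, stop parsing
            break
        # The length field counts itself but not the two marker bytes
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        end = pos + 2 + length
        if marker != 0xFE:
            # Keep every segment except comments
            out += data[pos:end]
        pos = end

    # Copy the scan data (and anything after it) unchanged
    out += data[pos:]
    return bytes(out)
```

Unlike the byte-by-byte scan, this removes every comment segment in the header rather than just the first, and it leaves a file without comments exactly as it was.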

Written by Nick

January 28th, 2009 at 6:29 pm

Posted in Code

Roasting Coffee!

Aside from computing, my other hobby is coffee! I have a Hottop coffee roaster that allows me to, well, roast coffee. I’ve had it for a while now and have built up quite a collection of different varieties of green beans (I lost count of how many, but it fills up a good portion of the coffee room). Coffee roasting is actually a fairly boring thing to do: it takes about 30 minutes all up, and for the most part you do nothing. The interesting parts are blending and watching the beans eject out of the roaster. If you are into profiling, that can be interesting too, but the Hottop automates some of the process.

Most people who roast coffee take great care in the science of blending. I don’t. I prefer to pick some random things and give it a go. I do occasionally get some undrinkable cups, but I get some awesome ones too. Having said that, I do have some no-go blends, and I try to aim for a full-bodied coffee with minimal acidity.

The roasting process on the Hottop is fairly straightforward: let it pre-heat to 75°C, pour in the beans (which you blended while it was pre-heating), and wait about 18 minutes. Once the temperature of the roaster gets to about 210°C things start to happen fast. There are three stages of cracks. Coffee roasted just past the first crack is what you usually get in a cafe, the second crack is a darker roast, and the third crack is what you get at Starbucks (charcoal). At the end the Hottop ejects the beans out of the roaster onto a rotating cooling tray to cool down. Then you start drinking the next day!

Written by Nick

January 14th, 2009 at 6:54 pm

Posted in Coffee

Ferrit

In an interesting, yet probably inevitable, move, Telecom ditched Ferrit.co.nz. The unfortunate part is that they had recently started picking up their game. Over the past six months the stability had significantly improved (at least if a reduction in office wtf’s is anything to go by), and they were improving the usability and general functionality (although they had a way to go). I guess the “recession” just came at the wrong time. At least one retailer seems to be happy about it, although I am not sure why, as Ferrit had the older demographic with its more traditional advertising methods. Anyway, good luck to the Ferrit staffers, they were awesome!

Written by Nick

January 13th, 2009 at 7:00 pm

Posted in Work

Parasitic JavaScript

I thought I would make a post about my recently completed honours project: Parasitic JavaScript. The idea came to me during a lecture about parasitic computing, or using computers to perform complex calculations without the owner's knowledge. The first parasitic computing implementation used TCP packets to calculate 3-SAT solutions. Obviously sending millions of TCP packets all over the place to calculate a + b = c is not very efficient, but it worked. Several days after thinking of the idea, I had a working prototype, and a few weeks later I changed my honours project. Using JavaScript has several benefits, the major ones being that almost every web browser has it and that it can request remote webpages without the user intervening (aka AJAX). Using a work unit distribution model I was able to get web browsers to perform several distributed tasks: ray tracing, data mining (perceptron), and finding n-queens solutions using a genetic algorithm.

One of the issues I had was that slower computers (e.g. my old iBook @ 1.33 GHz/1 GB RAM) would become a bit sluggish. So I started experimenting with ways of estimating the performance of the computer, as JavaScript provides no methods for this. I found that JavaScript’s setTimeout method would be less accurate when the computer was under high load. So I developed a basic algorithm that monitors the accuracy of setTimeout and adjusts the speed at which Parasitic JavaScript executes. This made Parasitic JavaScript play nicely with the older computers, while maximizing the execution speed on newer ones.
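To give a rough idea of that kind of feedback loop, here is a small sketch. It is written in Python rather than JavaScript and is not the algorithm from the report: the names `measure_delay` and `adjust_batch`, the 10 ms and 50 ms thresholds, and the halve-or-double rule are all assumptions made for illustration. It treats how late a timer wakes up as a stand-in for how busy the machine is, and grows or shrinks the amount of work done per step accordingly.

```python
import time

def measure_delay(requested_ms=10):
    """Sleep for requested_ms and return how many milliseconds late we woke up."""
    start = time.monotonic()
    time.sleep(requested_ms / 1000.0)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return max(0.0, elapsed_ms - requested_ms)

def adjust_batch(batch, delay_ms, low_ms=10.0, high_ms=50.0,
                 min_batch=1, max_batch=10000):
    """Pick the next batch size from the last observed timer delay."""
    if delay_ms > high_ms:
        # Timer fired very late: the machine is busy, so back off
        return max(min_batch, batch // 2)
    if delay_ms < low_ms:
        # Timer was close to on time: there is headroom, so speed up
        return min(max_batch, batch * 2)
    # In between: leave the batch size alone
    return batch
```

A worker loop would call `measure_delay()` between work units and feed the result into `adjust_batch()` before starting the next one.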

You can download the full report here.

Recently I have been playing with MapReduce, and it got me thinking about how it could be implemented as Parasitic JavaScript. It would be interesting because many different tasks fall under the MapReduce model, and it would certainly reduce the amount of work a developer has to do to get a task running in a distributed manner. Maybe this can be my next project!

Written by Nick

January 11th, 2009 at 5:06 pm

Exploring disco

At uni this year I am doing a directed study on distributing machine learning algorithms using MapReduce, with the intention of running it on the uni cluster. MapReduce is an interesting approach to distribution. It has two operations, map and reduce. The map operation is where all the processing happens for a particular item of data, and the reduce operation combines the results of the map operations. Disco is a MapReduce framework that allows you to use Python, which happens to be my language of the moment. It took a bit to get set up on my servers, but several Debian installs and a few VM clones later, it’s working!
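As a toy illustration of the two operations, here is a word count written in plain Python. It does not use Disco's API, it runs in a single process, and the function names `map_words`, `reduce_counts`, and `run` are made up for this sketch. In a real framework the map calls would be spread across machines and the results shuffled to the reducers.

```python
from collections import defaultdict

def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_counts(pairs):
    """Reduce step: add up the counts emitted for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

def run(lines):
    """Run every map call, then hand all of the results to the reducer."""
    pairs = []
    for line in lines:
        pairs.extend(map_words(line))
    return reduce_counts(pairs)
```

Because each `map_words` call only looks at its own line, the calls can run in any order and on any machine, which is what makes the model easy to distribute.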

The majority of MapReduce implementations of machine learning algorithms I found were some form of numerical function like Bayes or neural networks. I thought I would start with something a bit different and try to create a rule learner, probably a basic one like CN2. CN2 is a fairly simple rule learner that searches for rules with good coverage, scoring them with an entropy or Laplace function, and repeats the search until it finds something acceptable. I think it is going to involve multiple map and reduce chains, with each iteration of the chain refining the rules. Time to start coding..
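For reference, the two scoring functions mentioned above can be sketched in a few lines. This is only an illustration, not code from the project: the function names are made up, and `class_counts` is assumed to be a dict that maps each class label to the number of training examples of that class covered by a rule.

```python
import math

def laplace_accuracy(class_counts, target):
    """Laplace estimate: (covered examples of target + 1) / (all covered + number of classes)."""
    k = len(class_counts)
    total = sum(class_counts.values())
    return (class_counts.get(target, 0) + 1) / (total + k)

def entropy(class_counts):
    """Entropy in bits of the class distribution covered by a rule (lower is purer)."""
    total = sum(class_counts.values())
    result = 0.0
    for count in class_counts.values():
        if count > 0:
            p = count / total
            result -= p * math.log2(p)
    return result
```

A rule that covers 8 "yes" and 2 "no" examples gets a Laplace accuracy of 9/12 for "yes". The added constants stop a rule that covers only a handful of examples from scoring a perfect 1.0.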

Written by Nick

January 10th, 2009 at 11:30 pm

Posted in University

Time to start blogging again..

This is about the fifth install of WordPress… maybe I will use it this time!

Written by Nick

January 10th, 2009 at 10:56 pm

Posted in Uncategorized