3 messages in org.apache.hadoop.core-dev: Re: JobConf.setOutputKeyComparatorClass
From                  Sent On
Arun C Murthy         Jun 28, 2006 9:06 pm
Eric Baldeschwieler   Jun 28, 2006 9:33 pm
Owen O'Malley         Jun 28, 2006 10:31 pm
Subject: Re: JobConf.setOutputKeyComparatorClass
From: Owen O'Malley (ow@yahoo-inc.com)
Date: Jun 28, 2006 10:31:05 pm
List: org.apache.hadoop.core-dev

On Jun 28, 2006, at 9:06 PM, Arun C Murthy wrote:

All,

<background> I have a *map* which does some processing and then a *reduce* which sorts the results. TextInputFormat & TextOutputFormat are the input/output formats respectively.

However the *sort* I want to perform is as follows: I want to sort output by 'comparing' 'columns' of 'key's in the Comparator and not the entire 'key'.

E.g. spec: column1, column0 is the sort-spec.

    aaa ccc ggg
    bbb aaa hhh

should result in:

    bbb aaa hhh
    aaa ccc ggg
</background>

I can't seem to find an 'elegant' way to do this via the MR framework, i.e. I can't seem to be able to set a *policy* (i.e. set the sort-spec) for the WritableComparable via the framework. Is there something I'm missing? In essence I probably need a *configure* callback for the WritableComparable interface too? Is there a better way? Or is this outside the scope of the framework?

There is a way to do it, but it isn't surprising that you missed it. When JobConf creates new instances of objects, if they are Configurable, they get sent the Configuration. So, if you make a ConfigurableComparator that extends WritableComparator and implements Configurable, it will get its setConf method called with the job's JobConf. Now do something like:

    JobConf conf = new JobConf();
    conf.set("my.sort.order", "1,0,2");
    conf.setOutputKeyComparatorClass(ConfigurableComparator.class);

and you should get the information where it needs to go.

-- Owen
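
For anyone following along, here is a minimal sketch of the column-by-column comparison that the sort-spec in the question implies. It is not code from this thread: the class name ColumnOrderComparator is hypothetical, keys are assumed to be whitespace-separated strings, and the spec is assumed to be a comma-separated list of zero-based column indices (the same shape as the "my.sort.order" value in the reply). It uses plain java.util.Comparator so it runs without Hadoop; in the setup described in the reply, the spec would presumably be read from the JobConf inside setConf and the comparison would live in the WritableComparator subclass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical helper: compares whitespace-separated keys column by column,
 * in the order given by a sort spec such as "1,0".
 */
class ColumnOrderComparator implements Comparator<String> {
    private final int[] columns;

    /** spec: comma-separated, zero-based column indices (assumed format). */
    ColumnOrderComparator(String spec) {
        String[] parts = spec.split(",");
        columns = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            columns[i] = Integer.parseInt(parts[i].trim());
        }
    }

    @Override
    public int compare(String left, String right) {
        String[] a = left.trim().split("\\s+");
        String[] b = right.trim().split("\\s+");
        for (int col : columns) {
            // A key that lacks the column is treated as having an empty value there.
            String x = col < a.length ? a[col] : "";
            String y = col < b.length ? b[col] : "";
            int c = x.compareTo(y);
            if (c != 0) {
                return c;
            }
        }
        // Keys that agree on every column named in the spec compare as equal.
        return 0;
    }

    public static void main(String[] args) {
        // The two keys from the question, sorted by column1 and then column0.
        List<String> keys = new ArrayList<>(Arrays.asList("aaa ccc ggg", "bbb aaa hhh"));
        keys.sort(new ColumnOrderComparator("1,0"));
        keys.forEach(System.out::println);
    }
}
```

With the spec "1,0" the key "bbb aaa hhh" sorts ahead of "aaa ccc ggg", because "aaa" precedes "ccc" in column1, which matches the ordering requested in the question.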