atom feed14 messages in org.apache.jackrabbit.dev[jr3] clustering
FromSent OnAttachments
Marcel ReuteggerMar 1, 2012 3:05 am 
Michael DürigMar 1, 2012 5:50 am 
Thomas MuellerMar 1, 2012 6:11 am 
Marcel ReuteggerMar 1, 2012 7:15 am 
Michael DürigMar 1, 2012 9:05 am 
Alexander KlimetschekMar 1, 2012 4:25 pm 
Thomas MuellerMar 2, 2012 3:23 am 
Marcel ReuteggerMar 5, 2012 2:00 am 
Marcel ReuteggerMar 5, 2012 2:05 am 
Thomas MuellerMar 5, 2012 2:36 am 
Marcel ReuteggerMar 5, 2012 2:59 am 
Thomas MuellerMar 5, 2012 3:24 am 
Marcel ReuteggerMar 5, 2012 3:29 am 
Marcel ReuteggerMar 5, 2012 4:23 am 
Subject:[jr3] clustering
From:Marcel Reutegger (mreu@adobe.com)
Date:Mar 1, 2012 3:05:23 am
List:org.apache.jackrabbit.dev

Hi,

recently I was thinking about an approaches to implement clustering in JR3 and I'd like to know what you think about it.

Distributed B-tree

the clustering ideas we discusses so far [0] partition the tree at given paths similar to mount points in a file system. I think the drawback of those approaches is that they are rather static and you'd probably have to define where those mount points are. to me it seems rather difficult to setup a system that is able to grow when the repository size increases.

so, I was thinking of something similar as described in this paper [1] or similar [2]. since a B-tree is basically an ordered list of items we'd have to linearize the JCR or MK hierarchy. I'm not sure whether a depth or breadth first traversal is better suited. maybe there even exists a more sophisticated space filling curve, which is a combination of both. linearizing the hierarchy on a B-tree should give some since locality for nodes that are hierarchically close and probability is high that they are requested in succession. the algorithm discussed in [1] then distributes the nodes of the B-tree randomly to machines (at least in the most simply case). note, that a B-tree node may contain many JCR/MK nodes. with a reasonable size for a B-tree node, we'd get a good balance of locality and distribution to multiple servers.

the algorithm uses optimistic concurrency control and two phase commits when changes need to be applied on multiple servers. I'm not a big fan of two phase commit, but if we assume that most commits are rather small, changes will be applied to a single server and a one phase commit is sufficient. two phase commits would only be necessary for e.g. larger imports.

Open questions:

how does MVCC fit into this? multiple revisions of the same JCR/MK node could be stored on a B-tree node. whenever an update happens the garbage collection could kick in an purge outdated revisions. providing a consistent journal across all servers is not clear to me right now.

How does backup work? this is quite tricky because it is difficult to get a consistent snapshot of the distributed tree.

Regards Marcel

[0]
https://docs.google.com/presentation/pub?id=131sVx5s58jAKE2FSVBfUZVQSl1W820_syyzLYRHGH6E&start=false&loop=false&delayms=3000#slide=id.g4272a65_0_39 [1] http://www.hpl.hp.com/techreports/2007/HPL-2007-193.pdf [2]
http://research.microsoft.com/en-us/people/aguilera/distributed-btree-vldb2008.pdf