Subject:Re: [lucy-user] Concurrent searching
From:Marvin Humphrey (mar@rectangular.com)
Date:Nov 23, 2011 4:40:48 am
List:org.apache.lucene.lucy-user

On Wed, Nov 23, 2011 at 01:25:09PM +0200, goran kent wrote:

Something is weird with the length for the top_docs packet.

In SearchServer::serve, ~line 106, the confess is chucking a null error because $check_val != $len, hence the meaningless error:

" at /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/LucyX/Remote/SearchServer.pm line 106"

In ClusterSearcher::_serialize_request for top_docs length($serialized)==6959, but SearchServer::serve is receiving length==2892.

So, that's why SearchServer is failing. What's causing the short send (or receive, or pack/unpack not co-operating across machines) will hopefully soon be revealed.

As we move away from blocking i/o, we need to manage buffers manually and be prepared for partial success. (Eventually we need to deal with timeouts and failovers, because otherwise the system remains vulnerable to its weakest link and hangs when a single node goes down -- but that's for later.)
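To make "be prepared for partial success" concrete: a single sysread() on a stream may hand back fewer bytes than were asked for, so the caller has to keep reading until the whole packet has arrived. Here is a minimal sketch of that loop. The helper name read_exactly() is hypothetical and is not part of LucyX::Remote; it also assumes the handle is in blocking mode.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LucyX::Remote): keep calling sysread()
# until exactly $len bytes have arrived, or die if the peer goes away.
sub read_exactly {
    my ( $fh, $len ) = @_;
    my $buf = '';
    while ( length($buf) < $len ) {
        # Ask only for what is still missing and append at the end of $buf.
        my $got = sysread( $fh, $buf, $len - length($buf), length($buf) );
        if ( !defined $got ) {
            next if $!{EINTR};    # interrupted by a signal, just retry
            die "sysread failed: $!";
        }
        die "peer closed after " . length($buf) . " of $len bytes"
            if $got == 0;
    }
    return $buf;
}
```

With a loop like this, a short first read (say 2892 of 6959 bytes) is followed by further reads instead of being treated as an error.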

Suggested patch:

-    confess $! unless $check_val == $len;
+    confess "packet length mismatch: $!" unless $check_val == $len;

Those confess() calls are placeholders, to be swapped out at some future time with a less aggressive error reporting mechanism that does not take down the server process. The idea was to use confess() during early rapid prototyping to flag each place a system call return value needs to be checked.
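As a rough illustration of what a less aggressive mechanism could look like, the sketch below keeps confess() at the failure site but traps it with eval one level up, so a bad packet costs only that client. The function names check_length() and handle_client() are hypothetical, not taken from SearchServer.pm. Note also that $! is only meaningful after a system call has actually failed; a short but successful read leaves it unset, which is probably why the original message came out empty.

```perl
use strict;
use warnings;
use Carp qw(confess);

# Hypothetical check: still uses confess() so the failure site is flagged
# with a stack trace, but reports the lengths rather than relying on $!.
sub check_length {
    my ( $check_val, $len ) = @_;
    my $shown = defined $check_val ? $check_val : 'undef';
    confess "packet length mismatch: expected $len, got $shown"
        unless defined $check_val && $check_val == $len;
    return 1;
}

# Hypothetical wrapper: trap the exception so one bad packet drops one
# client instead of taking down the whole server process.
sub handle_client {
    my ( $check_val, $len ) = @_;
    my $ok = eval { check_length( $check_val, $len ); 1 };
    warn "dropping client: $@" unless $ok;
    return $ok ? 1 : 0;
}
```

Calling handle_client( 2892, 6959 ) logs a warning and returns 0, while the caller carries on serving other sockets.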

In some cases, including here, the code also needs to be refactored around non-blocking i/o. What we ultimately need to do is accept a partial read, store the incomplete buffer, and return to waiting for the next ready socket. The code will become more complicated because we'll have to keep multiple buffers alive, but that's concurrency for ya.
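The "accept a partial read, store the incomplete buffer, return to waiting" idea might be sketched roughly as follows. Everything here is an assumption for illustration: the %pending hash, the pump() function, and the framing (a 4-byte big-endian length followed by the payload) are hypothetical and not taken from the actual SearchServer code. The caller is expected to have put the socket in non-blocking mode and to call pump() whenever select() reports it readable.

```perl
use strict;
use warnings;
use IO::Handle;    # lets the caller use $sock->blocking(0)

my %pending;       # fileno => bytes received so far (hypothetical state)

# One non-blocking step for one socket. Returns a complete payload once
# it has fully arrived; returns nothing if more data is still needed.
# Assumed framing: 4-byte big-endian length, then the payload.
sub pump {
    my ($sock) = @_;
    my $key = fileno($sock);
    $pending{$key} = '' unless defined $pending{$key};

    my $got = sysread( $sock, my $chunk, 65536 );
    if ( !defined $got ) {
        # Nothing available right now: go back to waiting.
        return if $!{EAGAIN} || $!{EWOULDBLOCK} || $!{EINTR};
        die "sysread failed: $!";
    }
    if ( $got == 0 ) {
        delete $pending{$key};
        die "peer closed mid-message";
    }
    $pending{$key} .= $chunk;

    return if length( $pending{$key} ) < 4;           # header incomplete
    my $len = unpack( 'N', $pending{$key} );
    return if length( $pending{$key} ) < 4 + $len;    # body incomplete

    my $payload = substr( $pending{$key}, 4, $len );
    substr( $pending{$key}, 0, 4 + $len, '' );        # keep any leftover bytes
    return $payload;
}
```

The per-socket hash is the extra complexity mentioned above: each connection's half-received packet has to stay alive between passes through the select() loop.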

For now though, try this:

* Change every sysread() to read(), and every syswrite() to write().
* Set $socket->autoflush(1);
* Make sure 'Blocking => 0' is commented out.
* Replace the select() loop with a "for" loop, because select() and blocking i/o don't mix.

What I'm hoping to do with those changes is return to forcing every socket communication to block, restoring predictable program execution order.
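For illustration only, a fully blocking round trip along those lines could look like the sketch below. The query_all() function and the 4-byte length-prefix framing are assumptions, not the real ClusterSearcher code, and print stands in here for the buffered write. Unlike sysread(), a buffered read() on a blocking handle keeps going until it has the requested length or hits EOF, which is what restores the predictable ordering.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical blocking client loop: talk to each shard in turn and wait
# for its complete reply before moving on to the next one.
# Assumed framing: 4-byte big-endian length, then the payload.
sub query_all {
    my ( $request, @socks ) = @_;
    my @responses;
    for my $sock (@socks) {
        $sock->autoflush(1);    # push the request out immediately
        print {$sock} pack( 'N', length $request ), $request
            or die "write failed: $!";

        # Buffered read() blocks until all 4 header bytes are in.
        my $got = read( $sock, my $hdr, 4 );
        die "short header" unless defined $got && $got == 4;

        my $len = unpack( 'N', $hdr );
        $got = read( $sock, my $body, $len );
        die "short body" unless defined $got && $got == $len;

        push @responses, $body;
    }
    return @responses;
}
```

The cost is that the shards are queried one after another rather than concurrently, which is acceptable as a debugging step but not as the final design.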