Subject: Re: [lucy-user] Concurrent searching
From: Marvin Humphrey (mar...@rectangular.com)
Date: Nov 23, 2011 4:40:48 am
On Wed, Nov 23, 2011 at 01:25:09PM +0200, goran kent wrote:
> Something is weird with the length for the top_docs packet.
> In SearchServer::serve, ~line 106, the confess is chucking a null error because $check_val != $len, hence the meaningless error:
> /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/LucyX/Remote/SearchServer.pm line 106"
> In ClusterSearcher::_serialize_request for top_docs length($serialized)==6959, but SearchServer::serve is receiving length==2892.
> So, that's why SearchServer is failing. What's causing the short send (or receive, or pack/unpack not co-operating across machines) will hopefully soon be revealed.
As we move away from blocking i/o, we need to manage buffers manually and be prepared for partial success. (Eventually we need to deal with timeouts and failovers, because otherwise the system remains vulnerable to its weakest link and hangs when a single node goes down -- but that's for later.)
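To make "partial success" concrete, here is a minimal sketch of a loop that keeps calling sysread() until the expected byte count has arrived. The helper name read_exactly and its $max_chunk argument are hypothetical and are not part of LucyX::Remote; $max_chunk exists only to force the partial-read path so the loop can be seen working. A pipe stands in for the socket, and the sizes are the ones reported above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from LucyX::Remote): call sysread() repeatedly
# until $want bytes have arrived. $max_chunk caps each call so that the
# partial-read path is exercised even on a local pipe.
sub read_exactly {
    my ($fh, $want, $max_chunk) = @_;
    my $buf   = '';
    my $calls = 0;
    while (length($buf) < $want) {
        my $remaining = $want - length($buf);
        my $ask = $remaining < $max_chunk ? $remaining : $max_chunk;
        # Append to $buf at its current end rather than overwriting it.
        my $got = sysread($fh, $buf, $ask, length($buf));
        die "sysread failed: $!" unless defined $got;
        die "connection closed after " . length($buf) . " of $want bytes"
            if $got == 0;
        $calls++;
    }
    return ($buf, $calls);
}

# A pipe stands in for the socket; 6959 bytes fits in the pipe buffer.
pipe(my $reader, my $writer) or die "pipe failed: $!";
my $payload = 'x' x 6959;
defined syswrite($writer, $payload) or die "syswrite failed: $!";
close $writer;

my ($data, $calls) = read_exactly($reader, length($payload), 2892);
print length($data), " bytes in $calls reads\n";
```

A single sysread() is allowed to return fewer bytes than requested without that being an error, which is why the loop checks the running total and not the return value of one call. The sketch ignores signals (EINTR) and timeouts.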
> - confess $! unless $check_val == $len;
> + confess "packet length mismatch: $!" unless $check_val == $len;
Those confess() calls are placeholders, to be swapped out at some future time with a less aggressive error reporting mechanism that does not take down the server process. The idea was to use confess() during early rapid prototyping to flag each place a system call return value needs to be checked.
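One possible shape for a less aggressive mechanism, as a sketch only: wrap the handling of each client in eval so a failed check is logged and that one client is dropped while the process keeps serving. The names check_packet and handle_client are hypothetical stand-ins, not code from SearchServer. Note that a short read is a success as far as the system call is concerned, so $! has nothing useful in it; the message here reports the two lengths instead.

```perl
use strict;
use warnings;
use Carp qw(confess);

# Hypothetical stand-in for the check in the server loop: dies when the
# byte count actually read does not match the announced length.
sub check_packet {
    my ($check_val, $len) = @_;
    confess "packet length mismatch: expected $len, got $check_val"
        unless $check_val == $len;
    return 1;
}

# Run one client's request inside eval so that a failure is logged and
# reported to the caller instead of killing the whole process.
sub handle_client {
    my ($client_id, $check_val, $len) = @_;
    my $ok = eval { check_packet($check_val, $len); 1 };
    if (!$ok) {
        # confess() appends a stack trace; keep only the first line.
        my ($first_line) = split /\n/, ($@ || 'unknown error');
        warn "dropping client $client_id: $first_line\n";
        return 0;
    }
    return 1;
}

my @results = (
    handle_client('node1', 2892, 6959),    # short read, logged and skipped
    handle_client('node2', 6959, 6959),    # full read, accepted
);
print "@results\n";
```

The second client is still served after the first one fails, which is the property the placeholder confess() calls do not have.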
In some cases, including here, the code also needs to be refactored around non-blocking i/o. What we ultimately need to do is accept a partial read, store the incomplete buffer, and return to waiting for the next ready socket. The code will become more complicated because we'll have to keep multiple buffers alive, but that's concurrency for ya.
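A rough sketch of the buffer bookkeeping that implies, kept separate from any actual socket code: one pending buffer per socket, fed with whatever chunk the last read produced, handing back only the messages that are complete. The %buffer_for hash, the feed function, and the framing (a 4-byte big-endian length followed by the payload) are assumptions made for illustration, not the LucyX::Remote wire format.

```perl
use strict;
use warnings;

# Hypothetical per-socket state: bytes received so far but not yet consumed.
my %buffer_for;

# Append a chunk read from one socket and return every message that is now
# complete. Assumed framing: 4-byte big-endian length, then the payload.
sub feed {
    my ($sock_id, $chunk) = @_;
    $buffer_for{$sock_id} = '' unless defined $buffer_for{$sock_id};
    my $buf = \$buffer_for{$sock_id};
    $$buf .= $chunk;

    my @complete;
    while (length($$buf) >= 4) {
        my $len = unpack('N', $$buf);
        last if length($$buf) < 4 + $len;    # incomplete: wait for more data
        push @complete, substr($$buf, 4, $len);
        substr($$buf, 0, 4 + $len, '');      # drop the consumed message
    }
    return @complete;
}

# The same packet arriving in two pieces, as in the reported failure.
my $packet = pack('N', 6959) . ('x' x 6959);
my @first  = feed('node1', substr($packet, 0, 2892));
my @second = feed('node1', substr($packet, 2892));
print scalar(@first), " complete after first chunk, ",
      scalar(@second), " after second\n";
```

In a real server the chunk would come from a non-blocking sysread() on whichever socket select() reported as ready, and the key would be the socket itself or its fileno.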
For now though, try this:
* Change every sysread() to read(), and every syswrite() to write().
* Set $socket->autoflush(1);
* Make sure 'Blocking => 0' is commented out.
* Replace the select() loop with a "for" loop, because select() and blocking i/o don't mix.
What I'm hoping to do with those changes is return to forcing every socket communication to block, restoring predictable program execution order.
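For illustration, a small self-contained sketch of a fully blocking exchange along those lines. It assumes write() means the IO::Handle method (the buffered counterpart of read()) and that each message is a 4-byte length followed by the payload; send_message, recv_message, and the socketpair standing in for the client and server connection are all hypothetical.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use IO::Handle;

# Hypothetical helper: send one length-prefixed message with buffered i/o.
sub send_message {
    my ($sock, $payload) = @_;
    my $packet = pack('N', length $payload) . $payload;
    $sock->write($packet, length($packet)) or die "write failed: $!";
}

# Hypothetical helper: blocking read of one length-prefixed message.
# On a blocking handle, read() keeps going until it has the requested
# number of bytes or hits EOF or an error.
sub recv_message {
    my ($sock) = @_;
    my $got = read($sock, my $header, 4);
    die "read failed: $!" unless defined $got;
    die "short header: got $got of 4 bytes" unless $got == 4;
    my $len = unpack('N', $header);
    $got = read($sock, my $payload, $len);
    die "read failed: $!" unless defined $got;
    die "short payload: got $got of $len bytes" unless $got == $len;
    return $payload;
}

# A socketpair stands in for the client/server connection.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair failed: $!";
binmode($_) for $client, $server;
$client->autoflush(1);
$server->autoflush(1);

my $request = 'x' x 6959;
send_message($client, $request);
my $received = recv_message($server);
print length($received), "\n";
```

With autoflush on, each write() is pushed out immediately, and the receiving side does not return from read() until the whole message is in hand, so the two sides proceed in lockstep.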