Subject: Re: [GE users] commlib error during install_execd
From: John Coldrick
Date: Apr 30, 2007 11:08:50 am

*sigh* Apologies. It seems there are now standard ports for SGE in /etc/services, and I've been using custom ones on the other machines.

Switching the new machines over to the older, custom ports works.
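
For anyone who lands here with the same symptom, a rough Python sketch of the lookup involved. It assumes SGE takes the qmaster port from the SGE_QMASTER_PORT environment variable when that is set, and otherwise from the sge_qmaster entry in /etc/services. The function names are made up for illustration and are not part of SGE.

```python
import os


def parse_services(text):
    """Return {(name, proto): port} from /etc/services-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        port, proto = fields[1].split("/", 1)
        if not port.isdigit():
            continue
        # The first field is the service name; any further fields are aliases.
        for name in [fields[0]] + fields[2:]:
            table.setdefault((name, proto), int(port))
    return table


def qmaster_port(services_text, env=None):
    """Return (port, source) for the qmaster, or (None, None) if unknown.

    Assumption: the environment variable wins over /etc/services.
    """
    env = os.environ if env is None else env
    value = env.get("SGE_QMASTER_PORT")
    if value:
        return int(value), "SGE_QMASTER_PORT"
    port = parse_services(services_text).get(("sge_qmaster", "tcp"))
    if port is None:
        return None, None
    return port, "/etc/services"
```

Calling `qmaster_port(open("/etc/services").read())` on a new host and on one of the working hosts should show whether the two agree. If the new host reports a port that came from /etc/services while the others use a custom one, that is the mismatch described above.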

Of course, I had to post the question to have the answer occur to me. :)



We recently got a couple of new systems that we're trying to get running on the grid. They are dual quad-core Intels with SUSE 10.2 installed (everything else here is on 10.0), firewalls are disabled, and networking, including NFS, is running fine. The only problem is that SGE seems to have trouble talking over port 6444. When I try to run install_execd, or even just utilities like "qhost", I get:

error: commlib error: can't connect to service (Connection refused) unable to contact qmaster using port 6444 on host ""

where ricki is our qmaster and is up and running fine.
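
A "Connection refused" like the one above usually means nothing is listening on that port on the target host, which is different from a timeout caused by a firewall. A minimal Python sketch of a check that separates the two cases; the helper name `can_connect` is hypothetical, and the host and port in the usage note are taken from the message above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Try a plain TCP connect; return (ok, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as err:
        # ConnectionRefusedError: host reachable, but no listener on the port.
        # Timeouts (socket.timeout is an OSError subclass): packets are likely
        # being dropped somewhere, e.g. by a firewall.
        return False, "%s: %s" % (type(err).__name__, err)
```

For example, `can_connect("ricki", 6444)` returning a ConnectionRefusedError would suggest the qmaster is listening on some other port rather than being unreachable.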

I can ssh between the two systems and I've set up the sgeadmin account. In short, all the stuff I usually go through has been done, and this still won't work. The only odd one out is SUSE 10.2, since I've never installed it before.

It's worth mentioning that I got this same behaviour with 6.0u6, so I did a completely clean install of 6.0u10. I'm getting the same problem there, so I doubt it's a corrupt install. Note that previous and current SGE installs work fine with all our other systems. I get no error messages in the logs that I can see.

This one's stumping me. Anyone have any idea where I should look first?



-----------------------------------------------------------------------
"He flung himself on his horse and rode madly off in all directions"