

5 messages in net.sunsource.gridengine.users: [GE users] JSV scripts running unreli...

| From | Sent On | Attachments |
|---|---|---|
| ah_sunsource | Jun 10, 2009 3:15 am | |
| ernst | Jun 10, 2009 4:32 am | |
| ah_sunsource | Jun 10, 2009 6:10 am | .Other |
| ernst | Jun 10, 2009 10:00 am | |
| dougalb | Aug 25, 2009 1:26 am | |

| Subject: | [GE users] JSV scripts running unreliably | |
|---|---|---|
| From: | ah_sunsource (aha...@ifh.de) | |
| Date: | Jun 10, 2009 3:15:10 am | |
| List: | net.sunsource.gridengine.users | |
Hi,
I'm experimenting a bit with the new JSV feature in SGE 6.2u2. I've written a server-side JSV that checks whether the user requests at least 256M for h_vmem (below that, the prolog script might die for lack of memory and leave the queue in an error state).
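For readers who want to see what such a check involves, here is a minimal bash sketch of the comparison itself. It is not the job_verifier script from this post, which is not shown in the thread. The function names `mem_to_bytes` and `h_vmem_ok` are hypothetical, and the sketch only handles plain decimal integers with an optional k/K/m/M/g/G suffix (lowercase as multiples of 1000, uppercase as multiples of 1024, following the memory specifier convention in sge_types(1)).

```bash
#!/bin/bash
# Hypothetical helper: convert a memory value such as 128M, 1G or 262144K
# to bytes. Only decimal integers with an optional one-letter suffix are
# accepted in this sketch; anything else makes the function return 1.
mem_to_bytes() {
    local value="$1" number suffix
    number="${value%[kKmMgG]}"      # strip one trailing multiplier letter
    suffix="${value#"$number"}"     # whatever was stripped (may be empty)
    case "$number" in
        ''|*[!0-9]*) return 1 ;;    # empty or non-numeric: give up
    esac
    # 10# forces base 10 so a leading zero is not read as octal.
    case "$suffix" in
        k)  echo $(( 10#$number * 1000 )) ;;
        K)  echo $(( 10#$number * 1024 )) ;;
        m)  echo $(( 10#$number * 1000 * 1000 )) ;;
        M)  echo $(( 10#$number * 1024 * 1024 )) ;;
        g)  echo $(( 10#$number * 1000 * 1000 * 1000 )) ;;
        G)  echo $(( 10#$number * 1024 * 1024 * 1024 )) ;;
        '') echo $(( 10#$number )) ;;
    esac
}

# Hypothetical check: succeeds (exit status 0) only if the requested
# h_vmem value is at least 256M, i.e. 256 * 1024 * 1024 bytes.
h_vmem_ok() {
    local bytes
    bytes=$(mem_to_bytes "$1") || return 1
    [ "$bytes" -ge $(( 256 * 1024 * 1024 )) ]
}
```

With these definitions, `h_vmem_ok 128M` fails and `h_vmem_ok 256M` succeeds. In a shell JSV built on the jsv_include.sh helpers that ship with Grid Engine, a check like this would presumably be called from the `jsv_on_verify` handler with the value of the hard h_vmem request, followed by `jsv_reject` or `jsv_accept`; that wiring is an assumption about how the script in this thread is organised.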
Unfortunately the jsv feature is not reliable:
[oreade38] ~ % for i in {1..5}; do
echo hostname | qsub -l h_vmem=128M
done
Unable to run job: Do not require less than 256M for h_vmem. Exiting.
Unable to run job: Do not require less than 256M for h_vmem. Exiting.
Unable to run job: master got unknown command from JSV: "ERROR". Exiting.
Unable to run job: master got unknown command from JSV: "ERROR". Exiting.
Unable to run job: Do not require less than 256M for h_vmem. Exiting.
In the server logs I see messages like this:
06/10/2009 11:30:35|worker|lolek-vm1|I|JSV modification time in "worker001" has changed
06/10/2009 11:30:36|worker|lolek-vm1|I|JSV "/usr/gridengine/util/job_verifier" has been stopped
06/10/2009 11:30:36|worker|lolek-vm1|I|JSV modification time in "worker001" has changed
06/10/2009 11:30:36|worker|lolek-vm1|I|JSV "/usr/gridengine/util/job_verifier" has been started
06/10/2009 11:30:37|worker|lolek-vm1|I|JSV "worker001" rejected job 921
06/10/2009 11:30:37|worker|lolek-vm1|I|JSV modification time in "worker000" has changed
06/10/2009 11:30:37|worker|lolek-vm1|I|JSV modification time in "worker000" has changed
06/10/2009 11:30:37|worker|lolek-vm1|I|JSV "/usr/gridengine/util/job_verifier" has been started
06/10/2009 11:30:37|worker|lolek-vm1|I|JSV "worker000" rejected job 922
06/10/2009 11:30:37|worker|lolek-vm1|I|JSV "worker001" rejected job 923
06/10/2009 11:30:37|worker|lolek-vm1|I|JSV "worker001" will be restarted.
06/10/2009 11:30:38|worker|lolek-vm1|I|JSV "/usr/gridengine/util/job_verifier" has been stopped
06/10/2009 11:30:38|worker|lolek-vm1|I|JSV "worker000" rejected job 924
06/10/2009 11:30:38|worker|lolek-vm1|I|JSV "worker000" will be restarted.
06/10/2009 11:30:39|worker|lolek-vm1|I|JSV "/usr/gridengine/util/job_verifier" has been stopped
06/10/2009 11:30:39|worker|lolek-vm1|I|JSV "/usr/gridengine/util/job_verifier" has been started
06/10/2009 11:30:40|worker|lolek-vm1|I|JSV "worker001" rejected job 925
Looks like the success of the script is oscillating. Could this be a bug?
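To make the start/stop/restart pattern easier to spot over a longer period, the log could be summarised with something like the sketch below. The function name `jsv_event_summary` is hypothetical, and the sketch assumes one log entry per line with `|` as the field separator and the message text in the fifth field, as in the entries quoted in this post.

```bash
#!/bin/bash
# Hypothetical helper: tally JSV-related events from a qmaster messages
# log read on stdin. Assumes one entry per line, fields separated by '|',
# with the message text in field 5.
jsv_event_summary() {
    awk -F'|' '
        $5 ~ /^JSV .* has been started$/    { started++ }
        $5 ~ /^JSV .* has been stopped$/    { stopped++ }
        $5 ~ /^JSV .* will be restarted\.$/ { restarts++ }
        $5 ~ /^JSV .* rejected job [0-9]+$/ { rejected++ }
        END {
            # Counters that never matched print as 0 with %d.
            printf "started=%d stopped=%d restarts=%d rejected=%d\n",
                   started, stopped, restarts, rejected
        }'
}
```

It would be used as `jsv_event_summary < messages`, where `messages` stands for the qmaster messages file on the server; the actual location depends on the installation. Many restarts relative to the number of rejected jobs would suggest that the JSV process is being cycled between submissions.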
Cheers, Andreas
-- | Andreas Haupt | E-Mail: andr...@desy.de | DESY Zeuthen | WWW: http://www-zeuthen.desy.de/~ahaupt | Platanenallee 6 | Phone: +49/33762/7-7359 | D-15738 Zeuthen | Fax: +49/33762/7-7216
------------------------------------------------------ http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=201408
To unsubscribe from this discussion, e-mail:
[user...@gridengine.sunsource.net].







