Below is the output from installing SGE via apt-get:
Creating config file /etc/default/gridengine with new version
Setting up gridengine-master (6.2u5-1ubuntu1) ...
Initializing cluster with the following parameters:
 => SGE_ROOT: /var/lib/gridengine
 => SGE_CELL: default
 => Spool directory: /var/spool/gridengine/spooldb
 => Initial manager user: sgeadmin
Initializing spool (/var/spool/gridengine/spooldb)
Initializing global configuration based on /usr/share/gridengine/default-configuration
Initializing complexes based on /usr/share/gridengine/centry
Initializing usersets based on /usr/share/gridengine/usersets
Adding user sgeadmin as a manager
Cluster creation complete
Setting up libxp6 (1:1.0.0.xsf1-2build1) ...
Setting up lesstif2 (1:0.95.2-1) ...
Setting up gridengine-qmon (6.2u5-1ubuntu1) ...
Processing triggers for libc-bin ...
ldconfig deferred processing now taking place
When running "qstat -f" on the exec host, it complains:
error: commlib error: got select error (No route to host)
error: unable to contact qmaster using port 6444 on host "pwbclinuxlab.garvan.unsw.edu.au"
pwbclinuxlab.garvan.unsw.edu.au is the former qmaster, which has been removed. Even after the exec host is added again on the new qmaster, the exec host still remembers the old one, because the qmaster's hostname is stored as a plain string in:
/var/lib/gridengine/default/common/act_qmaster
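One way to inspect and correct that file is sketched below in Python. This is only a sketch: the helper names read_act_qmaster and set_act_qmaster are made up for illustration, and it assumes that the file holds nothing but the qmaster's hostname on a single line and that the exec daemon needs a restart afterwards to pick up the change.

```python
from pathlib import Path

# Path taken from the post; the helper names below are hypothetical.
ACT_QMASTER = "/var/lib/gridengine/default/common/act_qmaster"


def read_act_qmaster(path=ACT_QMASTER):
    """Return the qmaster hostname currently recorded on this host."""
    return Path(path).read_text().strip()


def set_act_qmaster(path, new_host):
    """Point this host at new_host and return the previously recorded name.

    Assumption: the file contains only the hostname followed by a newline.
    """
    old_host = read_act_qmaster(path)
    if old_host != new_host:
        Path(path).write_text(new_host + "\n")
    return old_host


# Example (run as root on the exec host, then restart the exec daemon):
#   set_act_qmaster(ACT_QMASTER, "sgeqmast01.garvan.unsw.edu.au")
```

Keeping the path as a parameter makes it easy to try the helpers on a copy of the file before touching the real one.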
Running "qstat -f" on the exec host then produces another complaint:
error: commlib error: access denied (client IP resolved to host name "". This is not identical to clients host name "")
error: unable to contact qmaster using port 6444 on host "sgeqmast01.garvan.unsw.edu.au"
Read "Things to think about before installing Grid Engine".
This is very likely related to DNS. SGE requires both forward and reverse DNS lookups to work, so make sure the DNS server has been set up properly. If setting up a DNS server is too difficult, adding the proper entries to /etc/hosts will fix the issue.
Remember: the qmaster's hosts file must contain a record for every SGE host (submit hosts, exec hosts, etc.). Every other SGE host must contain the qmaster's record.
127.0.0.1 localhost
129.94.136.232 sgeqexec01.garvan.unsw.edu.au sgeqexec01
129.94.136.230 sgeqmast01.garvan.unsw.edu.au sgeqmast01
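To confirm that a host resolves consistently in both directions, a quick check along these lines may help. It is a rough sketch using Python's standard socket module, not the exact comparison SGE performs internally; the function name check_forward_reverse is made up for illustration, and the resolver arguments exist only so the check can be tried without real DNS.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve that IP back, and compare.

    Returns (ip, primary_name, consistent), where consistent is True when
    the reverse lookup yields the original name as primary name or alias.
    """
    ip = forward(hostname)
    # gethostbyaddr returns (primary_name, alias_list, address_list)
    primary, aliases, _addresses = reverse(ip)
    known = {primary.lower()} | {alias.lower() for alias in aliases}
    return ip, primary, hostname.lower() in known


# Example, to be run on the qmaster and on each exec host:
#   for name in ("sgeqmast01.garvan.unsw.edu.au",
#                "sgeqexec01.garvan.unsw.edu.au"):
#       print(name, check_forward_reverse(name))
```

If the last element of the result is False on either machine, the hosts file or DNS records on that machine are a likely place to look.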