Friday, June 3, 2016

Install SGE (Sun Grid Engine) on Linux

Install SGE on Ubuntu 16.04

0. Install some prerequisite packages:

deimos@deimos:~$ sudo apt-get install xfs*
deimos@deimos:~$ sudo apt-get install libguestfs-xfs
deimos@deimos:~$ sudo apt-get install xfonts-75dpi xfonts-100dpi
deimos@deimos:~$ sudo apt-get install t1-xfree86-nonfree ttf-xfree86-nonfree ttf-xfree86-non
deimos@deimos:~$ xset +fp /usr/share/fonts/X11/75dpi
deimos@deimos:~$ xset fp rehash


1. Install SGE on master node:

deimos@deimos:~$ sudo apt-get install gridengine-client gridengine-common gridengine-master gridengine-qmon gridengine-exec

  • Remove gridengine-exec from the list if the master node is not supposed to run jobs
  • During the installation, you will be asked to set the cluster CELL name (such as 'default')

2. Install SGE on other nodes:

deimos@deimos:~$ sudo apt-get install gridengine-client gridengine-exec


  • Set the CELL name to the same value as on the master node

Alternatively, you can install everything on the same machine (master and client) with:
deimos@deimos:~$ sudo apt-get install gridengine*

3. Set SGE_ROOT and SGE_CELL environment variables:

We need to edit three files, /etc/profile, /etc/bash.bashrc and ~/.bashrc, and add the following line to each:
  • export SGE_CELL=default
Also add this line if necessary:
  • export SGE_ROOT='path of SGE'
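
Editing three files by hand makes it easy to add the same line twice. The following is a minimal sketch, not part of the original procedure: append_once is a hypothetical helper that appends a line only if the file does not already contain it, and the files to update are passed as arguments to the script.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if it is not already there.
# $1 = file to update, $2 = exact line to add.
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -x matches whole lines, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Add the SGE_CELL line to every file named on the command line.
for target in "$@"; do
    append_once "$target" 'export SGE_CELL=default'
done
```

Saved as, say, add_sge_env.sh (an illustrative name), it could be run as "sudo sh add_sge_env.sh /etc/profile /etc/bash.bashrc" and then "sh add_sge_env.sh ~/.bashrc". The same helper can add the SGE_ROOT line once you know the correct path for your installation.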

4. Set domain:

Edit /etc/hosts and add your domain, so that the host name of every node in the cluster resolves to its IP address.
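
To confirm that a node really has an entry, a small check like the one below may help. It is only a sketch: lookup_host is a hypothetical helper, and the addresses and names in the comment are invented examples, not values from this setup.

```shell
#!/bin/sh
# Example /etc/hosts entries (hypothetical addresses and names):
#   192.168.1.10  master.example.com  master
#   192.168.1.11  node1.example.com   node1

# Print the address that a hosts-format file maps a name to.
# Returns non-zero if the name has no entry.
# $1 = host name, $2 = hosts file (defaults to /etc/hosts).
lookup_host() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # ignore comments
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; found = 1; exit } }
        END { exit !found }
    ' "${2:-/etc/hosts}"
}

lookup_host localhost || echo "localhost has no entry in /etc/hosts"
```

Running "lookup_host master" on each node, with your own host names, shows quickly whether the names agree across the cluster.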


5. Configure SGE

We need to launch qmon as superuser:
deimos@deimos:~$ sudo qmon

5.1 Configure host
  • Host Configuration => Administration Host => add master node and other administrative nodes
  • Host Configuration => Submit Host => add master node and other submit nodes
  • Host Configuration => Execution Host => add slave nodes
  • Click Done
5.2 Configure user
In this configuration, you can add or delete users that are allowed to access SGE.
  • User Configuration => Userset => Highlight userset 'arusers' and click on 'Modify' => input user name in 'User/Group' field
  • Click Done
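
The host and user steps above can also be done from the command line with qconf, which is handy on machines without X. The sketch below only prints the commands so they can be reviewed before anything changes. The host and user names are hypothetical placeholders. Execution hosts are left out because "qconf -ae" opens an editor and does not fit a non-interactive script.

```shell
#!/bin/sh
# Hypothetical values: replace with your own host and user names.
MASTER="master.example.com"
OTHER_SUBMIT_HOSTS="node1.example.com node2.example.com"
USERS="alice bob"

# Print the qconf commands instead of running them, so they can be checked first.
print_qconf_commands() {
    echo "qconf -ah $MASTER"              # add an administration host
    echo "qconf -as $MASTER"              # add a submit host
    for host in $OTHER_SUBMIT_HOSTS; do
        echo "qconf -as $host"
    done
    for user in $USERS; do
        echo "qconf -au $user arusers"    # add a user to the userset 'arusers'
    done
    echo "qconf -sh"                      # show administration hosts
    echo "qconf -ss"                      # show submit hosts
    echo "qconf -su arusers"              # show the userset
}

print_qconf_commands
```

Once the output looks right, it can be piped to a shell running as an SGE manager, assuming SGE_ROOT and SGE_CELL are set in that shell.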
5.3 Configure queue
Queue Control defines how hosts and users are connected.
  • Queue Control => Hosts => Confirm the execution hosts show up there.
  • Queue Control => Cluster Queues => Click on 'Add' => Name the queue, add execution nodes to Hostlist
  • User Access => allow access to user group arusers
  • General Configuration => Field 'Slots' => Raise the number to the total number of CPU cores on the slave nodes (a number larger than the actual core count is OK).
  • Queue Control => Queue Instances => This is the place to manually assign hosts to queues and to control the state (active, suspended ...) of hosts.
5.4 Configure parallel environment
  • Queue Control => Cluster Queues => select the queue that will run parallel jobs => click on 'Modify' => 'Parallel Environment' => click on 'PE' below the right and left arrows => click on 'Add'
  • Name the PE and set: slots = 999, start_proc_args = $SGE_ROOT/mpi/startmpi.sh $pe_hostfile, stop_proc_args = $SGE_ROOT/mpi/stopmpi.sh, allocation_rule = $fill_up
  • Tick the 'Control slaves' checkbox
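
The same parallel environment can be described in a text file and loaded with "qconf -Ap". The following sketch prints such a definition using the values from the step above. The PE name 'mpi' is only an illustrative default. The fields not mentioned above (user_lists, xuser_lists, job_is_first_task, urgency_slots, accounting_summary) are filled with assumed values, so compare them with the output of "qconf -sp" for a PE created in qmon before relying on them.

```shell
#!/bin/sh
# Print a parallel environment definition for use with 'qconf -Ap'.
# $1 = PE name ('mpi' is only an illustrative default).
pe_definition() {
    pe_name=${1:-mpi}
    # The backslashes keep $SGE_ROOT, $pe_hostfile and $fill_up literal,
    # so that Grid Engine, not this shell, interprets them.
    cat <<EOF
pe_name            $pe_name
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    \$SGE_ROOT/mpi/startmpi.sh \$pe_hostfile
stop_proc_args     \$SGE_ROOT/mpi/stopmpi.sh
allocation_rule    \$fill_up
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
EOF
}

pe_definition mpi
```

A possible way to use it: redirect the output to a file such as mpi.pe, load it with "qconf -Ap mpi.pe", attach it to a queue with "qconf -aattr queue pe_list mpi <queue name>", and verify with "qconf -spl".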

6. Check SGE hosts

  • The system info from all nodes
deimos@deimos:~$ qhost
  • The hostnames of nodes
deimos@deimos:~$ qconf -sel
  • List the queues
deimos@deimos:~$ qconf -sql
  • Check the master daemon (its process name is sge_qmaster)
deimos@deimos:~$ ps aux | grep sge_qmaster
  • Check the execution daemon
deimos@deimos:~$ ps aux | grep sge_execd
  • If sge_qmaster or sge_execd is not running, you can start the service
deimos@deimos:~$ sudo service gridengine-master start
deimos@deimos:~$ sudo service gridengine-exec start
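
The daemon checks can be wrapped in a small script. This is a sketch with a hypothetical helper, daemon_status, which uses pgrep to match the exact process name. Unlike "ps aux | grep", it never matches the grep command itself. The master daemon's process name is sge_qmaster.

```shell
#!/bin/sh
# Report whether a process with exactly this name is running.
# $1 = process name, for example sge_qmaster or sge_execd.
daemon_status() {
    if pgrep -x "$1" >/dev/null 2>&1; then
        echo "$1: running"
    else
        echo "$1: not running"
    fi
}

daemon_status sge_qmaster
daemon_status sge_execd
```

If either line reports "not running", start the corresponding service with the commands above and run the check again.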


Reference
N1 Grid Engine 6 Administration Guide
https://web.njit.edu/topics/HPC/basement/sge/817-5677.pdf