Install SGE on Ubuntu 16.04
0. Install some packages:
deimos@deimos:~$ sudo apt-get install xfs*
deimos@deimos:~$ sudo apt-get install libguestfs-xfs
deimos@deimos:~$ sudo apt-get install xfonts-75dpi xfonts-100dpi
deimos@deimos:~$ sudo apt-get install t1-xfree86-nonfree ttf-xfree86-nonfree ttf-xfree86-non
deimos@deimos:~$ xset +fp /usr/share/fonts/X11/75dpi
deimos@deimos:~$ xset fp rehash
1. Install SGE on master node:
deimos@deimos:~$ sudo apt-get install gridengine-client gridengine-common gridengine-master gridengine-qmon gridengine-exec
- Remove gridengine-exec from the list if the master node is not supposed to run jobs
- During the installation, we need to set the cluster CELL name (such as 'default')
2. Install SGE on other nodes:
deimos@deimos:~$ sudo apt-get install gridengine-client gridengine-exec
- The CELL name must be the same as that of the master node
Alternatively, you can install everything on the same machine (master and client) with:
deimos@deimos:~$ sudo apt-get install gridengine*
3. Set SGE_ROOT and SGE_CELL environment variables:
We need to edit three files, /etc/profile, /etc/bash.bashrc and ~/.bashrc, and add the following line to each:
- export SGE_CELL=default
If necessary, also add this line:
- export SGE_ROOT='path of SGE'
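As a minimal sketch of this step, the helper below appends a line to a file only when that exact line is missing, so running it again does not duplicate entries. The function name add_line_once is hypothetical and not part of SGE.

```shell
# add_line_once FILE LINE
# Append LINE to FILE unless an identical line is already present.
# (Hypothetical helper for illustration; not part of SGE.)
add_line_once() {
    grep -qxF -e "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}
```

For example, for the per-user file (the two system-wide files need root privileges):
deimos@deimos:~$ add_line_once ~/.bashrc 'export SGE_CELL=default'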
4. Set domain:
Edit /etc/hosts so that it contains the hostname and domain of your nodes.
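SGE identifies nodes by hostname, so each name should resolve the same way on every machine. One assumption that is not stated in these notes: a node's own name should probably map to its network address rather than to a loopback address such as 127.0.1.1, which Ubuntu sometimes writes into /etc/hosts. The sketch below looks up a name in a hosts file and flags loopback addresses; hosts_lookup and is_loopback are hypothetical helper names.

```shell
# hosts_lookup FILE NAME
# Print the first address that FILE (in /etc/hosts format) maps NAME to.
hosts_lookup() {
    awk -v name="$2" '
        { sub(/#.*/, "") }                      # drop comments
        { for (i = 2; i <= NF; i++)
              if ($i == name) { print $1; exit } }
    ' "$1"
}

# is_loopback ADDRESS
# Succeed if ADDRESS is an IPv4 or IPv6 loopback address.
is_loopback() {
    case $1 in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}
```

A possible check on each node:
deimos@deimos:~$ addr=$(hosts_lookup /etc/hosts "$(hostname)")
deimos@deimos:~$ is_loopback "$addr" && echo "warning: $(hostname) maps to $addr"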
5. Configure SGE
We need to launch qmon as superuser:
deimos@deimos:~$ sudo qmon
5.1 Configure hosts
- Host Configuration => Administration Host => add master node and other administrative nodes
- Host Configuration => Submit Host => add master node and other submit nodes
- Host Configuration => Execution Host => add slave nodes
- Click Done
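The administrative and submit host lists can also be edited from the command line with qconf (-ah adds an administrative host, -as adds a submit host). The sketch below only prints the commands unless DRY_RUN=0 is set. The names run, register_hosts and DRY_RUN are hypothetical, and the use of sudo assumes that root is an SGE manager on the master node.

```shell
# run CMD...
# Print the command when DRY_RUN is 1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# register_hosts HOST...
# Add each HOST as an administrative host and as a submit host.
register_hosts() {
    for host in "$@"; do
        run sudo qconf -ah "$host"
        run sudo qconf -as "$host"
    done
}
```

Execution hosts are added with qconf -ae, which opens an editor, so they are left out of this sketch.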
5.2 Configure user
In this configuration, you can add or delete users that are allowed to access SGE.
- User Configuration => Userset => Highlight userset 'arusers' and click on 'Modify' => input user name in 'User/Group' field
- Click Done
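A user can also be added to a userset from the command line with qconf -au. The following is a dry-run sketch only: allow_users is a hypothetical helper that prints the commands instead of running them, and the userset name arusers is the one used in the step above.

```shell
# allow_users USERSET USER...
# Print the qconf command that would add each USER to USERSET.
# Remove the leading 'echo' to actually run the commands.
allow_users() {
    userset=$1
    shift
    for user in "$@"; do
        echo sudo qconf -au "$user" "$userset"
    done
}
```

For example:
deimos@deimos:~$ allow_users arusers deimos
The resulting userset can then be inspected with qconf -su arusers.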
5.3 Configure queue
Queue Control defines how hosts and users are connected.
- Queue Control => Hosts => Confirm the execution hosts show up there.
- Queue Control => Cluster Queues => Click on 'Add' => Name the queue, add execution nodes to Hostlist
- User Access => allow access to user group arusers
- General Configuration => Field 'Slots' => Raise the number to the total number of CPU cores on the slave nodes (it is OK to use a number bigger than the actual number of CPU cores).
- Queue Control => Queue Instances => This is the place to manually assign hosts to queues, and to control the state (active, suspended ...) of hosts.
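To pick a value for the 'Slots' field, the core counts reported by qhost can be added up. This is a rough sketch that assumes NCPU is the third column of the qhost output, as it usually is; total_slots is a hypothetical helper name.

```shell
# total_slots
# Read qhost output on stdin and print the sum of the NCPU column.
# Assumes NCPU is the third field; non-numeric values (the header,
# the 'global' pseudo-host, hosts reported as '-') are skipped.
total_slots() {
    awk '$3 ~ /^[0-9]+$/ { sum += $3 } END { print sum + 0 }'
}
```

Usage:
deimos@deimos:~$ qhost | total_slots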
5.4 Configure parallel environment
- Queue Control => Cluster Queues => select the queue that will run parallel jobs => click on 'Modify' => 'Parallel Environment' => click on 'PE' below the right and left arrows => click on 'Add'
- Name the PE and set: slots = 999, start_proc_args = $SGE_ROOT/mpi/startmpi.sh $pe_hostfile, stop_proc_args = $SGE_ROOT/mpi/stopmpi.sh, allocation_rule = $fill_up, and check 'Control slaves'
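The same parallel environment can also be described in a text file and loaded with qconf. The sketch below writes such a file with the values from the step above. The PE name 'mpi' and the file location are placeholders, and the fields that the step does not mention (user_lists, xuser_lists, job_is_first_task, urgency_slots, accounting_summary) are filled with what we assume to be the usual defaults.

```shell
# Write a parallel environment definition that mirrors the qmon settings.
# The quoted 'EOF' keeps $SGE_ROOT, $pe_hostfile and $fill_up literal,
# because SGE itself expands them later.
pe_file=${TMPDIR:-/tmp}/mpi.pe
cat > "$pe_file" <<'EOF'
pe_name            mpi
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    $SGE_ROOT/mpi/startmpi.sh $pe_hostfile
stop_proc_args     $SGE_ROOT/mpi/stopmpi.sh
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
EOF

# Not run here: load the definition, then attach the PE to a queue.
#   sudo qconf -Ap "$pe_file"
#   sudo qconf -aattr queue pe_list mpi <queue name>
```

Afterwards, qconf -sp mpi should print the stored definition.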
6. Check SGE hosts
- The system info from all nodes
deimos@deimos:~$ qhost
- The hostnames of nodes
deimos@deimos:~$ qconf -sel
- List the queues
deimos@deimos:~$ qconf -sql
- Check the master daemon
deimos@deimos:~$ ps aux | grep sge_qmaster
- Check the execution daemon
deimos@deimos:~$ ps aux | grep sge_execd
- If sge_qmaster or sge_execd is not running, you can start the corresponding service
deimos@deimos:~$ sudo service gridengine-master start
deimos@deimos:~$ sudo service gridengine-exec start
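Note that ps aux | grep also lists the grep process itself, so a match does not always mean that the daemon is running. As a sketch of an alternative, pgrep -x matches process names exactly; daemon_status is a hypothetical helper name.

```shell
# daemon_status NAME
# Report whether a process with exactly this name is running.
# Returns 0 if it is running, 1 otherwise.
daemon_status() {
    if pgrep -x "$1" > /dev/null 2>&1; then
        echo "$1: running"
    else
        echo "$1: not running"
        return 1
    fi
}
```

Usage:
deimos@deimos:~$ daemon_status sge_qmaster
deimos@deimos:~$ daemon_status sge_execd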
Reference
N1 Grid Engine 6 Administration Guide