Friday, October 31, 2014

Platform LSF – Working with Hosts (bhost, lsload)

Taken from LSF Platform Administrative Guide. The Document on bhost and lsload and more information can be taken from Platform - Working with hosts. Although your version of LSF may be different, but the commands can be still use.

Here are some excerpts.....

Host status Host status describes the ability of a host to accept and run batch jobs in terms of daemon states, load levels, and administrative controls. The bhosts and lsload commands display host status.    

1. bhosts Displays the current status of the host
STATUS DESCRIPTION
ok Host is available to accept and run new batch jobs
unavail Host is down, or LIM and sbatchd are unreachable.
unreach LIM is running but sbatchd is unreachable.
closed Host will not accept new jobs. Use bhosts -l to display the reasons.
unlicensed Host does not have a valid license.


2. bhosts -l Displays the closed reasons. A closed host does not accept new batch jobs:
$ bhosts -l
HOST  node001
STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV DISPATCH_WINDOW
closed_Adm      60.00     -     16      0      0      0      0      0      -

CURRENT LOAD USED FOR SCHEDULING:
r15s   r1m  r15m    ut    pg    io   ls    it   tmp   swp   mem   root maxroot
Total           0.0   0.0   0.0    0%   0.0     0    0 28656  324G   16G   60G  3e+05   4e+05
Reserved        0.0   0.0   0.0    0%   0.0     0    0     0    0M    0M    0M    0.0     0.0

processes clockskew netcard iptotal  cpuhz cachesize diskvolume
Total             404.0       0.0     2.0     2.0 1200.0     2e+04      5e+05
Reserved            0.0       0.0     0.0     0.0    0.0       0.0        0.0

processesroot   ipmi powerconsumption ambienttemp cputemp
Total                 396.0   -1.0             -1.0        -1.0    -1.0
Reserved                0.0    0.0              0.0         0.0     0.0


aa_r aa_r_dy aa_dy_p aa_r_ad aa_r_hpc fluentall fluent fluent_nox
Total         17.0    25.0   128.0    10.0    272.0      48.0   48.0       50.0
Reserved       0.0     0.0     0.0     0.0      0.0       0.0    0.0        0.0

gambit geom_trans tgrid fluent_par
Total           50.0       50.0  50.0      193.0
Reserved         0.0        0.0   0.0        0.0


3. bhosts -X Condensed host groups in an condensed format
$ bhosts -X
HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
comp027            ok              -     16      0      0      0      0      0
comp028            ok              -     16      0      0      0      0      0
comp029            ok              -     16      0      0      0      0      0
comp030            ok              -     16      0      0      0      0      0
comp031            ok              -     16      0      0      0      0      0
comp032            ok              -     16      0      0      0      0      0
comp033            ok              -     16      0      0      0      0      0


4. bhosts -l hostID Display all information about specific server host such as the CPU factor and the load thresholds to start, suspend, and resume jobs
# bhosts -l comp067
HOST  comp067
STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV DISPATCH_WINDOW
ok              60.00     -     16      0      0      0      0      0      -

CURRENT LOAD USED FOR SCHEDULING:
r15s   r1m  r15m    ut    pg    io   ls    it   tmp   swp   mem   root maxroot
Total           0.0   0.0   0.0    0%   0.0     0    0 13032  324G   16G   60G  3e+05   4e+05
Reserved        0.0   0.0   0.0    0%   0.0     0    0     0    0M    0M    0M    0.0     0.0

processes clockskew netcard iptotal  cpuhz cachesize diskvolume
Total             406.0       0.0     2.0     2.0 1200.0     2e+04      5e+05
Reserved            0.0       0.0     0.0     0.0    0.0       0.0        0.0

processesroot   ipmi powerconsumption ambienttemp cputemp
Total                 399.0   -1.0             -1.0        -1.0    -1.0
Reserved                0.0    0.0              0.0         0.0     0.0

aa_r aa_r_dy aa_dy_p aa_r_ad aa_r_hpc fluentall fluent fluent_nox
Total         18.0    25.0   128.0    10.0    272.0      47.0   47.0       50.0
Reserved       0.0     0.0     0.0     0.0      0.0       0.0    0.0        0.0

gambit geom_trans tgrid fluent_par
Total           50.0       50.0  50.0      193.0
Reserved         0.0        0.0   0.0        0.0

LOAD THRESHOLD USED FOR SCHEDULING:
r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
loadSched   -     -     -     -       -     -    -     -     -      -      -
loadStop    -     -     -     -       -     -    -     -     -      -      -

root maxroot processes clockskew netcard iptotal   cpuhz cachesize
loadSched     -       -         -         -       -       -       -         -
loadStop      -       -         -         -       -       -       -         -

diskvolume processesroot    ipmi powerconsumption ambienttemp cputemp
loadSched        -             -       -                -           -       -
loadStop         -             -       -                -           -       -


5. lsload Displays the current state of the host:
STATUS DESCRIPTION
ok Host is available to accept and run batch jobs and remote tasks.
-ok LIM is running but RES is unreachable.
busy Does not affect batch jobs, only used for remote task placement (i.e., lsrun). The value of a load index exceeded a threshold (configured in lsf.cluster.cluster_name, displayed by lshosts -l). Indices that exceed thresholds are identified with an asterisk (*).
lockW Does not affect batch jobs, only used for remote task placement (i.e., lsrun). Host is locked by a run window (configured in lsf.cluster.cluster_name, displayed by lshosts -l).
lockU Will not accept new batch jobs or remote tasks. An LSF administrator or root explicitly locked the host using lsadmin limlock, or an exclusive batch job (bsub -x) is running on the host. Running jobs are not affected. Use lsadmin limunlock to unlock LIM on the local host.
unavail Host is down, or LIM is unavailable.
unlicensed The host does not have a valid license.


6. References:
  1. Platform - Working with hosts

No comments: