I suspect this is a node issue. I checked the LSF log /opt/lsf/log/sbatchd.log.comp001, and it is definitely an authentication issue with AD. I'm using Centrify.
acctMapTo: No valid user name found for job 149044, userName(mr_x) failed:Success runEexec: getOSUid_() failed. Bad user ID
I did a
$ badmin hclose comp001
and then restarted the Centrify services. Alternatively, you can reboot the node if you want a clean start.
After that, OpenMPI jobs could run again.
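For anyone who wants to script this, here is a rough sketch of the steps above, with a check that the user resolves before the host is reopened. The host comp001 and user mr_x are taken from the log. The service name centrifydc and the use of systemctl are assumptions about my setup and may differ on yours, and the function names are just made up for illustration. It is meant to be run on the affected node by someone with LSF admin and root rights.

```shell
#!/bin/sh
# Sketch only: comp001 and mr_x come from the log above; the centrifydc
# service name and the use of systemctl are assumptions about the node.

# Return 0 if the OS can map the user name to a UID, non-zero otherwise.
# This is the lookup that the "Bad user ID" error suggests was failing.
user_resolves() {
    id -u "$1" >/dev/null 2>&1
}

# Close the host, restart the AD agent, and reopen only once the user resolves.
recover_node() {
    host="$1"
    user="$2"
    badmin hclose "$host" || return 1
    systemctl restart centrifydc || return 1   # assumed service name
    if user_resolves "$user"; then
        badmin hopen "$host"
    else
        echo "user $user still does not resolve on $host; leaving it closed" >&2
        return 1
    fi
}

# Example: recover_node comp001 mr_x
```

If the user still does not resolve after the restart, the host stays closed so no new jobs get dispatched to it, and a reboot is probably the next thing to try.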