Last modified: 2013-10-18 13:18:22 UTC
When I start jobs with jsub, a few of them go into error state ("Eqw"). It seems the execution host can't see my home directory:

    1290600 0.25000 job magnus Eqw 10/18/2013 12:54:41 1

    magnus@tools-login:/data/project/wikidata-todo/stats/data$ qstat -j 1290600
    ==============================================================
    job_number:                 1290600
    exec_file:                  job_scripts/1290600
    submission_time:            Fri Oct 18 12:54:41 2013
    owner:                      magnus
    uid:                        3067
    group:                      wikidev
    gid:                        500
    sge_o_home:                 /home/magnus
    sge_o_log_name:             magnus
    sge_o_path:                 /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
    sge_o_shell:                /bin/bash
    sge_o_workdir:              /data/project/wikidata-todo/stats
    sge_o_host:                 tools-login
    account:                    sge
    stderr_path_list:           NONE:NONE:/home/magnus/job.err
    hard resource_list:         h_vmem=2097152k,jobs=1
    mail_list:                  magnus@tools.wmflabs.org
    notify:                     FALSE
    job_name:                   job
    stdout_path_list:           NONE:NONE:/home/magnus/job.out
    jobshare:                   0
    hard_queue_list:            task
    env_list:
    job_args:                   20130817,/public/datasets/public/wikidatawiki/20130817/wikidatawiki-20130817-pages-articles.xml.bz2
    script_file:                /data/project/wikidata-todo/stats/job.sh
    error reason 1:             10/18/2013 12:54:52 [3067:1335]: error: can't chdir to /home/magnus: No such file or directory
    scheduling info:            Job is in error state
tools-exec-01 had, apparently, rebooted just a bit too early during the NFS transition and tried to mount /home too soon, leaving it in a broken state that autofs was unable to recover from. Because /data/project was okay, this was generally not noticeable. The host has been rebooted and the situation is now okay.
(That said, why are you starting numerous jobs with your user account?)
Too lazy to "become" (plus, "ll" doesn't work as a tool, which annoys me ;-)
Well, don't do it. :-) Tools run from user accounts can't be salvaged, aren't properly manageable, and are subject to being randomly killed. And they're also against the rules. ("ll" is an alias in your .bashrc that you can copy to the tools if you miss it, by the way.)
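For what it's worth, copying the alias over could look roughly like the sketch below. The helper name copy_alias is made up for illustration, and the tool's .bashrc path in the example comment is an assumption (the thread only shows the stats subdirectory). It also assumes the alias sits on a single line starting with "alias ll=" in the source file.

```shell
# Hypothetical helper (not part of any Tool Labs tooling): append the
# definition of one alias from a source rc file to a destination rc file.
copy_alias() {
    name=$1   # alias name, e.g. ll
    src=$2    # rc file to read from, e.g. your own ~/.bashrc
    dest=$3   # rc file to append to, e.g. the tool's .bashrc

    # Keep only lines of the form "alias NAME=..." and append them to dest.
    grep -E "^[[:space:]]*alias[[:space:]]+${name}=" "$src" >> "$dest"
}

# Example call; the destination path is an assumption, adjust to the tool's home:
# copy_alias ll "$HOME/.bashrc" /data/project/wikidata-todo/.bashrc
```

Run it from your user account before "become", or just paste the one line by hand; the alias is picked up the next time a shell starts as the tool.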