Last modified: 2013-10-18 13:18:22 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T57872, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 55872 - one of the exec hosts can't see my home dir
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: tools
Version: unspecified
Hardware: All, OS: All
Importance: Unprioritized major
Target Milestone: ---
Assigned To: Marc A. Pelletier
Depends on:
Blocks:
Reported: 2013-10-18 12:57 UTC by Magnus Manske
Modified: 2013-10-18 13:18 UTC
CC List: 2 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Magnus Manske 2013-10-18 12:57:53 UTC
When I start jobs with jsub, a few of them go into error state ("Eqw"). It seems the execution host can't see my home directory:

1290600 0.25000 job        magnus       Eqw   10/18/2013 12:54:41                                    1

magnus@tools-login:/data/project/wikidata-todo/stats/data$ qstat -j 1290600
==============================================================
job_number:                 1290600
exec_file:                  job_scripts/1290600
submission_time:            Fri Oct 18 12:54:41 2013
owner:                      magnus
uid:                        3067
group:                      wikidev
gid:                        500
sge_o_home:                 /home/magnus
sge_o_log_name:             magnus
sge_o_path:                 /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
sge_o_shell:                /bin/bash
sge_o_workdir:              /data/project/wikidata-todo/stats
sge_o_host:                 tools-login
account:                    sge
stderr_path_list:           NONE:NONE:/home/magnus/job.err
hard resource_list:         h_vmem=2097152k,jobs=1
mail_list:                  magnus@tools.wmflabs.org
notify:                     FALSE
job_name:                   job
stdout_path_list:           NONE:NONE:/home/magnus/job.out
jobshare:                   0
hard_queue_list:            task
env_list:
job_args:                   20130817,/public/datasets/public/wikidatawiki/20130817/wikidatawiki-20130817-pages-articles.xml.bz2
script_file:                /data/project/wikidata-todo/stats/job.sh
error reason    1:          10/18/2013 12:54:52 [3067:1335]: error: can't chdir to /home/magnus: No such file or directory
scheduling info:            Job is in error state
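The description relies on two pieces of Grid Engine output: the `qstat` summary line, whose state column reads `Eqw` (the `E` marks the error state), and the `qstat -j` record, whose `error reason` field names the failing `chdir`. A minimal Python sketch for pulling those out of saved text follows. The helper names are hypothetical, and it assumes the column order and `name: value` layout shown in this report, which may differ between Grid Engine versions.

```python
# Minimal sketch for reading Grid Engine output like the text quoted above.
# Assumption: layout matches this report; other qstat versions may differ.

SUMMARY_SAMPLE = (
    "1290600 0.25000 job        magnus       Eqw   "
    "10/18/2013 12:54:41                                    1"
)

QSTAT_J_SAMPLE = """\
==============================================================
job_number:                 1290600
owner:                      magnus
sge_o_home:                 /home/magnus
sge_o_workdir:              /data/project/wikidata-todo/stats
env_list:
error reason    1:          10/18/2013 12:54:52 [3067:1335]: error: can't chdir to /home/magnus: No such file or directory
scheduling info:            Job is in error state
"""


def jobs_in_error(summary_text):
    """Return job numbers whose state column contains 'E' (e.g. 'Eqw')."""
    failed = []
    for line in summary_text.splitlines():
        cols = line.split()
        # Skip headers, separators and blank lines: job lines start with a number.
        if len(cols) < 5 or not cols[0].isdigit():
            continue
        # Assumed columns: job number, priority, name, owner, state, ...
        if "E" in cols[4]:
            failed.append(int(cols[0]))
    return failed


def parse_qstat_j(text):
    """Turn `qstat -j` output into a dict of field name -> value."""
    fields = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("="):
            continue
        # The field name ends at the first colon; values may contain more
        # colons (paths, timestamps), so split only once.
        name, _, value = line.partition(":")
        # Collapse runs of spaces, e.g. "error reason    1" -> "error reason 1".
        fields[" ".join(name.split())] = value.strip()
    return fields


def error_reasons(fields):
    """Collect the 'error reason N' entries in the order they appeared."""
    return [v for k, v in fields.items() if k.startswith("error reason")]


if __name__ == "__main__":
    print(jobs_in_error(SUMMARY_SAMPLE))
    info = parse_qstat_j(QSTAT_J_SAMPLE)
    print(info["sge_o_home"])
    for reason in error_reasons(info):
        print(reason)
```

Splitting only at the first colon keeps values such as `sge_o_path` and the timestamped error message intact.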
Comment 1 Marc A. Pelletier 2013-10-18 13:03:06 UTC
tools-exec-01 had, apparently, rebooted just a bit too early during the NFS transition and tried mounting /home too fast, leaving it in a broken state autofs was unable to recover from. Because /data/project was okay, this was generally not noticeable.

It has been rebooted and the situation is now okay.
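Comment 1 explains that /home was broken on tools-exec-01 while /data/project still worked, so only jobs that needed the home directory failed. A per-host check of each path a job depends on would surface that difference directly. The sketch below is a hypothetical helper, not a Toolforge tool; it mirrors the failing `chdir` without changing the working directory, and the path list is an assumption taken from this report.

```python
import os
import stat


def can_enter(path):
    """Return (ok, reason) for whether this host can use `path` as a directory.

    Mirrors the job's failure ("can't chdir to /home/magnus") without
    actually changing the current working directory.
    """
    try:
        st = os.stat(path)
    except OSError as exc:
        # e.g. ENOENT, "No such file or directory", as in this report
        return False, exc.strerror
    if not stat.S_ISDIR(st.st_mode):
        return False, "not a directory"
    if not os.access(path, os.X_OK):  # search permission is needed to chdir
        return False, "permission denied"
    return True, "ok"


if __name__ == "__main__":
    # Hypothetical path list, taken from the paths named in this report.
    for path in ["/home/magnus", "/data/project"]:
        ok, reason = can_enter(path)
        print(f"{path}: {reason}")
```

Run on each exec host, a result of "ok" for /data/project alongside an error for /home would match the state Comment 1 describes.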
Comment 2 Marc A. Pelletier 2013-10-18 13:03:47 UTC
(That said, why are you starting numerous jobs with your user account?)
Comment 3 Magnus Manske 2013-10-18 13:14:27 UTC
Too lazy to "become" (plus, "ll" doesn't work as a tool, which annoys me ;-)
Comment 4 Marc A. Pelletier 2013-10-18 13:18:22 UTC
Well, don't do it. :-)

Tools run from user accounts can't be salvaged, aren't properly manageable, and are subject to being randomly killed. And they're also against the rules.

("ll" is an alias in your .bashrc that you can copy to the tools if you miss it, by the way.)
