Last modified: 2012-06-11 20:31:08 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from pages displaying bug reports and their history, links might be broken. See T39071, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 37071 - jobs-loop sometime can not find PHP script
Status: RESOLVED WORKSFORME
Product: Wikimedia Labs
Classification: Unclassified
Component: deployment-prep (beta)
Version: unspecified
Hardware: All All
Importance: Low normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on: 36646
Blocks:
Reported: 2012-05-24 07:12 UTC by Antoine "hashar" Musso (WMF)
Modified: 2012-06-11 20:31 UTC (History)
CC List: 5 users

Description Antoine "hashar" Musso (WMF) 2012-05-24 07:12:47 UTC
Seen on job-runner03 :

Main loop:
 /bin/bash /usr/local/apache/common/php/extensions/WikimediaMaintenance/jobs-loop.sh

A child:
 \_ php MWScript.php runJobs.php --wiki=The MediaWiki script file "./php-trunk/maintenance/nextJobDB.php" does not exist. --
Comment 1 Antoine "hashar" Musso (WMF) 2012-05-24 07:12:57 UTC
$ ll /usr/local/apache/
ls: /usr/local/apache/: Input/output error

Sounds bad ;-D
Comment 2 Antoine "hashar" Musso (WMF) 2012-05-25 08:20:52 UTC
Looking at the process information for jobs-loop.sh, I found that its `cwd` pointed to a deleted path:

$ ls -l /proc/1234/cwd
lrwxrwxrwx 1 apache apache 0 2012-05-25 08:16 cwd -> /usr/local/apache/common-local/multiversion (deleted)

Although the directory is actually there :-(
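
A minimal sketch of how this check could be scripted, assuming a Linux /proc filesystem where the kernel appends " (deleted)" to the cwd link target, as in the output above. The function names `cwd_is_deleted` and `report_stale_cwd` are hypothetical helpers and are not part of jobs-loop.sh or the init script.

```shell
#!/bin/bash
# Hypothetical helper (not from the bug report).
# Returns 0 if the working directory of the given PID is reported as deleted,
# 1 if it looks fine, and 2 if the link cannot be read (no such process,
# or insufficient permissions).
cwd_is_deleted() {
    local pid="$1"
    local target
    # readlink prints the symlink target, e.g. "/some/path (deleted)"
    target=$(readlink "/proc/${pid}/cwd" 2>/dev/null) || return 2
    case "$target" in
        *" (deleted)") return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: report every process whose command line matches the
# given pattern and whose working directory is reported as deleted.
report_stale_cwd() {
    local pattern="$1"
    local pid
    for pid in $(pgrep -f "$pattern"); do
        if cwd_is_deleted "$pid"; then
            echo "PID ${pid}: cwd is reported as deleted"
        fi
    done
}
```

Something like `report_stale_cwd jobs-loop.sh`, run as root or as the process owner, would then list any job loop in this state. Note that a directory whose name itself ends in " (deleted)" would be a false positive with this simple pattern match.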
Comment 3 Antoine "hashar" Musso (WMF) 2012-05-25 08:27:59 UTC
Restarting the loop (/etc/init.d/mw-job-runner) seems to fix the link:

# ls -l /proc/6973/cwd
lrwxrwxrwx 1 apache apache 0 2012-05-25 08:24 /proc/6973/cwd -> /usr/local/apache/common-local/multiversion/
# 

/usr/local/apache is an NFS mount:

deployment-nfs-memc:/mnt/export/apache on /usr/local/apache type nfs (rw,bg,soft,tcp,timeo=14,intr,nfsvers=3,addr=10.4.0.58)


I have no idea what could make it unlinked. Maybe the NFS server moves the directory somehow, or whenever NFS has a connection issue the job runner servers consider the file permanently inaccessible.


I am marking 36646 - "get rid of NFS" as a dependency.
Comment 4 Faidon Liambotis 2012-05-25 11:47:58 UTC
Are you sure you haven't deleted and recreated the directory since the process was started? If yes & it happens again, don't restart the process and notify me, I'd like to have a look.
Comment 5 Antoine "hashar" Musso (WMF) 2012-05-29 07:51:56 UTC
Lowering priority; I have not seen it occur again. Most probably someone renamed or altered the path.

I guess we can close the bug if it does not occur again over the next week or so.
Comment 6 Antoine "hashar" Musso (WMF) 2012-06-11 20:31:08 UTC
This was some transient issue that I have not seen reproduced so far. So I am just closing this bug and will reopen it later on if it occurs again.
