Last modified: 2014-04-22 19:36:44 UTC
The webservice for my "commonshelper" tool is running, but I can't load the web page(s). Examples:

http://tools.wmflabs.org/commonshelper/index.php (tool)
http://tools.wmflabs.org/commonshelper/index_test.php (simple test page)

The pages just keep loading "forever". What I have tried so far:

* removed access.log as per https://bugzilla.wikimedia.org/show_bug.cgi?id=58931
* did webservice restart
* did webservice stop / webservice start

I have received similar bug reports for multiple tools since yesterday. Those were resolved by restarting the web service, but apparently not this one.
Related to http://lists.wikimedia.org/pipermail/labs-l/2014-April/002305.html ?
I've seen this problem before. The lighttpd webservice stops, but old php-cgi processes remain. "webservice start" then starts /one/ lighttpd process, but can't start new php-cgi processes. So plain html or py is served just fine, while php requests are "stuck". This is the output from webgrid for commonshelper:

tools-webgrid-01: (13:51:40)
  608 tools.co 20 0 48668 2116 1312 S 0 0.0 0:00.03 lighttpd
11144 tools.co 20 0  281m  11m 7748 S 0 0.1 0:00.03 php-cgi
11146 tools.co 20 0  288m  11m 4764 S 0 0.1 2:04.32 php-cgi
11147 tools.co 20 0  288m  11m 4680 S 0 0.1 0:36.61 php-cgi
11148 tools.co 20 0  288m  11m 4760 S 0 0.1 1:29.41 php-cgi
11149 tools.co 20 0  288m  11m 4756 S 0 0.1 2:42.74 php-cgi

tools-webgrid-02: (13:51:40)
19567 tools.co 20 0  281m  11m 7764 S 0 0.1 0:00.01 php-cgi
19575 tools.co 20 0  283m 9844 4320 S 0 0.1 0:35.07 php-cgi
19576 tools.co 20 0  283m 9836 4312 S 0 0.1 0:01.24 php-cgi
19577 tools.co 20 0  283m 9912 4272 S 0 0.1 0:34.99 php-cgi
19578 tools.co 20 0  283m 9796 4272 S 0 0.1 0:35.88 php-cgi

Note that tools-webgrid-02 has leftover php-cgi processes but no lighttpd process at all.

I figured out this workaround. Make this a script & execute:

#!/bin/bash
webservice stop
sleep 5
ssh tools-webgrid-01 'pkill -9 -U tools.commonshelper php-cgi'
ssh tools-webgrid-02 'pkill -9 -U tools.commonshelper php-cgi'
sleep 5
webservice start
metatron is correct; I recently had to purge some old processes (cf. [[wikitech:Nova Resource:Tools/SAL#April 10]]). To fix Magnus' issue, I killed the blocking php-cgi processes; the tool should be working again. The underlying problem is that "webservice stop" uses qdel which by default uses SIGKILL. That kills the lighttpd process and its workers, but not the spawned php-cgi processes. Testing shows that on SIGTERM lighttpd correctly ends its workers and the spawned php-cgi processes. I recently filed bug #61102 to use SIGTERM for the general case of jsub; the same logic applies to this bug as well.
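The difference between the two signals can be reproduced without lighttpd. The following is only a minimal sketch under stated assumptions: the inner "bash -c" script is a hypothetical stand-in for lighttpd, "sleep 300" is a stand-in for a spawned php-cgi process, and the function name child_survives is made up for this illustration. It does not claim to mirror how lighttpd or qdel are implemented; it only shows that a parent cannot clean up its children on SIGKILL, while it can on SIGTERM if it handles that signal.

```shell
#!/bin/bash
# Sketch: does a spawned child outlive its parent when the parent
# receives a given signal? (Stand-ins only, not lighttpd/php-cgi.)

child_survives() {
    local sig=$1 pidfile parent child
    pidfile=$(mktemp)

    # Hypothetical "server": spawns one child, installs a SIGTERM handler
    # that stops and reaps the child, then publishes the child's PID.
    bash -c '
        sleep 300 &
        child=$!
        trap "kill $child 2>/dev/null; wait $child 2>/dev/null; exit 0" TERM
        echo "$child" > "$1"
        wait "$child"
    ' _ "$pidfile" >/dev/null 2>&1 &
    parent=$!

    # Wait until the parent has written the child PID.
    until [ -s "$pidfile" ]; do sleep 0.1; done
    child=$(cat "$pidfile")

    # Signal the parent only, then wait for it to go away.
    kill "-$sig" "$parent"
    wait "$parent" 2>/dev/null
    sleep 0.5

    if kill -0 "$child" 2>/dev/null; then
        echo "survived"
        kill -9 "$child" 2>/dev/null   # clean up the orphan
    else
        echo "gone"
    fi
    rm -f "$pidfile"
}

echo "SIGKILL: child $(child_survives KILL)"
echo "SIGTERM: child $(child_survives TERM)"
```

On a typical Linux system the first line should report the child as "survived" (SIGKILL cannot be trapped, so the handler never runs) and the second as "gone". This matches the observation above that the php-cgi processes stay behind after a SIGKILL but are ended correctly on SIGTERM.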
Thanks Tim, metatron, it works again!
It works for now :-), but the general problem hasn't been solved yet.
Ha! I knew I had jotted down something about the problem earlier.

*** This bug has been marked as a duplicate of bug 63878 ***