Last modified: 2014-02-14 12:48:12 UTC
More and more pages on dewiki show the error "Fehler beim Parsen(Unbekannter Fehler)" (English: "Failed to parse(unknown error)"). After a purge, edit or null edit the error is gone and a PNG is shown, but it is also possible that after a purge the PNG is gone and the error is shown. MathJax works; only the PNG option is affected. Google has also indexed some of those pages: https://www.google.de/#q=%22Fehler+beim+Parsen%22+site:de.wikipedia.org https://www.google.de/#q=%22Failed+to+parse(unknown+error)%22+site:en.wikipedia.org Please have a look. Thanks.
Probably an ops or shell issue?
It's not a thumb (or ops/shell) issue -- the output that I see is e.g. <dl> <dd><strong class='error'>Fehler beim Parsen(Unbekannter Fehler): \sigma\frown\psi:=(-1)^{pq}\psi(\sigma\circ\iota_{0\ldots q})\sigma\circ\iota_{q\ldots p}</strong></dd> </dl> so clearly it never gets to this point.
I quickly retracted that comment of mine on IRC last (European) night when I saw the code :) A little more investigation happened at #wikimedia-tech -- from what I can see on the SAL after I left, Tim found it was a missing cgroups issue on mw1035, mw1145, mw1078, mw1152, mw1150, which means the "texvc" invocation failed. Tim apparently fixed the situation manually, so the effects of this bug should no longer be occurring, but we need a proper fix in place so this won't happen again. The cgroup issue is recurring (https://gerrit.wikimedia.org/r/#/c/83067/ was the latest attempt to fix it) and we've yet to find an optimal solution. On the plus side, there were a couple of additions on the logging side because of this...
Thanks Faidon! For the record: the topic was also brought up at https://de.wikipedia.org/w/index.php?title=Portal_Diskussion:Mathematik&oldid=123471154#r.C3.A4tselhafte_tex-fehler
(In reply to comment #3)
> I quickly retracted that comment of mine on IRC last (European) night when I
> saw the code :) A little more investigation happened at #wikimedia-tech --
> from what I can see on the SAL after I left, Tim found it was a missing
> cgroups issue on mw1035, mw1145, mw1078, mw1152, mw1150, which means the
> "texvc" invocation failed.
Actually, pretty much all of the apaches had the issue. Those 5 apaches didn't even have the cgroup filesystem mounted, indicating that cgconfig hadn't been started. On the rest of the apaches, cgconfig had been started but not mw-cgroup.
When this bug occurred just now on mw1109, I ran "initctl log-priority debug" then stopped and started cgconfig a couple of times. The logs showed the started/stopped events going to cgred, but not to mw-cgroup. When I edited mw-cgroup.conf with an irrelevant change, the syslog showed a configuration reload due to inotify, and after that, mw-cgroup was started and stopped as expected.
So my suspicion is that at some point, something went wrong with cgconfig or mw-cgroup or both, which, in combination with a bug in upstart, caused mw-cgroup to stop receiving events.
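For anyone who wants to tell the two failure states apart on a given host, here is a minimal Python sketch. It is not something that was run in production. It assumes the memory controller is mounted under /sys/fs/cgroup/memory and that the group directory is named "mediawiki" (taken from the "memory:mediawiki" spec used elsewhere in this bug); the function names are made up for illustration.

```python
import os

# Assumed location of the group directory; adjust to the real mount point.
GROUP_DIR = "/sys/fs/cgroup/memory/mediawiki"


def cgroup_mounts(mounts_text):
    """Return the mount points of all cgroup filesystems in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "cgroup":
            points.append(fields[1])
    return points


def diagnose(mounts_text, group_exists):
    """Classify a host into the two failure states described above, or 'ok'."""
    if not cgroup_mounts(mounts_text):
        return "cgroup filesystem not mounted (cgconfig not started?)"
    if not group_exists:
        return "cgroup mounted but group missing (mw-cgroup not started?)"
    return "ok"


def diagnose_host(group_dir=GROUP_DIR):
    """Run the check against the local host."""
    with open("/proc/mounts") as f:
        return diagnose(f.read(), os.path.isdir(group_dir))
```

Calling diagnose_host() on an affected apache would be expected to report one of the two failure strings; the parsing is kept separate from the I/O so it can be checked against captured /proc/mounts output.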
379 out of 391 apaches show the old config when you run "initctl show-config mw-cgroup", i.e. without the "stop on" trigger. This is apparently because upstart config changes only take effect after the job is stopped or restarted.
The stop script of mw-cgroup will routinely fail since rmdir() fails with EBUSY if there are any tasks remaining in the cgroup. The cgdelete command can be used instead -- it attempts to move any tasks in the cgroup to the parent cgroup before executing the rmdir(). However, this is still prone to failure if new tasks are added to the cgroup while the cgdelete command is in progress:
    # cgdelete -r memory:mediawiki
    cgdelete: cannot remove group 'mediawiki': Device or resource busy
cgclear, which is run from the stop script of the cgconfig upstart job, is also prone to failing for the same reason.
This would not be a problem if cgconfig and mw-cgroup were started on boot and never touched again, but of course the cgroup-bin package has a prerm trigger which stops the cgconfig job, causing breakage on upgrade. Says mw1109:
    2013-09-09 18:41:49 upgrade cgroup-bin 0.37.1-1ubuntu10 0.37.1-1ubuntu10.1
Per usual Debian convention, to fix a bug in cgrulesengd it is necessary to unmount and remount the entire cgroup filesystem.
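To illustrate the EBUSY race, a rough Python sketch of a retry wrapper around the removal step follows. It only narrows the window; it does not close it, since new tasks can still join the cgroup between attempts. The function name and its parameters are hypothetical and not part of any existing script.

```python
import errno
import time


def remove_with_retry(remove, attempts=5, delay=0.1, sleep=time.sleep):
    """Call remove() until it succeeds or the attempts run out.

    Returns True on success and False if every attempt failed with EBUSY.
    Any other OSError is re-raised immediately.
    """
    for attempt in range(attempts):
        try:
            remove()
            return True
        except OSError as exc:
            if exc.errno != errno.EBUSY:
                raise
            # Back off briefly before retrying, except after the last attempt.
            if attempt < attempts - 1:
                sleep(delay)
    return False
```

A caller could pass something like lambda: os.rmdir("/sys/fs/cgroup/memory/mediawiki") (the path is an assumption) and treat a False return as "leave the cgroup in place" rather than as a fatal error in the stop script.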
Change 91115 had a related patch set uploaded by Tim Starling: Improve logging for wfShellExec() https://gerrit.wikimedia.org/r/91115
*** Bug 54367 has been marked as a duplicate of this bug. ***
I suggest migrating Apache to upstart and having it stop if mw-cgroup stops.
Change 91115 merged by jenkins-bot: Improve logging for wfShellExec() and ignore missing cgroup https://gerrit.wikimedia.org/r/91115
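For readers who don't want to open the Gerrit change, the general idea of "ignore missing cgroup" can be sketched as below. This is a Python illustration of the pattern only, not the code that was merged: it assumes the command would otherwise be wrapped with cgexec from cgroup-bin, and the function name, logger name and default group spec are invented for the example.

```python
import logging
import os

log = logging.getLogger("shellexec")


def build_argv(cmd, cgroup_dir, group="memory:mediawiki"):
    """Return the argv to execute, confining it to the cgroup only if it exists.

    cmd is a list such as ["texvc", ...]; cgroup_dir is the directory that
    represents the group on the mounted cgroup filesystem.
    """
    if os.path.isdir(cgroup_dir):
        # Group is present: run the command inside it.
        return ["cgexec", "-g", group] + list(cmd)
    # Group is missing: log it and run unconfined instead of failing outright.
    log.warning("cgroup %s missing; running %s without limits", cgroup_dir, cmd[0])
    return list(cmd)
```

The trade-off is that a missing cgroup degrades to an unconfined shell-out with a log entry, instead of a parse error shown to readers.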
[Patch merged; resetting bug status]