Last modified: 2014-02-12 23:52:28 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T40273, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 38273 - MobileFrontend corrupts parser cache for regular page views
Status: RESOLVED FIXED
Product: MobileFrontend
Classification: Unclassified
Component: stable
Version: unspecified
Hardware: All All
Importance: High major
Target Milestone: ---
Assigned To: Max Semenik
Keywords: patch, patch-need-review
Duplicates: 40121
Depends on:
Blocks:
Reported: 2012-07-10 08:12 UTC by Michael M.
Modified: 2014-02-12 23:52 UTC
CC List: 22 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Michael M. 2012-07-10 08:12:46 UTC
Starting around 4 July, there have been many reports (approximately one every two days) on [[de:Wikipedia:Fragen_zur_Wikipedia]] (the dewiki village pump) about problems obviously caused by badly nested HTML. Purging fixes the problem, but it seems that tidy was not executed in these cases. Since this kind of problem now definitely occurs more often than before, it should be investigated why tidy fails to run so often.
Comment 1 Derk-Jan Hartman 2012-07-10 13:40:31 UTC
If this happens, can people use the "View source" feature of their browser, look at the bottom of the page for "<!-- Served by mw#### in 2.259 secs. -->" and note the mw#### ID before purging the page?

That will probably help in pinpointing the problem further.
Comment 2 Sam Reed (reedy) 2012-07-10 13:41:50 UTC
(In reply to comment #1)
> If this happens, can people use the "View source" feature of their browser and
> pick and the bottom of it look for "<!-- Served by mw#### in 2.259 secs. -->" 
> and note the mw#### id before purging the file ?
> 
> That will probably help in pinpointing the problem further.

Or <!-- Served by srv#### in 2.259 secs. -->
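To make these reports consistent, the server ID can be pulled out of a saved page source mechanically. The following is a minimal sketch, assuming the comment format is exactly as quoted in comments 1 and 2 (`mw####` or `srv####`, then the time in seconds); the regex and the `served_by` helper are illustrative, not part of MediaWiki.

```python
import re

# Assumed format, taken from the examples quoted in comments 1 and 2:
#   <!-- Served by mw53 in 2.259 secs. -->
#   <!-- Served by srv271 in 0.190 secs. -->
SERVED_BY = re.compile(
    r"<!--\s*Served by\s+((?:mw|srv)\d+)\s+in\s+([\d.]+)\s+secs\.\s*-->"
)


def served_by(html):
    """Return (server, seconds) from the last 'Served by' comment, or None."""
    matches = SERVED_BY.findall(html)
    if not matches:
        return None
    server, secs = matches[-1]  # the comment sits near the end of the page
    return server, float(secs)
```

Reading the saved source with `open(path, encoding="utf-8").read()` and passing it to `served_by` gives, for example, `("mw53", 2.259)`; taking the last match guards against the string also appearing in page content.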
Comment 3 Sam Reed (reedy) 2012-07-10 13:51:08 UTC
No need to do even that; the dpkg output suggests that multiple servers are missing it.
Comment 4 Sam Reed (reedy) 2012-07-10 14:11:53 UTC
(In reply to comment #3)
> Don't even need to do that, looking at the dpkg output suggests multiple are
> missing it

bleh, ignore me
Comment 5 Michael M. 2012-07-14 09:01:31 UTC
mw53 just served untidy HTML for [[de:Keith Jarrett]]
Comment 6 Derk-Jan Hartman 2012-07-19 09:52:05 UTC
Can someone with shell access do a sanity check on that host, please?
Comment 7 Sam Reed (reedy) 2012-07-23 12:03:42 UTC
reedy@mw53:~$ which tidy
/usr/bin/tidy
reedy@mw53:~$ tidy --version
HTML Tidy for Linux released on 25 March 2009
reedy@mw53:~$ php /usr/local/apache/common-local/multiversion/MWScript.php eval.php enwiki
> echo $wgTidyConf
/usr/local/apache/common-local/php-1.20wmf7/includes/tidy.conf
> echo $wgTidyBin
tidy


Need to check the source for existence of "Tidy was unable to run" or "Tidy found serious XHTML errors"
Comment 8 orlodrim 2012-07-25 15:14:18 UTC
We have a similar problem on frwiki. As far as I know, the first error was reported on 25 June and there are at least 10 reports since this date. Today, I have loaded a page (Richard Feynman) twice. The server was mw6 the first time and srv243 the second time, and I obtained exactly the same incorrect rendering (then I purged the cache and it fixed the problem).
Comment 9 Drongou 2012-07-25 22:09:43 UTC
The </div> of <div id="content" class="mw-body"> comes after <!-- Served ...
See: http://imageshack.us/f/801/capturedcran20120720011.png/
Comment 10 Umherirrender 2012-07-29 09:06:30 UTC
Served by mw4 in 0.208 secs. on dewiki
Comment 11 Schniggendiller 2012-07-30 10:04:13 UTC
http://de.wikipedia.org/wiki/DB_City_Night_Line: Served by srv240 in 0.351 secs. (A few days ago.)
Comment 12 Michael M. 2012-08-01 10:13:20 UTC
Served by mw30 in 0.196 secs. ([[de:Galatasaray Istanbul]])

<div clear="all" style="clear:both;" /><br />
<div style="background-color:#888; height:1px; width:8em;"/>

became

<p><br style="clear:both;" clear="all"/>
</br>
</p>
<div style="background-color:#888; height:1px; width:8em;"/>
Comment 13 Michael M. 2012-08-01 10:15:41 UTC
Served by srv229 in 0.118 secs. ([[de:Glosche]])
Served by mw11 in 0.111 secs. ([[de:Toupet]])
Comment 14 Michael M. 2012-08-02 08:15:29 UTC
(In reply to comment #12)
> <div clear="all" style="clear:both;" /><br />
> <div style="background-color:#888; height:1px; width:8em;"/>
> 
> became
> 
> <p><br style="clear:both;" clear="all"/>
> </br>
> </p>
> <div style="background-color:#888; height:1px; width:8em;"/>

Sorry, I wrote nonsense there. That strange
<br style="clear:both;" clear="all"/>
</br>
already was in the wikitext. I didn't notice it, because I use a script to automatically clean up some errors.
Comment 15 Schniggendiller 2012-08-07 18:44:09 UTC
Happening right now and not yet fixed by purging: http://de.wikipedia.org/wiki/Bundespr%C3%A4sident_%28Deutschland%29: Served by srv260 in 0.155 secs
Comment 16 Umherirrender 2012-08-12 16:51:11 UTC
Increasing priority, because HTML tidy output is missing from pages more and more often, at least on de.wp.
Comment 17 Umherirrender 2012-08-20 19:39:18 UTC
HTML tidy is still missing on some pages.

Is there no solution?
Comment 18 orlodrim 2012-08-21 19:28:39 UTC
As a workaround, I replaced <div ... /> with <div ... ></div> in the most used templates on frwiki. I have not seen any complaints since I did that, 20 days ago. The display of some pages is still broken, but this is virtually invisible.
Comment 19 Redrose64 2012-09-01 23:10:43 UTC
This is still happening on en.wp: on 14 August 2012, mw4 and mw44 both served invalid HTML; see http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_102#Are_the_HTML_generators_out_of_sync_on_some_servers.3F

Today, a few minutes ago, I saw exactly the same problem on a different article; checking, I found that I had again been served invalid HTML, this time by srv272.

The basic problem concerns two table cells which each contain an unordered list with several items. The last </li> of each list and the </ul> immediately following were not in the proper place, but placed later on: either in between one </th> and the next <td>, or between a </tr> and the </table> following.
Comment 20 Umherirrender 2012-09-04 18:50:17 UTC
This needs further investigation soon, because it breaks many pages on many wikis. Some users get confused because, after a while, the problem disappears when someone else purges the page.

Please have a look at this. Thanks.
Comment 21 Bartosz Dziewoński 2012-09-09 20:40:24 UTC
I think this may be a dupe: bug 40121. Interesting thing: the NewPP limit report is missing. Page served by mw15.
Comment 22 Redrose64 2012-09-12 16:45:33 UTC
Same problem seen in the navbox at the bottom of http://en.wikipedia.org/wiki/Operation_Nougat - this was served by mw20. Omitting all attributes, the textual content of enclosures, and all correctly-paired tags which do not enclose bad tags, the mis-ordered tags are:
<table>
  <tr>
    <td>
      <table>
        <tr>
          <td>
            <div>
              <ul>
                <li>
            </div>
          </td>
        </tr>
        <tr>
          <td>
            <div>
                </li>
              </ul>
            </div>
            <table>
              <tr>
                <td>
                  <div>
                    <ul>
                      <li>
                  </div>
                </td>
              </tr>
              <tr>
                <td>
                  <div>
                      </li>
                      <li>
                  </div>
                </td>
              </tr>
            </table>
          </td>
        </tr>
        <tr>
          <td>
            <div>
                      </li>
                      <li>
            </div>
          </td>
        </tr>
        <tr>
          <td>
            <div>
                      </li>
                    </ul>
            </div>
          </td>
        </tr>
      </table>
    </td>
  </tr>
</table>
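Mis-ordered tags like these can be located with a simple stack check: every end tag should close the innermost open element. The sketch below uses Python's standard `html.parser` module; `NestingChecker`, `check_nesting` and the `VOID` set are illustrative names, and the checker deliberately ignores the HTML5 implied-end-tag rules, so it is a diagnostic aid, not a validator.

```python
from html.parser import HTMLParser

# Void elements never take an end tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Flag end tags that do not close the innermost open element."""

    def __init__(self):
        super().__init__()
        self.stack = []     # names of currently open elements
        self.problems = []  # (line, expected_end_tag, found_end_tag)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Browsers ignore the trailing "/" on non-void elements, so
        # <div ... /> opens a <div> that stays open.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in VOID:
            return  # stray end tags of void elements are ignored here
        line, _ = self.getpos()
        expected = self.stack[-1] if self.stack else None
        if tag != expected:
            self.problems.append((line, expected, tag))
        # Resynchronise: pop up to and including the matching open element.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def check_nesting(html):
    """Return (problems, still_open) for an HTML fragment."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems, checker.stack
```

On the innermost part of the structure above, `check_nesting("<table><tr><td><div><ul><li></div></td></tr></table>")` reports a `</div>` arriving while `<li>` is still open. It also leaves `div` open for `<div style="clear:both;" />`, which fits the workaround described in comment 18.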
Comment 23 Michael M. 2012-09-14 08:31:29 UTC
I think I found the reason why Tidy sometimes isn't executed:

./includes/job/RefreshLinksJob.php calls
 ParserOptions::newFromUserAndLang( new User, $wgContLang )
while in other places makeParserOptions from ./includes/WikiPage.php is called, which additionally calls
 $options->enableLimitReport();
 $options->setTidy( true );

This also explains why the limit report is missing.

This means that this bug report is in the wrong component, but since I know neither where it actually belongs nor how to change both product and component, I'm leaving it as is.

btw: The structure in the previous comment reminds me a bit of Alice in Wonderland: https://en.wikisource.org/wiki/Alice%27s_Adventures_in_Wonderland/Chapter_3
Comment 24 Krinkle 2012-09-16 14:51:10 UTC
*** Bug 40121 has been marked as a duplicate of this bug. ***
Comment 25 Redrose64 2012-09-26 15:04:00 UTC
Happened again with server mw44. BTW, the long and sad tale (which I first read in about 1973) was not foremost in my mind; I wanted to illustrate the mismatching by means of indent levels. For indents as deep as twelve levels, tabs are impractical, so I used two spaces.
Comment 26 Redrose64 2012-09-27 15:36:40 UTC
A new one: srv200 did this too
Comment 27 Redrose64 2012-09-27 16:51:47 UTC
Also mw29
Comment 28 Bartosz Dziewoński 2012-09-27 17:10:52 UTC
Comment 23 contains what could be a patch. Could somebody competent look at this?

(Updating component and project.)
Comment 29 Max Semenik 2012-09-28 22:18:01 UTC
(In reply to comment #23)
> I think I found the problem why Tidy sometimes isn't executed:
> 
> ./includes/job/RefreshLinksJob.php calls
>  ParserOptions::newFromUserAndLang( new User, $wgContLang )
> while in other places makeParserOptions from ./includes/WikiPage.php is called,
> which additionally calls
>  $options->enableLimitReport();
>  $options->setTidy( true );
> 
> This also explains why the limit report is missing.

RefreshLinksJob doesn't save the results of this parse, so people shouldn't be getting these results, and avoiding Tidy calls here makes a lot of sense, as the results are used only for link updates.
Comment 30 Michael M. 2012-10-04 07:48:37 UTC
I tried to reproduce the issue by creating a page with wrongly nested syntax and changing linked/transcluded pages, but everything displayed correctly.

But all reports about broken layout that say something about the NewPP limit report mention that it is missing. (Latest report: [[de:Frankfurt (Main) Hauptbahnhof]] served by mw29 and srv264)

So either there is some other place where ParserOptions is created without enabling Tidy and LimitReport, or under some strange circumstances I wasn't able to replicate, RefreshLinksJob does save the result to cache.
Comment 31 MZMcBride 2012-10-05 22:45:19 UTC
Tim, could you please poke at this? This seems like your kind of thing. :-)
Comment 32 Michael M. 2012-10-09 09:26:19 UTC
[[de:Blue October]]: no tidy, no NewPP limit report
Saved in parser cache with key dewiki:pcache:idhash:3450708-0!*!0!!de!4!* and timestamp 20120912181102
Served by srv271 in 0.190 secs
Comment 33 Tim Starling 2012-10-09 23:51:44 UTC
Looks like a MobileFrontend bug. Call stack: 

* require
* ApiMain::execute
* ApiMain::executeActionWithErrorHandling
* ApiMain::executeAction
* ApiMobileView::execute
* ApiMobileView::getData
* WikiPage::getParserOutput
* PoolCounterWork::execute
* PoolWorkArticleView::doWork
* ParserCache::save

ApiMobileView makes a new default ParserOptions, it doesn't get one from Article::getParserOptions() etc. where tidy and limit reports are enabled.
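The failure mode can be modelled in a few lines: if the tidy and limit-report flags do not vary the parser cache key (which the symptoms suggest, though that is an assumption here), then whichever caller parses first fills the shared slot. The sketch below is a toy Python model, not MediaWiki code: `ParserOptions`, `canonical_options`, `tidy`, `parse`, `ParserCache` and `get_parser_output` are hypothetical stand-ins, the cache key omits the options hash seen in comment 32, and the "after" half shows only the direction suggested above (using the page's own options), not the merged change itself.

```python
import re
from dataclasses import dataclass


@dataclass
class ParserOptions:
    """Toy stand-in for MediaWiki's PHP ParserOptions: only two flags."""
    tidy: bool = False
    limit_report: bool = False

    def cache_key(self, page_id):
        # Assumption: neither flag varies the key, so tidied and untidied
        # output share one parser cache slot.
        return f"pcache:idhash:{page_id}"


def canonical_options():
    # What comment 23 says WikiPage's makeParserOptions adds to the defaults.
    return ParserOptions(tidy=True, limit_report=True)


def tidy(html):
    # Hypothetical "tidy": only rewrites <div ... /> as <div ...></div>.
    return re.sub(r"<div([^>]*?)\s*/>", r"<div\1></div>", html)


def parse(text, opts):
    out = tidy(text) if opts.tidy else text
    if opts.limit_report:
        out += "\n<!-- NewPP limit report -->"
    return out


class ParserCache:
    def __init__(self):
        self.store = {}

    def get(self, page_id, opts):
        return self.store.get(opts.cache_key(page_id))

    def save(self, page_id, opts, output):
        self.store[opts.cache_key(page_id)] = output


def get_parser_output(cache, page_id, text, opts):
    """Reuse a cached parse if present; otherwise parse and save it."""
    cached = cache.get(page_id, opts)
    if cached is not None:
        return cached
    output = parse(text, opts)
    cache.save(page_id, opts, output)
    return output


text = '<div style="clear:both;" />'

# Before: the mobile API parses first with default options, so a later
# regular view receives untidied output without a limit report.
buggy = ParserCache()
get_parser_output(buggy, 1, text, ParserOptions())
regular_view = get_parser_output(buggy, 1, text, canonical_options())

# After: every caller that saves to the cache uses the canonical options.
fixed = ParserCache()
get_parser_output(fixed, 1, text, canonical_options())
fixed_view = get_parser_output(fixed, 1, text, canonical_options())
```

`regular_view` reproduces both symptoms from this report (no tidy, no NewPP limit report), while `fixed_view` has both.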
Comment 34 Tim Starling 2012-10-10 01:08:43 UTC
Fix merged: https://gerrit.wikimedia.org/r/#/c/27382/
Comment 35 Max Semenik 2012-10-23 22:14:39 UTC
Was deployed at least a week ago.
