Last modified: 2014-02-27 20:00:06 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T55312, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 53312 - Image captions containing "page " anywhere are parsed as page option
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
Component: token-stream transforms
Version: unspecified
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: C. Scott Ananian
Duplicates: 54642
Depends on:
Blocks:
 
Reported: 2013-08-25 10:13 UTC by Chris McKenna
Modified: 2014-02-27 20:00 UTC
CC List: 9 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Chris McKenna 2013-08-25 10:13:34 UTC
Any image whose caption contains the words "page" and "commentary" in that order, in all lower case, whether adjacent or not and regardless of anything else in the caption, is not rendered in VE and is not editable in VE.

See https://en.wikipedia.org/w/index.php?title=User:Thryduulf/sandbox&oldid=570107739#Third_section for just about every possible combination. In summary:
page commentary: not rendered
A 2013 page with commentary: not rendered
Page commentary: rendered
commentary page: rendered

VE apparently treats these images as if they have no caption, and adding one simply appends it to the existing one; e.g. adding the caption "Maryland" gave:
[[File:MDMap-doton-Bowie.PNG|thumb|180px|page 2013 commentary]] → [[File:MDMap-doton-Bowie.PNG|thumb|180px|page 2013 commentary|Maryland]]
https://en.wikipedia.org/w/index.php?title=User%3AThryduulf%2Fsandbox&diff=570108303&oldid=570107739

Note that although I did my testing with images in a table, the table has no effect, as can be seen in the reported example from the live wiki: [[Josephus on Jesus#Testimonium Flavianum]]
Comment 1 Chris McKenna 2013-08-25 11:32:06 UTC
Further testing shows that the key words are "page " (including the trailing space) and "comment" (not commentary). I think that any image caption that matches the following regex will exhibit this bug:

/.*page .*comment.*/

https://en.wikipedia.org/w/index.php?title=User:Thryduulf/sandbox&oldid=570113406#Third_section
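The regex from Comment 1 can be checked against the captions summarised in the description. This is a minimal Python sketch that assumes those observations are accurate; the names `SUSPECT`, `OBSERVED` and `is_suspect` are illustrative only and do not come from Parsoid.

```python
import re

# Hypothesis from Comment 1: "page " (with a trailing space) somewhere before
# "comment". Matching is case-sensitive, as in the original regex.
SUSPECT = re.compile(r".*page .*comment.*")


def is_suspect(caption: str) -> bool:
    """Return True if the caption matches the reporter's regex."""
    return SUSPECT.search(caption) is not None


# Observations from the description: caption -> rendered in VE?
OBSERVED = {
    "page commentary": False,
    "A 2013 page with commentary": False,
    "Page commentary": True,
    "commentary page": True,
}

for caption, rendered in OBSERVED.items():
    # The hypothesis holds if a caption fails to render exactly when it matches.
    assert is_suspect(caption) == (not rendered), caption
```

Comment 3 later narrows this down: "page " followed by any text is enough, so this regex is stricter than the real trigger.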
Comment 2 Chris McKenna 2013-08-25 15:06:54 UTC
Yet more testing confirms that this is confined to image captions.

If you edit or enter an image caption in VE with the words "page comment" in it, the caption is visible during that edit.

Saving and reentering VE exhibits odd behaviour: as I predicted, the caption of the image added in VE is not visible. However, the edited caption of an image that was already present remains visible. A subsequent round trip in and out of VE confirms this, but I can't test on systems or browsers other than Firefox 23 on Linux.

See https://en.wikipedia.org/w/index.php?title=User:Thryduulf/sandbox&oldid=570134743#Section_with_a_picture_of_Gloucester

The image of Gloucester Docks at the head of the section was the one with the edited caption. The image at the end of the preceding section (the prospect of Derby) was added in VE.
Comment 3 Chris McKenna 2013-08-25 23:34:51 UTC
Cryptic C62 at en.wp reports that it's just "page " or "page=" that is required:
"Further further testing shows that the error is caused by "page " (including the space) followed by some text, or "page=" followed by any text or nothing."
Comment 4 Gabriel Wicke 2013-08-26 17:42:38 UTC
The PHP parser recognizes the page option (see https://www.mediawiki.org/wiki/Help:Images#Syntax) only for PDF files that actually exist:

https://www.mediawiki.org/wiki/User:GWicke/TestPageOption

It does, however, accept the option 'page=2013 commentary'.

So to me it seems that we need to

1) only match 'page=' at the start of the potential option
2) only do so for PDF files that exist
3) but continue to accept mixed numerical / text page values.
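The three rules above can be sketched as a small Python predicate. This is an illustration under stated assumptions, not Parsoid's code: `parse_page_option` and its `file_is_existing_pdf` flag are hypothetical, and a real parser would have to look the file up to set that flag.

```python
import re

# Rule 1: "page=" is recognised only at the start of the option.
PAGE_PREFIX = re.compile(r"page=(.*)", re.DOTALL)


def parse_page_option(option: str, file_is_existing_pdf: bool):
    """Return the page value if `option` is a page option, else None.

    HYPOTHETICAL helper: `file_is_existing_pdf` stands in for a file lookup.
    """
    # Rule 2: only existing PDF files take a page option.
    if not file_is_existing_pdf:
        return None
    # re.match anchors at the start of the string (rule 1).
    m = PAGE_PREFIX.match(option)
    if m is None:
        return None
    # Rule 3: keep mixed numeric/text values such as "2013 commentary" as-is.
    return m.group(1)


print(parse_page_option("page=2013 commentary", file_is_existing_pdf=True))  # -> 2013 commentary
print(parse_page_option("A 2013 page=5", file_is_existing_pdf=True))  # -> None
print(parse_page_option("page=2", file_is_existing_pdf=False))  # -> None
```

The sketch follows Comment 4 literally and only handles 'page='; whether the space-separated alias should get the same treatment is left open.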
Comment 5 C. Scott Ananian 2013-11-08 22:46:24 UTC
Testing reveals that ending an image caption with 'alt=', 'thumb=', or 'thumbnail=' also trips up the parser. That seems to be related to the img_attribute production in the PEG grammar. I don't know yet why 'page ' triggers the bug; still looking...
Comment 6 C. Scott Ananian 2013-11-08 22:48:42 UTC
Ah, the img_page option has two aliases, "page=$1" and "page $1". I wonder why we are parsing options twice (once in the PEG grammar, once in the magic-words localization code)...
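A minimal Python sketch of how an alias such as "page $1" can misfire if it is searched for anywhere in an option instead of being matched against the whole option. The alias-to-regex translation and the `match_option` helper are assumptions for illustration; they are not taken from Parsoid or from the patch in change 94446.

```python
import re

# The two aliases of the img_page magic word quoted in Comment 6.
ALIASES = ["page=$1", "page $1"]


def match_option(option: str, anchored: bool):
    """Return the captured page value, or None if no alias matches.

    ASSUMPTION: illustrative translation of aliases to regexes,
    not Parsoid's actual implementation.
    """
    for alias in ALIASES:
        # Split the alias around its $1 placeholder and let $1 capture anything.
        prefix, _, suffix = alias.partition("$1")
        pattern = re.compile(re.escape(prefix) + "(.*)" + re.escape(suffix), re.DOTALL)
        # Unanchored: search() finds the alias anywhere (the misbehaviour).
        # Anchored: fullmatch() requires the whole option to be the alias.
        m = pattern.fullmatch(option) if anchored else pattern.search(option)
        if m:
            return m.group(1)
    return None


# A caption with "page " mid-text is swallowed only in the unanchored case.
print(match_option("A 2013 page with commentary", anchored=False))  # -> with commentary
print(match_option("A 2013 page with commentary", anchored=True))  # -> None
```

Anchoring alone still treats a caption that begins with "page " as a page value, which is why Comment 4 also limits the option to existing PDF files.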
Comment 7 Gerrit Notification Bot 2013-11-08 23:00:59 UTC
Change 94446 had a related patch set uploaded by Cscott:
Fix parsing of image captions containing embedded image options.

https://gerrit.wikimedia.org/r/94446
Comment 8 Gerrit Notification Bot 2013-11-12 16:48:37 UTC
Change 94446 merged by jenkins-bot:
Fix parsing of image captions containing embedded image options.

https://gerrit.wikimedia.org/r/94446
Comment 9 James Forrester 2014-02-27 20:00:06 UTC
*** Bug 54642 has been marked as a duplicate of this bug. ***
