Last modified: 2012-11-05 20:14:14 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T43715, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 41715 - Parsoid "should" eat whitespace before [[Category:]] tags
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Gabriel Wicke
Depends on:
Blocks:
Reported: 2012-11-02 22:27 UTC by Roan Kattouw
Modified: 2012-11-05 20:14 UTC
CC List: 4 users

Description Roan Kattouw 2012-11-02 22:27:56 UTC
For compatibility with the PHP parser, I would like Parsoid to mimic this completely insane behavior: all whitespace preceding [[Category:Foo]] is eaten.

So for instance, "Foo [[Category:Bar]]Baz" renders as "<p>FooBaz</p>", "Foo\n\n\n\n[[Category:Bar]]Baz" also renders as "<p>FooBaz</p>", and "Foo\n\n\n\n[[Category:Bar]]\nBaz" renders as "<p>Foo\nBaz</p>". Meanwhile, whitespace *after* categories is processed normally, so "Foo[[Category:Bar]]\n\nBaz" renders as "<p>Foo</p><p>Baz</p>".
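The four cases above can be reproduced with a small sketch. This is not the PHP parser's or Parsoid's actual code; the function name `render_like_php_parser` and the simplified paragraph rule (two or more newlines start a new paragraph) are assumptions made only to illustrate the behavior described.

```python
import re

# Hypothetical, simplified pattern: a category link plus ALL whitespace
# immediately before it. Whitespace after the link is left untouched.
CATEGORY_WITH_LEADING_WS = re.compile(r"\s*\[\[Category:[^\]]*\]\]")


def render_like_php_parser(wikitext):
    """Illustrative approximation of the whitespace-eating behavior."""
    # Drop each category link together with the whitespace preceding it.
    stripped = CATEGORY_WITH_LEADING_WS.sub("", wikitext)
    # Assumed paragraph rule: a run of two or more newlines separates paragraphs.
    paragraphs = [p for p in re.split(r"\n{2,}", stripped) if p.strip()]
    return "".join("<p>" + p + "</p>" for p in paragraphs)


if __name__ == "__main__":
    for source in (
        "Foo [[Category:Bar]]Baz",
        "Foo\n\n\n\n[[Category:Bar]]Baz",
        "Foo\n\n\n\n[[Category:Bar]]\nBaz",
        "Foo[[Category:Bar]]\n\nBaz",
    ):
        print(repr(source), "->", repr(render_like_php_parser(source)))
```
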

I realize this is totally insane behavior, but I'm having problems in VE because Parsoid isn't currently doing this:
1) these cases render differently in the editor than they do in the actual article
2) lists of categories at the end of the page end up as long strings of newline characters in the editor

I could work around this in the editor to some degree, but it's tricky because only categories exhibit this behavior; magic words don't. "Fixing" it in Parsoid would be nicer. Of course, the whitespace that's being stripped would still have to be put in round-trip data and be restored by the serializer.
Comment 1 Gabriel Wicke 2012-11-02 22:33:13 UTC
That whitespace will still be round-tripped in a mw:Placeholder object. Otherwise it is just plain round-tripping.
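As a rough illustration of the round-tripping idea, the sketch below keeps the eaten whitespace as side data on the category token and writes it back out on serialization. It does not use Parsoid's real data model or its mw:Placeholder representation; the tuple layout and the names `tokenize` and `serialize` are hypothetical.

```python
import re

# Hypothetical pattern: capture the leading whitespace and the category target.
CATEGORY = re.compile(r"(\s*)\[\[Category:([^\]]*)\]\]")


def tokenize(wikitext):
    """Split wikitext into ("text", value) and ("category", target, leading_ws) tuples."""
    tokens = []
    pos = 0
    for match in CATEGORY.finditer(wikitext):
        if match.start() > pos:
            tokens.append(("text", wikitext[pos:match.start()]))
        # leading_ws is round-trip data: not rendered, but kept for the serializer.
        tokens.append(("category", match.group(2), match.group(1)))
        pos = match.end()
    if pos < len(wikitext):
        tokens.append(("text", wikitext[pos:]))
    return tokens


def serialize(tokens):
    """Rebuild the original wikitext, restoring the stored whitespace."""
    parts = []
    for token in tokens:
        if token[0] == "text":
            parts.append(token[1])
        else:
            _, target, leading_ws = token
            parts.append(leading_ws + "[[Category:" + target + "]]")
    return "".join(parts)


if __name__ == "__main__":
    source = "Foo\n\n\n\n[[Category:Bar]]\nBaz"
    assert serialize(tokenize(source)) == source
```

An editor working on these tokens could ignore `leading_ws` when displaying the content, while the serializer still reproduces the source byte for byte.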
Comment 2 Gabriel Wicke 2012-11-05 20:14:14 UTC
https://gerrit.wikimedia.org/r/31591 removes the p-wrapping around blocks of category links, but preserves the whitespace. Mixed content with categories is still wrapped in paragraphs.

We don't currently plan to implement the weirder part of the whitespace-eating behavior as it appears to be a side-effect of paragraph / pre avoidance rather than a use case of its own. This kind of content should be rare enough to not matter.

More fixes are coming for the avoidance of preformatting of indented category links. These will be tracked in a separate bug, so closing this one as fixed.
