Last modified: 2014-10-24 22:29:58 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the pages displaying bug reports and their history, links might be broken. See T69554, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 67554 - Parsoid is too aggressive about marking content surrounding templates as template-generated
Status: NEW
Product: Parsoid
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Parsoid Team
Depends on:
Blocks:
Reported: 2014-07-05 18:40 UTC by Moriel Schottlender
Modified: 2014-10-24 22:29 UTC (History)
CC List: 5 users

Description Moriel Schottlender 2014-07-05 18:40:05 UTC
This comes from hewiki, but seems to be unrelated to the language (?)

Go to this page: https://he.wikipedia.org/w/index.php?title=%D7%90%D7%99_%D7%94%D7%97%D7%96%D7%99%D7%A8&oldid=15683898

The first sentence is not part of the original template, but VE marks it as if it were.

The template is here: https://he.wikipedia.org/wiki/%D7%AA%D7%91%D7%A0%D7%99%D7%AA:%D7%90%D7%99
Comment 1 Roan Kattouw 2014-10-24 07:57:36 UTC
I have observed this kind of behavior pretty frequently. It appears that if a template outputs a newline at the end (which is common, and easy to do by accident), Parsoid will consider the entire paragraph to be template-generated.
Comment 2 Roan Kattouw 2014-10-24 08:04:30 UTC
In this particular case it looks like there was no newline between the infobox template and the start of the content. The template caused a paragraph break between itself and the content, and it also inserted a category, which for whatever reason was put inside the paragraph instead of just before it. So the paragraph got marked as template-generated because it contained a category that came from the template.

See DOM of http://parsoid-lb.eqiad.wikimedia.org/hewiki/%D7%90%D7%99_%D7%94%D7%97%D7%96%D7%99%D7%A8?oldid=15683898

(In reply to Roan Kattouw from comment #1)
> I have observed this kind of behavior pretty frequently. It appears that if
> a template outputs a newline at the end (which is common, and easy to do by
> accident), Parsoid will consider the entire paragraph to be
> template-generated.
I take that back; that's only for double newlines. It does happen for single newlines if you're in a list, though.
Comment 3 ssastry 2014-10-24 22:08:23 UTC
This is related to how paragraph-wrapping is done in Parsoid.

[subbu@earth tests] echo "[[Category:Foo]]abc" | node parse --normalize

<p><link href="Category:Foo"/>abc</p>

I made a number of fixes recently so that unrelated content at the extremities, like this, is left out of paragraphs (https://gerrit.wikimedia.org/r/#/c/166891/ - bug 71361), but it looks like I missed some cases.
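
For illustration only, here is a toy model of the interaction described in this thread. It is not Parsoid's code: the Node class, the node kinds, and every function name are hypothetical, and the rule that a paragraph counts as template-generated when any of its children came from a template is an assumption drawn from comment 2. It is a minimal sketch of why leaving category links at the edges outside the paragraph would stop the paragraph from being marked.

```python
from dataclasses import dataclass


@dataclass
class Node:
    kind: str                    # "text" or "category" (hypothetical node kinds)
    value: str
    from_template: bool = False  # True if the node was produced by a template


def is_paragraph_content(node):
    # Assumption: category links are metadata and do not need to sit inside <p>.
    return node.kind != "category"


def wrap_naive(nodes):
    """Wrap every node in a single paragraph, as in the output shown above."""
    return [("p", list(nodes))] if nodes else []


def wrap_trimmed(nodes):
    """Keep leading and trailing category links outside the paragraph."""
    start, end = 0, len(nodes)
    while start < end and not is_paragraph_content(nodes[start]):
        start += 1
    while end > start and not is_paragraph_content(nodes[end - 1]):
        end -= 1
    wrapped = [("bare", [n]) for n in nodes[:start]]
    if start < end:
        wrapped.append(("p", nodes[start:end]))
    wrapped += [("bare", [n]) for n in nodes[end:]]
    return wrapped


def paragraph_marked_as_template(wrapped):
    """Assumed rule: a <p> is marked if any child came from a template."""
    return any(
        tag == "p" and any(n.from_template for n in children)
        for tag, children in wrapped
    )


# A category emitted by a template, immediately followed by page text.
nodes = [Node("category", "Category:Foo", from_template=True), Node("text", "abc")]
print(paragraph_marked_as_template(wrap_naive(nodes)))
print(paragraph_marked_as_template(wrap_trimmed(nodes)))
```

In this model the naive wrapper puts the template's category inside the paragraph, so the page text next to it is treated as template-generated. The trimmed wrapper only handles categories at the very start or end of the run, which is one way some cases could still be missed.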
Comment 4 ssastry 2014-10-24 22:29:58 UTC
WIP here: https://gerrit.wikimedia.org/r/#/c/168710/
