Last modified: 2014-02-12 23:40:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T44351, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 42351 - Extra whitespaces are added in HTML-like stuff
Status: NEW
Product: Parsoid
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All All
Importance: Low normal
Target Milestone: ---
Assigned To: Gabriel Wicke
Depends on:
Blocks:
Reported: 2012-11-22 09:36 UTC by Liangent
Modified: 2014-02-12 23:40 UTC (History)

Description Liangent 2012-11-22 09:36:36 UTC
Try with leading space:

 1<a
 b>3
Comment 1 Gabriel Wicke 2012-12-06 01:32:28 UTC
Confirmed as a syntactic diff. Lower priority for now.
Comment 2 Liangent 2012-12-06 04:22:20 UTC
No, this is not a syntactic diff. The leading space triggers <pre>, so every following whitespace character is displayed.
Comment 3 Liangent 2012-12-06 04:23:17 UTC
.
Comment 4 Gabriel Wicke 2012-12-06 04:38:12 UTC
Yep, you are right. An extra space inside a pre renders differently, so it changes the semantics.
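To illustrate why the extra space is a semantic change rather than a syntactic one, here is a minimal Python sketch. It is not Parsoid code: `render_normal` and `render_pre` are hypothetical helpers that only approximate how a browser treats whitespace in ordinary flow text versus inside <pre>.

```python
import re

def render_normal(text: str) -> str:
    """Rough model of normal flow text: whitespace runs collapse to one space."""
    return re.sub(r"\s+", " ", text).strip()

def render_pre(text: str) -> str:
    """Rough model of <pre>: whitespace is shown exactly as written."""
    return text

# Content of the pre block before and after the round trip (assumed from the
# report: the round trip leaves one extra space in front of "b>3").
original = "1<a\nb>3"
round_tripped = "1<a\n b>3"

# Outside a pre, the extra space would be invisible ...
print(render_normal(original) == render_normal(round_tripped))
# ... but inside a pre it changes what the reader sees.
print(render_pre(original) == render_pre(round_tripped))
```

Under this model the two strings are indistinguishable in normal flow and different inside a pre, which is the distinction made in comments 2 and 4.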
Comment 5 Gabriel Wicke 2013-01-28 23:26:38 UTC
Current status:

echo -e ' 1<a\n b>3' | nodejs parse --wt2wt
*********** ERROR: cs/s mismatch for node: PRE s: 1; cs: 0 ************
 1<a
  b>3

Notice the extra space before b>3.
Comment 6 ssastry 2013-03-16 00:08:42 UTC
This bug has been fixed for a subset of snippets.

[subbu@earth tests] echo ' 1<c\n b>3' | node parse --wt2wt
 1<c
 b>3
[subbu@earth tests] echo ' 1<a\n b>3' | node parse --wt2wt
WARNING: DSR inconsistency: cs/s mismatch for node: PRE s: 1; cs: 0
 1<a
  b>3

In the first case, 'c' is not a valid HTML tag, so it is immediately converted to text and handled properly. But 'a' is a valid HTML tag and is not converted to text until it hits the sanitizer, which is too late for it to be processed by the pre-handler, since the pre-handler runs before the sanitizer.
Comment 7 ssastry 2013-03-16 00:54:51 UTC
Moving the pre-handler after the sanitizer, along with some minor tweaks to the sanitizer code (Util.newlinesToNlTks(token-to-text)), fixes this. Not committing yet since this needs to be thought through, tested, and verified. But I am recording this while the experiment is fresh in my mind, in case we cannot get around to it right away.
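The ordering problem described in comments 6 and 7 can be sketched with a toy pipeline in Python. This is only an illustration under stated assumptions, not Parsoid's implementation: the token shapes and the functions `pre_handler`, `sanitizer`, `serialize_pre`, and `round_trip` are all hypothetical, the sanitizer here simply rejects every tag token, and the Util.newlinesToNlTks tweak is not modelled.

```python
def pre_handler(tokens):
    """Toy pre-handler: drop the one-space indent after newlines in TEXT tokens only."""
    out = []
    for kind, value in tokens:
        if kind == "text":
            value = value.replace("\n ", "\n")
        out.append((kind, value))
    return out

def sanitizer(tokens):
    """Toy sanitizer: turn every tag token back into plain text (assumption)."""
    return [("text", value) for _kind, value in tokens]

def serialize_pre(tokens):
    """Toy serializer: emit pre content with one indent space per line."""
    content = "".join(value for _kind, value in tokens)
    return "\n".join(" " + line for line in content.split("\n"))

def round_trip(tokens, stages):
    """Run the tokens through the given stages in order, then serialize."""
    for stage in stages:
        tokens = stage(tokens)
    return serialize_pre(tokens)

# '<c' is not a tag, so the whole line pair arrives as a single text token.
tokens_c = [("text", "1<c\n b>3")]
# '<a' looks like a tag, so the newline and indent are hidden inside a tag token.
tokens_a = [("text", "1"), ("tag", "<a\n b>"), ("text", "3")]

# Pre-handler before sanitizer: the indent inside the tag token is never stripped.
print(repr(round_trip(tokens_c, [pre_handler, sanitizer])))
print(repr(round_trip(tokens_a, [pre_handler, sanitizer])))

# Sanitizer before pre-handler: the tag is already text when the indent is stripped.
print(repr(round_trip(tokens_a, [sanitizer, pre_handler])))
```

In this model the 'c' input round-trips cleanly in either order, while the 'a' input gains a second space before b>3 unless the sanitizer runs first, mirroring the outputs shown in comment 6.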
Comment 8 Gabriel Wicke 2013-06-29 02:08:59 UTC
Lowering priority as this is mostly working, and not a very common thing.
