Last modified: 2013-10-10 23:08:54 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from bug reports and their history, links might be broken. See T55968, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 53968 - Serialize HTML DOM according to polyglot markup spec so that it can be parsed with HTML and XML parsers
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
Component: DOM
Version: unspecified
Hardware: All All
Importance: High normal
Assigned To: Gabriel Wicke
Reported: 2013-09-09 22:56 UTC by Gabriel Wicke
Modified: 2013-10-10 23:08 UTC (History)

Description Gabriel Wicke 2013-09-09 22:56:43 UTC
To make it easy to process our output using both XML and HTML tools, we should serialize our output to so-called 'polyglot markup'. This means that our output will be valid XML *and* HTML5 at the same time (effectively XHTML).

The spec for polyglot markup is at http://dev.w3.org/html5/html-xhtml-author-guide/. The relevant differences to our current HTML5 serialization should be:

* void elements are serialized with a trailing slash, as in <br/>
* only a small set of named entities is used; all other entities are converted to numeric character references (&nbsp; becomes &#xA0;)

We can either add this functionality in Domino, or create our own XMLSerializer implementation that walks an arbitrary DOM.
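
To illustrate the second option, here is a minimal sketch of a serializer that walks a DOM-like tree and applies the two rules listed above. It is not the Parsoid or Domino implementation: the Element class, the helper names, and the choice to emit U+00A0 as &#xA0; rather than as a raw character are all assumptions made for this example, and a real DOM has many more node types (comments, doctype, namespaces) that are ignored here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# HTML5 void elements: they have no content and no end tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}


@dataclass
class Element:
    """Hypothetical minimal element node; children are Elements or str text nodes."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def escape_text(s: str) -> str:
    # Escape "&" first so the references added afterwards are not re-escaped.
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    # Assumption for this sketch: write no-break space as a numeric reference,
    # since &nbsp; is not predefined in XML.
    return s.replace("\u00a0", "&#xA0;")


def escape_attr(s: str) -> str:
    # Attribute values are always written in double quotes below.
    return escape_text(s).replace('"', "&quot;")


def serialize(node) -> str:
    if isinstance(node, str):
        return escape_text(node)
    attrs = "".join(f' {k}="{escape_attr(v)}"' for k, v in node.attrs.items())
    if node.tag in VOID_ELEMENTS:
        # Void elements get the trailing slash, as in <br/>.
        return f"<{node.tag}{attrs}/>"
    # Non-void elements always get an explicit end tag, even when empty,
    # because an HTML parser would not treat <span/> as closed.
    inner = "".join(serialize(child) for child in node.children)
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"


doc = Element("p", {"title": 'a "b"'}, ["x\u00a0<y", Element("br"), Element("span")])
markup = serialize(doc)
print(markup)

# The same string is well-formed XML, so a plain XML parser accepts it.
root = ET.fromstring(markup)
print(root.tag, [child.tag for child in root])
```

Keeping the void-element check separate from the empty-children check is the important design point: an empty <span> must still be written as <span></span>, while <br> must be written as <br/>, so that HTML and XML parsers build the same tree.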
Comment 1 Brion Vibber 2013-09-10 19:10:25 UTC
+1 ... HTML5's normalized parsing is great, but lots of programming environments only have an XML parser handy.
Comment 2 Gerrit Notification Bot 2013-10-09 23:52:45 UTC
Change 88904 had a related patch set uploaded by GWicke:
Bug 53968: Add XMLSerializer and use it to produce XHTML

https://gerrit.wikimedia.org/r/88904
Comment 3 Gerrit Notification Bot 2013-10-10 22:48:56 UTC
Change 88904 merged by jenkins-bot:
Bug 53968: Add XMLSerializer and use it to produce XHTML

https://gerrit.wikimedia.org/r/88904
