Last modified: 2013-07-04 10:38:28 UTC
Hack up libhubbub so that its tree builder is callable from the outside, and use it to build a libxml2 DOM. A class with a method taking a TokenChunkPtr (or a TokenMessage, which would be easy to adapt) would be ideal. This will likely involve converting our tokens to the libhubbub token format. Because of libhubbub's arena-like memory management strategy, only a single stack-allocated token is actually needed.

The libhubbub source contains a libxml2 binding example that we could adapt. It does some unnecessary strdup calls, since libxml2 implicitly copies its input while constructing the DOM.

We will also likely need to add some features to our version of the tree builder. The main feature currently planned is the propagation of attributes from end tags to the resulting element.

We will also need a second HTML parser and DOM builder for the HTML to wikitext conversion. This could be the default (unpatched) libhubbub parser.
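
To make the intended interface and the end-tag attribute propagation a bit more concrete, here is a rough, self-contained sketch. It does not use libhubbub or libxml2 at all: Token, TokenChunk, Node, TreeBuilder, addChunk and serialize are hypothetical stand-ins invented for illustration, and TokenChunkPtr is only assumed to be a shared pointer to a sequence of tokens. It handles only well-nested tags, whereas the real tree builder would follow the HTML5 tree construction algorithm. The sketch also assumes that start-tag attributes win when both tags carry the same attribute name; that policy is a guess, not something decided in this bug.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real token types (assumptions, not Parsoid code).
enum class TokenType { StartTag, EndTag, Text };

struct Token {
    TokenType type;
    std::string name;  // tag name for tags, character data for Text
    std::map<std::string, std::string> attrs;
};

using TokenChunk = std::vector<Token>;
using TokenChunkPtr = std::shared_ptr<const TokenChunk>;

// Minimal element node standing in for a libxml2 node.
// All character data of an element is concatenated into `text` for brevity.
struct Node {
    std::string name;
    std::string text;
    std::map<std::string, std::string> attrs;
    std::vector<std::unique_ptr<Node>> children;
};

class TreeBuilder {
public:
    TreeBuilder() : root_(new Node()) {
        root_->name = "#document";
        open_.push_back(root_.get());
    }

    // Consume one chunk of tokens; may be called repeatedly as chunks arrive.
    void addChunk(const TokenChunkPtr& chunk) {
        assert(chunk && "addChunk requires a non-null chunk");
        for (const Token& tok : *chunk) {
            switch (tok.type) {
            case TokenType::StartTag: {
                std::unique_ptr<Node> node(new Node());
                node->name = tok.name;
                node->attrs = tok.attrs;
                Node* raw = node.get();
                open_.back()->children.push_back(std::move(node));
                open_.push_back(raw);
                break;
            }
            case TokenType::Text:
                open_.back()->text += tok.name;
                break;
            case TokenType::EndTag:
                // Only the well-nested case is handled in this sketch.
                if (open_.size() > 1 && open_.back()->name == tok.name) {
                    // Propagate end-tag attributes to the element.
                    // map::insert keeps existing keys, so start-tag values win.
                    open_.back()->attrs.insert(tok.attrs.begin(), tok.attrs.end());
                    open_.pop_back();
                }
                break;
            }
        }
    }

    const Node& document() const { return *root_; }

private:
    std::unique_ptr<Node> root_;
    std::vector<Node*> open_;  // stack of open elements
};

// Debug serializer; attributes come out in sorted key order.
std::string serialize(const Node& n) {
    std::string out = "<" + n.name;
    for (const auto& kv : n.attrs)
        out += " " + kv.first + "=\"" + kv.second + "\"";
    out += ">" + n.text;
    for (const auto& child : n.children)
        out += serialize(*child);
    return out + "</" + n.name + ">";
}
```

Keeping the stack of open elements inside the builder lets an element's start tag and end tag arrive in different chunks. In the real implementation the StartTag and EndTag branches would presumably convert each token into a single reused libhubbub token and hand it to the patched tree builder instead of creating nodes directly.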
Merged Gerrit change #26413 links here; this bug may be resolved.
[Parsoid component reorg by merging CPP/* tickets into General. See bug 50685 for more information. Filter bugmail on this comment. parsoidreorg20130704]