Last modified: 2013-11-12 17:10:08 UTC
The issue is that it uses the Visual Editor API, which does not use Parsoid for the wikitext->HTML transformation. This means that the HTML does not contain Parsoid markup, so it cannot be safely roundtripped. The offending revision is c72c0ce45ab0b4312a8b839be47da4d424ada659.

A minimal test case follows:

> $wikitext = '[[Foo bar baz]]'
> print $html = Flow\ParsoidUtils::convert( 'wikitext', 'html', $wikitext )
<p><a href="/mw-dev/index.php?title=Foo_bar_baz&action=edit&redlink=1" class="new" title="Foo bar baz (page does not exist)">Foo bar baz</a>
</p>
> print Flow\ParsoidUtils::convert( 'html', 'wikitext', $html )
<a href="/mw-dev/index.php?title=Foo_bar_baz&action=edit&redlink=1" class="new" title="Foo bar baz (page does not exist)">Foo bar baz</a>

I could fix this by reverting the offending changes, but I wanted to discuss it first to check whether we're better off:
(a) modifying VE;
(b) calling VE in a different way;
(c) changing ParsoidUtils to call Parsoid directly itself; or
(d) refactoring Parsoid itself so that Parsoid provides the interface to its own API rather than expecting individual extensions to contact it.

My preference is actually for (d), but it might require some discussion with the VE / Parsoid teams (CC'd).
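A quick way to tell the two kinds of HTML apart before attempting a roundtrip is to look for Parsoid's annotations. The sketch below is only a heuristic and rests on an assumption: that Parsoid-generated HTML carries data-parsoid attributes (as in the debug API output quoted later in this thread), while output of the regular PHP parser does not. The names ParsoidMarkupDetector and has_parsoid_markup are hypothetical and not part of Flow or Parsoid.

```python
from html.parser import HTMLParser


class ParsoidMarkupDetector(HTMLParser):
    """Flags a document that has at least one data-parsoid attribute."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased.
        # Assumption: data-parsoid marks Parsoid-generated HTML.
        if any(name == "data-parsoid" for name, _ in attrs):
            self.found = True


def has_parsoid_markup(html):
    """Return True if the HTML looks like it came from Parsoid."""
    detector = ParsoidMarkupDetector()
    detector.feed(html)
    detector.close()
    return detector.found
```

For the HTML in the test case above this returns False, which is the situation where serializing back to wikitext is not safe.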
Change 89611 had a related patch set uploaded by Werdna: Reduce edit form latency and work around bug 55682 https://gerrit.wikimedia.org/r/89611
Patch is a workaround only.
As far as I can tell, Parsoid has endpoints for:

#1 Regular article parsing (get annotated HTML by submitted page title): GET <parsoid>/en/page-title
#2 Regular article serialization using POST (get wikitext by submitted HTML): POST <parsoid>/en/page-title
#3 Form-based wikitext -> HTML DOM interface for manual testing: POST <parsoid>/_html
#4 Form-based HTML DOM -> wikitext interface for manual testing: <parsoid>/_wikitext

To HTML:
* #1 is not something we can use: it will query ApiQueryRevisions.php for the page's content, which only works with pages and is not something Flow can hook into.
* #3 is something we could use, but it is documented to be for manual testing only.

To wikitext:
* #2 is OK to use (we currently do, via VE's API).
* #4 is something we could use, but it is documented to be for manual testing only.

--

Flow currently uses VE's API (which did provide a way to do wikitext->HTML, albeit not using Parsoid), though we should probably change that. Short-term, we could change ParsoidUtils to again call Parsoid's form-based interfaces for manual testing. In the longer run, we'll need Parsoid to provide another endpoint (see below).

I'm indifferent to whether Flow calls it directly or via VE's API. I'd prefer not to do a similar thing twice, but there may be reasons to do so; that is to be figured out once we pick up VE integration into Flow again.

--

To the Parsoid team (I don't know much about Parsoid's inner workings, so I may be mistaken):

In order for Flow to use Parsoid for wikitext->HTML, we'd need Parsoid to provide either:
* A stable (not for testing) endpoint that receives wikitext directly and transforms it to annotated HTML
* A stable endpoint that calls an API based on an identifier (e.g. #1, which receives a page title), but allows additional parameters to let Flow decide which API should be called (so we can make it return Flow content)

If I have missed any existing functionality that does what we need, please point it out :)
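To make the four endpoints easier to compare, here is a rough sketch that maps each one to an HTTP method and URL. It only builds the request line and sends nothing. Several things are assumptions: the base URL is hypothetical, the operation names are made up for this example, the POST method for #4 is a guess (the list above does not state one), and the form field names that #3 and #4 expect are unknown to me, so they are left out.

```python
from urllib.parse import quote

# Operation name -> (HTTP method, whether a page title is part of the URL, path).
# Operation names are hypothetical labels for endpoints #1 to #4.
ENDPOINTS = {
    "page_to_html": ("GET", True, None),             # #1
    "page_html_to_wikitext": ("POST", True, None),   # #2
    "form_wikitext_to_html": ("POST", False, "_html"),      # #3, manual testing
    # #4, manual testing; POST is an assumption, the method was not stated.
    "form_html_to_wikitext": ("POST", False, "_wikitext"),
}


def parsoid_request(base_url, operation, prefix="en", title=None):
    """Return (method, url) for one of the endpoints listed above."""
    method, needs_title, path = ENDPOINTS[operation]
    base = base_url.rstrip("/")
    if needs_title:
        if not title:
            raise ValueError(f"{operation} requires a page title")
        # MediaWiki titles use underscores in URLs, e.g. Foo_bar_baz.
        page = quote(title.replace(" ", "_"), safe="")
        return method, f"{base}/{prefix}/{page}"
    return method, f"{base}/{path}"
```

Written this way, the gap is visible: the only endpoints that accept content without a page title are the two marked for manual testing.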
Change 89611 merged by jenkins-bot: Reduce edit form latency and work around bug 55682 https://gerrit.wikimedia.org/r/89611
Another thing to consider is what the Parsoid output from the debug API looks like. It currently looks like this:

<!DOCTYPE html> <html prefix="dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/"><head prefix="mwr: http://en.wikipedia.org/wiki/Special:Redirect/"><meta property="mw:parsoidVersion" content="0"><link rel="dc:isVersionOf" href="//en.wikipedia.org/wiki/Main page"><title>undefined</title><base href="//en.wikipedia.org/wiki/Main page"></head><body data-parsoid='{"dsr":[0,36,0,0]}'><p data-parsoid='{"dsr":[0,17,0,0]}'><b data-parsoid='{"dsr":[0,17,3,3]}'>hello there</b></p> <dl data-parsoid='{"dsr":[18,36,0,0]}'><dd data-parsoid='{"dsr":[18,36,1,0]}'> old <dl data-parsoid='{"dsr":[24,36,0,0]}'><dd data-parsoid='{"dsr":[24,36,2,0]}'> travelers</dd></dl></dd></dl></body></html>

Basically, the <head> and the <base href="..."> both prevent us from simply dumping the output of Parsoid into an HTML page. To figure out what we need to do here, I'm having lunch with gwicke today and will see what our best move is going forward.
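One possible stopgap, until the API question is settled, would be to keep only the contents of <body> before embedding the result. The sketch below is an illustration under assumptions, not what Flow or Parsoid actually does: BodyExtractor and body_fragment are hypothetical names, and dropping the <base href="..."> may change how relative links in the fragment resolve, which would need checking.

```python
from html.parser import HTMLParser


class BodyExtractor(HTMLParser):
    """Collects the raw markup found between <body ...> and </body>."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.in_body = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            # get_starttag_text() returns the tag exactly as it appeared,
            # so data-parsoid attributes are preserved unchanged.
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied once, without an end tag.
        if self.in_body:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif self.in_body:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_body:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.in_body:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.in_body:
            self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        if self.in_body:
            self.parts.append(f"<!--{data}-->")


def body_fragment(document):
    """Return the inner markup of <body>, dropping <head> and <base>."""
    parser = BodyExtractor()
    parser.feed(document)
    parser.close()
    return "".join(parser.parts)
```

Note that the data-parsoid attribute on the <body> tag itself is discarded by this approach; whether the serializer needs it for roundtripping is something to confirm with the Parsoid team.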
The WMF core features team tracks this bug on Mingle card https://mingle.corp.wikimedia.org/projects/flow/cards/320, but people from the community are welcome to contribute here and in Gerrit.
We'll work on public APIs for page-less HTML and wikitext. This is fairly straightforward on the Parsoid side, especially if you are happy to use Parsoid directly for now. The public API for this might take a bit longer. The Parsoid side is tracked in bug 55758.
Change 92633 had a related patch set uploaded by Matthias Mullie: Use page-less Parsoid API for Flow https://gerrit.wikimedia.org/r/92633
Change 92633 merged by jenkins-bot: Use page-less Parsoid API for Flow https://gerrit.wikimedia.org/r/92633