Last modified: 2013-11-12 17:10:08 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the bug reports and their history, links might be broken. See T57682, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 55682 - Flow: ParsoidUtils does not roundtrip wikitext and HTML
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: Flow
Version: unspecified
Hardware: All, OS: All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on: 55758
Blocks:
Reported: 2013-10-14 01:32 UTC by Andrew Garrett
Modified: 2013-11-12 17:10 UTC (History)
CC List: 9 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Andrew Garrett 2013-10-14 01:32:07 UTC
The issue is that ParsoidUtils uses the VisualEditor API, which does not use Parsoid for the wikitext->HTML transformation. As a result, the HTML does not contain Parsoid's markup, so it cannot be safely roundtripped back to wikitext.

The offending revision is c72c0ce45ab0b4312a8b839be47da4d424ada659.

A minimal test case follows:

> $wikitext = '[[Foo bar baz]]'

> print $html = Flow\ParsoidUtils::convert( 'wikitext', 'html', $wikitext )
<p><a href="/mw-dev/index.php?title=Foo_bar_baz&amp;action=edit&amp;redlink=1" class="new" title="Foo bar baz (page does not exist)">Foo bar baz</a>
</p>
> print Flow\ParsoidUtils::convert( 'html', 'wikitext', $html )
<a href="/mw-dev/index.php?title=Foo_bar_baz&action=edit&redlink=1" class="new" title="Foo bar baz (page does not exist)">Foo bar baz</a>
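The property that the test case checks can be written as a small harness: converting wikitext to HTML and back must reproduce the original wikitext. The sketch below is hypothetical and written in Python rather than PHP; `roundtrips` and the stub converters are illustrative names, not Flow or Parsoid APIs. The "plain" stubs mimic the transcript above (ordinary rendered HTML with nothing that records the source), while the "annotated" stubs keep the source recoverable through a made-up `data-source` attribute, a crude stand-in for the much finer-grained `data-parsoid` annotations Parsoid actually emits.

```python
import html
import re
from typing import Callable

Converter = Callable[[str], str]


def roundtrips(wikitext: str, to_html: Converter, to_wikitext: Converter) -> bool:
    """True if wikitext -> HTML -> wikitext reproduces the input exactly."""
    return to_wikitext(to_html(wikitext)) == wikitext


# Hypothetical "plain" converters mimicking the transcript: the HTML is an
# ordinary rendered link, and nothing in it records the original wikitext.
def plain_to_html(wikitext: str) -> str:
    title = wikitext.strip("[]")
    slug = title.replace(" ", "_")
    return f'<p><a href="/wiki/{slug}">{title}</a></p>'


def plain_to_wikitext(markup: str) -> str:
    # With no annotations to go on, it can only pass the HTML through,
    # much as the transcript shows.
    return markup


# Hypothetical "annotated" converters: the source travels in a made-up
# data-source attribute (NOT Parsoid's real markup).
def annotated_to_html(wikitext: str) -> str:
    return f'<p data-source="{html.escape(wikitext)}"></p>'


def annotated_to_wikitext(markup: str) -> str:
    match = re.search(r'data-source="([^"]*)"', markup)
    return html.unescape(match.group(1)) if match else markup


print(roundtrips("[[Foo bar baz]]", plain_to_html, plain_to_wikitext))  # False
print(roundtrips("[[Foo bar baz]]", annotated_to_html, annotated_to_wikitext))  # True
```

The point of the contrast is the one made in the description: without annotations in the HTML, the reverse conversion has nothing to reconstruct the wikitext from.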

I could fix this by reverting the offending changes, but I wanted to discuss it first to check whether we're better off:
(a) modifying VE;
(b) calling VE in a different way;
(c) changing ParsoidUtils to call Parsoid directly itself; or
(d) refactoring Parsoid itself so that Parsoid provides the interface to its own API, rather than expecting individual extensions to contact it.
My preference is actually for (d), but it might require some discussion with the VE / Parsoid teams (CC'd).
Comment 1 Gerrit Notification Bot 2013-10-14 01:56:38 UTC
Change 89611 had a related patch set uploaded by Werdna:
Reduce edit form latency and work around bug 55682

https://gerrit.wikimedia.org/r/89611
Comment 2 Andrew Garrett 2013-10-15 02:55:10 UTC
Patch is a workaround only.
Comment 3 Matthias Mullie 2013-10-15 14:29:55 UTC
As far as I can tell, Parsoid has endpoints for:

#1 Regular article parsing (get annotated HTML by submitted page title): GET <parsoid>/en/page-title
#2 Regular article serialization using POST (get wikitext by submitted HTML): POST <parsoid>/en/page-title
#3 Form-based wikitext -> HTML DOM interface for manual testing: POST <parsoid>/_html
#4 Form-based HTML DOM -> wikitext interface for manual testing: <parsoid>/_wikitext


Wikitext to HTML:

#1 is not something we can use: it queries ApiQueryRevisions.php for the page's content, which only works for actual pages, and Flow cannot hook into it.
#3 is something we could use, but it is documented to only be for manual testing


HTML to wikitext:

#2 is ok to use (we currently do, via VE's API)
#4 is something we could use, but it is documented to only be for manual testing

--

Flow currently uses VE's API (which did provide a way to do wikitext->HTML, albeit not using Parsoid), though we should probably change that.

Short-term, we could change ParsoidUtils to again call Parsoid's form-based interfaces for manual testing.
In the longer run, we'll need Parsoid to provide another endpoint (see below). I'm indifferent to whether Flow calls it directly or via VE's API. I'd prefer not to do a similar thing twice, but there may be reasons to do so; to be figured out once we pick up VE integration into Flow again.

--

To the Parsoid team (I don't know much about Parsoid's inner workings, so I may be mistaken):

In order for Flow to use Parsoid for wikitext->HTML, we'd need Parsoid to provide either:

* A stable (not for testing) endpoint that receives wikitext directly and transforms it to annotated HTML
* A stable endpoint that calls an API based on an identifier (e.g. #1, which receives a page title), but accepts additional parameters that let Flow decide which API should be called (so we can make it return Flow content)

If I have missed any existing functionality that does what we need, please point it out :)
Comment 4 Gerrit Notification Bot 2013-10-15 15:03:09 UTC
Change 89611 merged by jenkins-bot:
Reduce edit form latency and work around bug 55682

https://gerrit.wikimedia.org/r/89611
Comment 5 Erik Bernhardson 2013-10-15 17:28:25 UTC
Another thing to consider is what the Parsoid output from the debug API looks like. It currently looks like this:
   
    <!DOCTYPE html>
    <html prefix="dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/"><head prefix="mwr: http://en.wikipedia.org/wiki/Special:Redirect/"><meta property="mw:parsoidVersion" content="0"><link rel="dc:isVersionOf" href="//en.wikipedia.org/wiki/Main page"><title>undefined</title><base href="//en.wikipedia.org/wiki/Main page"></head><body data-parsoid='{"dsr":[0,36,0,0]}'><p data-parsoid='{"dsr":[0,17,0,0]}'><b data-parsoid='{"dsr":[0,17,3,3]}'>hello there</b></p>
    <dl data-parsoid='{"dsr":[18,36,0,0]}'><dd data-parsoid='{"dsr":[18,36,1,0]}'> old
    <dl data-parsoid='{"dsr":[24,36,0,0]}'><dd data-parsoid='{"dsr":[24,36,2,0]}'> travelers</dd></dl></dd></dl></body></html>

Basically, the <head> and the <base href="..."> it contains both prevent simply dumping Parsoid's output into an HTML page.
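As a sketch of the workaround this implies, assuming it is acceptable to discard the <head> entirely, the Python snippet below keeps only the raw markup inside <body> using the standard library's html.parser, and returns the <body> element's own attributes (such as its data-parsoid) separately. `BodyExtractor` and `extract_body` are hypothetical helpers, not part of Flow or Parsoid.

```python
from html.parser import HTMLParser


class BodyExtractor(HTMLParser):
    """Collects the raw markup between <body ...> and </body>, dropping <head>."""

    def __init__(self):
        # Keep entity references such as &amp; verbatim instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.in_body = False
        self.body_attrs = {}
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
            self.body_attrs = dict(attrs)  # e.g. the body's own data-parsoid
        elif self.in_body:
            # Raw start tag text, so attributes like data-parsoid survive as-is.
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self.in_body:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif self.in_body:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_body:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.in_body:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.in_body:
            self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        if self.in_body:
            self.parts.append(f"<!--{data}-->")


def extract_body(document):
    """Return (inner <body> markup, <body> attributes) for a full HTML document."""
    parser = BodyExtractor()
    parser.feed(document)
    parser.close()
    return "".join(parser.parts), parser.body_attrs
```

Dropping the <base> element changes how any relative links in the output resolve, so a real integration would still have to rewrite those links (or obtain a bare fragment from Parsoid instead); the sketch does not attempt that.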

To figure out what we need to do here, I'm having lunch with gwicke today and will see what our best move going forward is.
Comment 6 spage 2013-10-15 19:00:52 UTC
The WMF core features team tracks this bug on Mingle card https://mingle.corp.wikimedia.org/projects/flow/cards/320, but people from the community are welcome to contribute here and in Gerrit.
Comment 7 Gabriel Wicke 2013-10-15 20:58:25 UTC
We'll work on public APIs for page-less HTML and wikitext. This is fairly straightforward on the Parsoid side, especially if you are happy to use Parsoid directly for now. The public API for this might take a bit longer.

The Parsoid side is tracked in bug 55758.
Comment 8 Gerrit Notification Bot 2013-10-30 11:19:33 UTC
Change 92633 had a related patch set uploaded by Matthias Mullie:
Use page-less Parsoid API for Flow

https://gerrit.wikimedia.org/r/92633
Comment 9 Gerrit Notification Bot 2013-11-07 23:18:11 UTC
Change 92633 merged by jenkins-bot:
Use page-less Parsoid API for Flow

https://gerrit.wikimedia.org/r/92633
