Last modified: 2014-11-03 16:16:36 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T50891, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 48891 - <translate> and <tvar> do not work with Parsoid because they are not normal parser tags
<translate> and <tvar> do not work with Parsoid because they are not normal p...
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Translate (Other open bugs)
unspecified
All All
: Normal major (vote)
: ---
Assigned To: Nobody - You can work on this!
http://www.mediawiki.org/w/api.php?ac...
: newparser
: 52408 (view as bug list)
Depends on:
Blocks: 53974 56451 71730
  Show dependency treegraph
 
Reported: 2013-05-28 12:34 UTC by Derk-Jan Hartman
Modified: 2014-11-03 16:16 UTC (History)
19 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
screenshot (18.89 KB, image/png)
2013-05-28 12:34 UTC, Derk-Jan Hartman
Details

Description Derk-Jan Hartman 2013-05-28 12:34:53 UTC
Created attachment 12405 [details]
screenshot

Example: https://www.mediawiki.org/wiki/Communication
Comment 1 James Forrester 2013-05-28 12:47:28 UTC
Yes, the Translate extension needs to be extended to support VisualEditor editing, but the core issue is that Parsoid doesn't recognise <translate> as an extension block; should it take the local list of registered extensions, and for ones it hasn't been told how to deal with just wrap it as a typeof="mw:Object/Unknown" or something similar?
Comment 2 ssastry 2013-05-28 17:19:29 UTC
Parsoid does fetch a list of registered extensions from the config and uses that to wrap the block with mw:Object/Extension/<ext-name-here> type. All other unknown tags are pass through as plain text.

The problem is that mediawiki does not report <translate> as an installed extension.  Check http://www.mediawiki.org/w/api.php?action=query&meta=siteinfo&format=jsonfm&siprop=extensiontags

Any idea why mw.org is not reporting <translate> as a registered extension?
Comment 3 James Forrester 2013-05-28 18:31:25 UTC
(In reply to comment #2)
> Parsoid does fetch a list of registered extensions from the config and uses
> that to wrap the block with mw:Object/Extension/<ext-name-here> type. All
> other unknown tags are pass through as plain text.

Ah, my apologies; I thought you were doing this, but it didn't even occur to me that the Translate extension might fail to do this. Re-filing.
Comment 4 Niklas Laxström 2013-05-29 06:41:39 UTC
<translate> and <tvar> are not a normal extension tags. Please explain why I should register them.
Comment 5 ssastry 2013-08-01 17:32:41 UTC
Parsoid needs to know if something is a valid extension tag so that it can process them rather than escape them to literal text.  This will also let VE support them in the future and protect them from being modified right now.

Is there any other way to discover that <translate>, <tvar>, <language> are valid tags on the wiki, i.e. what is the mediawiki api endpoint?  So far we've been using siprop and fetching installed extension tags and use that to detect these.
Comment 6 ssastry 2013-08-01 17:34:49 UTC
*** Bug 52408 has been marked as a duplicate of this bug. ***
Comment 7 Ed Sanders 2013-08-01 18:02:12 UTC
@Niklas are there issues with registering them?
Comment 8 Niklas Laxström 2013-08-01 18:06:03 UTC
I don't know. Either the registration would just be symbolic without any side effects, or it could also interfere with the current functionality.
Comment 9 Niklas Laxström 2013-08-01 19:28:19 UTC
Does someone want to try registering them and see what happens? I probably don't have time before Wikimania.
Comment 10 ssastry 2013-08-22 23:33:16 UTC
Ping.
Comment 11 Gerrit Notification Bot 2013-09-09 18:20:54 UTC
Change 83427 had a related patch set uploaded by Nikerabbit:
Register translate and tvar to the parser

https://gerrit.wikimedia.org/r/83427
Comment 12 Niklas Laxström 2013-09-09 18:24:06 UTC
See the above commit. It is not thoroughly tested. I still don't understand what would the benefit.
Comment 13 ssastry 2013-09-09 19:20:13 UTC
Niklas: If VE has to support editing for <translate>, <tvar>, etc. tags, Parsoid has to know about those tags. Parsoid finds out about installed tags by querying mediawiki API. So, either MW API should expose this information for hooks automatically or we need a different API endpoint, or you have to register them.
Comment 14 Niklas Laxström 2013-09-10 00:19:36 UTC
In the long term yes, but wont this render VE useless on translatable pages in the short them?
Comment 15 James Forrester 2013-09-10 00:33:18 UTC
(In reply to comment #14)
> In the long term yes, but wont this render VE useless on translatable pages
> in the short them?

Yes. On MW.org you can use the alien node editor to edit any extension node's contents (but they won't be rich), but on other wikis it will make them un-editable in VE.

However, making a Translation extension editor in VE shouldn't be hard at all - I've created that as bug 53974 along the lines of other bugs.
Comment 16 Quim Gil 2013-09-27 23:21:38 UTC
fwiw this bug has been mentioned at https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#VisualEditor_weekly_update_-_2013-09-26_.28MW_1.22wmf19.29

I have also seen this problem here and there while editing at mediawiki.org. Then again these tags are not used in most Wikimedia projects or MediaWikis there so fair enough.
Comment 17 Siebrand Mazeland 2013-10-04 07:13:58 UTC
Assigning to James, hoping this may get a parsoid person to review the patch that's been available for nearly a month now.
Comment 18 James Forrester 2013-10-04 16:13:53 UTC
(In reply to comment #17)
> Assigning to James, hoping this may get a parsoid person to review the patch
> that's been available for nearly a month now.

Oh. We've been patiently waiting for you to merge it. :-(

Will ping the team.
Comment 19 Elitre 2013-10-05 11:25:12 UTC
I think I already did it yesterday.
Comment 20 Nemo 2013-10-05 11:49:04 UTC
(In reply to comment #19)
> I think I already did it yesterday.

There is a comment, but now a followup question requires addresssing
Comment 21 Gerrit Notification Bot 2013-10-05 12:37:28 UTC
Change 83427 merged by jenkins-bot:
Register translate and tvar to the parser

https://gerrit.wikimedia.org/r/83427
Comment 22 James Forrester 2013-10-07 15:51:27 UTC
I don't deserve the credit for Niklas's fix. :-)
Comment 23 Gabriel Wicke 2013-10-07 19:14:28 UTC
That patch won't work well with Parsoid, see the comment in the patch set.
Comment 24 Siebrand Mazeland 2013-10-09 08:43:16 UTC
This issue appears to suffer from scope creep. The requirements of the current summary have been met. See https://translatewiki.net/wiki/Special:Version?uselang=en: Section "Parser extension tags" now shows "<translate> and <tvar>"
Comment 25 Niklas Laxström 2013-10-09 09:39:41 UTC
Misunderstanding or miscommunication about the actual requirements is not scope creep. The patch which was made does not add value. I changed the title to reflect the issue with my understanding of it.
Comment 26 ssastry 2013-10-09 15:14:09 UTC
The patch will work with Parsoid right now, because Parsoid always goes through the PHP parser when dealing with extensions and tag hooks. But, there are plans for Parsoid to bypass the PHP parser and call the extensions/hooks directly.

Right now, The translate code preprocesses the page by stripping the <translate> and <tvar> tags before the PHP parser actually parses the page source (by registering for ParserBeforeStrip hook). So, even though the tag hooks are registered (with a callback that bombs if called), the PHP parser never gets to calling them.

If Parsoid is able to mimic the behavior (by calling all hooks, not just extension callback hooks), the existing code will continue to work. However, if Parsoid only supports tag extensions directly, then, the ParserBeforeStrip code won't be invoked by Parsoid at that time and this will be a problem at that point.

So, this is not an issue *right now*, but could be at a later point depending on what functionality Parsoid will implement natively and what it will continue to defer to the PHP parser. 

I think Gabriel was responding to that future concern and also hoping that we can use the opportunity of Parsoid's ongoing development to cleanup some of these interfaces and mechanisms. We could streamline extensions to go through narrow interfaces rather than continuing to use all sort of hooks into various points of the parsing timeline. Anything this is something we could discuss more.

Hope this summarizes where we stand now.
Comment 27 ssastry 2013-10-09 15:26:34 UTC
Actually, now that I wrote that, I might have misspoken about whether it will work now as well. Niklas is right. For <translate> and <tvar> tags found in top-level pages, since we never go through PHP parser at all, this may still not work. 

Parsoid will recognize the hooks because of the registration. In turn, it will call the PHP parser to translate the extension content, which in turn may callback into the translate extension rather than the ParserBeforeStrip callback (which is what <translate> and <tvar> relies on), and that will bomb.

One of us (Gabriel or me) will experiment with the updated code locally and update this bug.
Comment 28 ssastry 2013-10-09 15:57:02 UTC
Okay, we tested and all is good for now. You can ignore #c27 :-). 

As I indicated in #c26, this is a problem for when we start updating Parsoid code to call extensions directly rather than go through the full parse pipeline (which was meant to be a temporary hack while we figured out how to call extensions directly).
Comment 29 Gabriel Wicke 2013-10-09 16:01:58 UTC
To summarize:

* We currently work around the missing tag hook API by calling the full action=parse pipeline for each extension tag. This is quite expensive as you can imagine.
* This lets the translate hack work for now. At least it won't crash.
* The translate extension will crash once we actually call the registered tag hook.

So there is some time to fix this up, but we should not pretend that it is fixed right now.

Can you provide more information on what stops you from using the actual tag hook?
Comment 30 Niklas Laxström 2013-10-17 08:07:39 UTC
I just found out that if edit is made via API, the ParserBeforeStrip hook I am using does not get called and the exception is shown. Does anybody have an idea why?
Comment 31 ssastry 2013-10-17 15:13:29 UTC
Niklas: To me, it looks like an existing bug got exposed. Can you provide more details? Maybe full url of the API call? And/or, if you hop onto #mediawiki-parsoid, we could investigate.
Comment 32 ssastry 2013-10-17 16:11:00 UTC
Niklas: So, here is what I found:

https://meta.wikimedia.org/w/api.php?action=parse&text=<translate>foo</translate> works and does not crash.

https://meta.wikimedia.org/w/api.php?action=parse&text=<translate><!--T:1-->foo</translate> throws the exception.

I am a bit baffled why the comment in the extension content should change what hook is called. I'll let you investigate from here. It might be worth using this opportunity to use a regular extension hook (rather than the ParserBeforeStrip hook), if that might work for you.
Comment 33 Giovanni 2013-10-23 10:50:22 UTC
bump
Comment 34 Niklas Laxström 2013-10-24 08:14:28 UTC
Trying the parser hook solution now. I have code like:

		$parser->setHook( 'translate', function ( $input, $params, $parser, $frame ) {
			$re = '~<tvar\|([^>]+)>(.*?)</>~u';
			$output = preg_replace( $re, '\2', $input );

			$output = $parser->recursiveTagParse( $output, $frame );
			$output = trim( $output );
			return $output;
		} );

This seems to work okay on simple text, with the exception of section edit links.

I'm still trying with more complex pages with templates as well extensions and our tutorial.
Comment 35 Gerrit Notification Bot 2013-10-24 09:56:16 UTC
Change 91583 had a related patch set uploaded by Nikerabbit:
Add $wgTranslatePageTranslationUseParserHook

https://gerrit.wikimedia.org/r/91583
Comment 36 Niklas Laxström 2013-10-24 10:00:43 UTC
Issues I have found:

* [MAJOR] Disabling edit section links does not work for headings inside <translate> tags
* [NORMAL] Edit section links don't work. Not a regression, but would be nice fix. Above issue make this a bigger issue.
* [BLOCKER] Table of contents does not include links
* [MINOR] White space trimming behavior has changed. The original logic cannot be implemented with information provided for the parser hook. You can see the buildup at bottom of http://dev.translatewiki.net/wiki/Extension:Translate compare with http://www.mediawiki.org/wiki/Help:Extension:Translate . This is annoying and would be nice to fix, but the workaround is to edit page source. Documentation about whitespace before has to be update if this way is chosen.

Please help me solve the remaining issues.
Comment 37 Niklas Laxström 2013-10-24 10:04:21 UTC
Also, please test the patch if you can to see if you find any other changes in behavior.
Comment 38 Niklas Laxström 2013-10-24 10:07:52 UTC
Editing translatable pages via API seems to work now.
Comment 39 Gabriel Wicke 2013-10-25 23:03:13 UTC
(In reply to comment #36)
> Issues I have found:
> 
> * [MAJOR] Disabling edit section links does not work for headings inside
> <translate> tags

You could try marking your output as HTML according to https://www.mediawiki.org/wiki/Manual:Tag_extensions#How_can_I_avoid_modification_of_my_extension.27s_HTML_output.3F

I'm not sure whether that suppresses section edit links too, but worth a try IMO.

> * [NORMAL] Edit section links don't work. Not a regression, but would be nice
> fix. Above issue make this a bigger issue.
> * [BLOCKER] Table of contents does not include links

Can you check whether your headings are matched by the formatHeadings regexp?

Ideally we'd move both TOC and section edit links to JS / CSS. That is our plan for Parsoid output, but not going to happen over night.
Comment 40 Niklas Laxström 2013-10-26 10:01:52 UTC
(In reply to comment #39)
> (In reply to comment #36)
> > Issues I have found:
> > 
> > * [MAJOR] Disabling edit section links does not work for headings inside
> > <translate> tags
> 
> You could try marking your output as HTML according to
> https://www.mediawiki.org/wiki/Manual:
> Tag_extensions#How_can_I_avoid_modification_of_my_extension.27s_HTML_output.
> 3F

This causes (lack of) whitespace issues and breaks all lists.

> I'm not sure whether that suppresses section edit links too, but worth a try
> IMO.

It doesn't.

> 
> > * [NORMAL] Edit section links don't work. Not a regression, but would be nice
> > fix. Above issue make this a bigger issue.
> > * [BLOCKER] Table of contents does not include links
> 
> Can you check whether your headings are matched by the formatHeadings regexp?

I'm assuming it processes the html I'm returning, which is:
<h2><span class="mw-headline" id="Culture">Culture</span><mw:editsection page="Pupu" section="1">Culture</mw:editsection></h2>

Not sure whether that matches or not.

> 
> Ideally we'd move both TOC and section edit links to JS / CSS. That is our
> plan
> for Parsoid output, but not going to happen over night.

Any changes for interim solutions?
Comment 41 Gerrit Notification Bot 2014-04-16 15:51:17 UTC
Change 91583 abandoned by Siebrand:
Add $wgTranslatePageTranslationUseParserHook

Reason:
Abandoned as there is no progress on this patch set. Can be re-activated if there is outlook on getting this in a mergable state.

https://gerrit.wikimedia.org/r/91583
Comment 42 James Forrester 2014-05-10 09:58:28 UTC
No patch any more. :-(
Comment 43 Nemo 2014-05-11 09:55:37 UTC
(In reply to James Forrester from comment #42)
> No patch any more. :-(

Hmm I'm not sure that's what "Can be re-activated if there is outlook on getting this in a mergable state" means, PATCH_TO_REVIEW seems a sensible way to tag such stale patches in need of TLC.
Comment 44 James Forrester 2014-05-11 09:56:50 UTC
(In reply to Nemo from comment #43)
> (In reply to James Forrester from comment #42)
> > No patch any more. :-(
> 
> Hmm I'm not sure that's what "Can be re-activated if there is outlook on
> getting this in a mergable state" means, PATCH_TO_REVIEW seems a sensible
> way to tag such stale patches in need of TLC.

No. There is no patch to review for merge. There is patch to re-*do*.
Comment 45 Niklas Laxström 2014-10-06 06:38:46 UTC
To be clear: there is nothing wrong with that patch itself, it just doesn't work due to limitations in the parser itself. I cannot continue with this patch/bug until I get help from other people to modify parser/parsoid.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links