Last modified: 2013-10-07 18:58:15 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T52431, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 50431 - TemplateData: Process language fallback and conversion server-side
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: TemplateData
Version: unspecified
Hardware: All All
Importance: High enhancement
Target Milestone: ---
Assigned To: Krinkle
Keywords: i18n
Depends on:
Blocks: ve-nonenglish 50888 52922
Reported: 2013-06-29 19:41 UTC by Helder
Modified: 2013-10-07 18:58 UTC

Description Helder 2013-06-29 19:41:57 UTC
After I added the <templatedata> to [[pt:Template:Referências]], I opened a page in which it is used, setting "uselang=pt" and "uselang=pt-br" in the URL:
1) https://pt.wikipedia.org/wiki/Arte?veaction=edit&uselang=pt
2) https://pt.wikipedia.org/wiki/Arte?veaction=edit&uselang=pt-br
In the first case, when I opened the Transclusion dialog (by clicking on the references section, where the template is used), the template description was shown (as expected). When I typed one of the template parameters, I also got its label normally.

With the second link, on the other hand, the user language (pt-br) differs from the content language (pt), and the user sees no description and no labels.
Comment 1 Helder 2013-06-29 19:46:56 UTC
For the record, the data returned by the API has all the information, but it is in the "pt" properties:
https://pt.wikipedia.org/w/api.php?format=jsonfm&action=templatedata&titles=Predefini%C3%A7%C3%A3o%3Arefer%C3%AAncias
E.g.:
{
    "pages": {
        "1467239": {
            "title": "Predefini\u00e7\u00e3o:Refer\u00eancias",
            "description": {
                "pt": "Produz o t\u00edtulo da se\u00e7\u00e3o de refer\u00eancias e impede que seja edit\u00e1vel"
            },
            "params": {
                "t\u00edtulo": {
                    "label": {
                        "pt": "T\u00edtulo da se\u00e7\u00e3o"
                    },
                    ...
                }
            }
            ...
        }
    }
}
Comment 2 James Forrester 2013-06-29 19:50:04 UTC
API call should ideally specify the fall-back language chain and return just one set for clients (so the weight is on the server).
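
A minimal sketch of the server-side resolution suggested here, assuming a hypothetical `FALLBACKS` table and hypothetical helper names (`fallback_chain`, `pick_text`); MediaWiki's real fallback data is not reproduced. The server walks the chain and hands the client one string instead of the whole per-language map:

```python
# Hypothetical fallback table, for illustration only.
FALLBACKS = {"pt-br": ["pt"]}


def fallback_chain(lang, fallbacks=FALLBACKS, final="en"):
    """Return the requested language, its fallbacks, then the final default."""
    chain = [lang]
    for code in fallbacks.get(lang, []):
        if code not in chain:
            chain.append(code)
    if final not in chain:
        chain.append(final)
    return chain


def pick_text(texts, lang, fallbacks=FALLBACKS):
    """Return the first text available along the chain, or None."""
    for code in fallback_chain(lang, fallbacks):
        if code in texts:
            return texts[code]
    return None


# A pt-br user gets the pt label because pt is next in the chain.
label = pick_text({"pt": "T\u00edtulo da se\u00e7\u00e3o"}, "pt-br")
```

With this shape the client never sees languages it did not ask for, which is the point of putting the weight on the server.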
Comment 3 Liangent 2013-07-18 12:31:39 UTC
Falling back to (LanguageConverter)-converted labels is needed too.

It seems my LanguageFallbackChain and related classes (currently in Wikibase) are useful here again.
Comment 4 Liangent 2013-07-26 16:33:25 UTC
(In reply to comment #2)
> API call should ideally specify the fall-back language chain and return just
> one set for clients (so the weight is on the server).

Do we care about the real language info of a label?

That is, is this fine?

{
    "pages": {
        "1467239": {
            "title": "Predefini\u00e7\u00e3o:Refer\u00eancias",
            "description": {
                "pt": "Some text written in English"
            },
            "params": {
                "t\u00edtulo": {
                    "label": {
                        "pt": "Some other text written in Spanish"
                    },
                    ...
                }
            }
            ...
        }
    }
}
Comment 5 James Forrester 2013-07-26 16:36:27 UTC
(In reply to comment #4)
> (In reply to comment #2)
> > API call should ideally specify the fall-back language chain and return just
> > one set for clients (so the weight is on the server).
> 
> Do we care about the real language info of a label.

We don't, but users on multi-lingual wikis will if we send them 1 MiB of descriptions of a template when they only care about 2 KiB worth of the contents. :-)

> That is, is this fine?
> 
> {
>     "pages": {
>         "1467239": {
>             "title": "Predefini\u00e7\u00e3o:Refer\u00eancias",
>             "description": {
>                 "pt": "Some text written in English"
>             },
>             "params": {
>                 "t\u00edtulo": {
>                     "label": {
>                         "pt": "Some other text written in Spanish"
>                     },
>                     ...
>             }
>             ...
>      }
> }

Fine technically - I think users would be upset and confused.
Comment 6 Tyler Romeo 2013-07-26 16:39:22 UTC
So I'm not too Wikidata-savvy, but it seems this patch might be useful (if it's not, ignore me and carry on):

https://gerrit.wikimedia.org/r/72867

Right now there is no way to determine the real language of a message from the CDB cache. That patch changes this.
Comment 7 Liangent 2013-07-26 16:41:06 UTC
(In reply to comment #5)
> Fine technically - I think users would be upset and confused.

So you'll need to prepare for some interface design to tell users that "Some text written in English" and "Some other text written in Spanish" lines are not in pt.

And the format of JSON needs to be modified to include this info. Like:

{
    "pages": {
        "1467239": {
            "title": "Predefini\u00e7\u00e3o:Refer\u00eancias",
            "description": {
                "pt": { "value": "Some text written in English", "language": "en" }
            },
            "params": {
                "t\u00edtulo": {
                    "label": {
                        "pt": { "value": "Some other text written in Spanish", "language": "es" }
                    },
                    ...
                }
            }
            ...
        }
    }
}
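
The shape proposed above can be sketched as a resolver that also reports which language actually supplied the text. This is an illustration only: `resolve_with_source` and the chain passed to it are assumptions, not TemplateData code.

```python
def resolve_with_source(texts, chain):
    """Return {"value", "language"} for the first language in chain that has text."""
    for code in chain:
        if code in texts:
            return {"value": texts[code], "language": code}
    return None


# The request was for pt, but only an English description exists.
description = resolve_with_source(
    {"en": "Some text written in English"},
    ["pt", "pt-br", "en"],
)
```

A client could compare `description["language"]` with the requested code to decide whether to flag the text as being in another language.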
Comment 8 Liangent 2013-07-26 16:42:21 UTC
(In reply to comment #6)
> So I'm not too Wikidata-savvy, but it seems this patch might be useful (if
> it's not, ignore me and carry on):
> 
> https://gerrit.wikimedia.org/r/72867
> 
> Right now there is no way to determine the real language of a message from
> the CDB cache. That patch changes this.

Not really. Labels and descriptions in TemplateData are stored in some JSON blob in a customized format, rather than normal messages.
Comment 9 James Forrester 2013-07-26 16:55:40 UTC
(In reply to comment #7)
> (In reply to comment #5)
> > Fine technically - I think users would be upset and confused.
> 
> So you'll need to prepare for some interface design to tell users that "Some
> text written in English" and "Some other text written in Spanish" lines are
> not in pt.

Why couldn't the TemplateData just be written in the user's language?

> And the format of JSON needs to be modified to include this info.

I don't think that's a good outcome. If there isn't a description in your language (in this case, pt), we shouldn't magically tell you that we've given you a message in a different language (we don't do this for the MW messages framework, for instance).
Comment 10 Liangent 2013-07-26 17:00:04 UTC
(In reply to comment #9)
> (In reply to comment #7)
> > (In reply to comment #5)
> > > Fine technically - I think users would be upset and confused.
> > 
> > So you'll need to prepare for some interface design to tell users that "Some
> > text written in English" and "Some other text written in Spanish" lines are
> > not in pt.
> 
> Why couldn't the TemplateData just be written in the users's language?

You can't expect all templatedata blocks to have labels in all hundreds of languages which MediaWiki supports.

> > And the format of JSON needs to be modified to include this info.
> 
> I don't think that's a good outcome. If there isn't a description in your
> language (in this case, pt), we shouldn't magically tell you that we've given
> you a message in a different language (we don't do this for the MW messages
> framework, for instance).

We do so in Wikibase, if the user indicates that they can read another language -- our implementation currently checks {{#babel: }}, but some global preferences here would be nice of course.
Comment 11 Liangent 2013-07-26 17:01:43 UTC
(In reply to comment #9)
> I don't think that's a good outcome. If there isn't a description in your
> language (in this case, pt), we shouldn't magically tell you that we've given
> you a message in a different language (we don't do this for the MW messages
> framework, for instance).

BTW this comment means WONTFIXing this whole bug.
Comment 12 James Forrester 2013-07-26 17:11:22 UTC
(In reply to comment #11)
> (In reply to comment #9)
> > I don't think that's a good outcome. If there isn't a description in your
> > language (in this case, pt), we shouldn't magically tell you that we've given
> > you a message in a different language (we don't do this for the MW messages
> > framework, for instance).
> 
> BTW this comment means WONTFIXing this whole bug.

Why?

This is just me saying that if "Chien" doesn't exist in French, I don't think we should give users "Dog -- OMG We gave you this message in English even though you asked for it in French!", which feels like significant overkill.
Comment 13 Liangent 2013-07-26 17:29:02 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #9)
> > > I don't think that's a good outcome. If there isn't a description in your
> > > language (in this case, pt), we shouldn't magically tell you that we've given
> > > you a message in a different language (we don't do this for the MW messages
> > > framework, for instance).
> > 
> > BTW this comment means WONTFIXing this whole bug.
> 
> Why?
> 
> This is just me saying that I don't think that instead of "Chien", if it
> doesn't exist in French we should give users "Dog -- OMG We gave you this
> message in English even though you asked for it in French!", which feels
> significant over-kill.

Then I guess your point is that pt-br and pt are more similar, so falling back from pt to pt-br is acceptable, while that is not the case for fr and en. However, technically pt-br and pt have the same relationship as fr and en; otherwise we'll have to compose some language similarity table ourselves and manage to resolve many edge cases (e.g. dialects).
Comment 14 James Forrester 2013-07-26 18:14:30 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > (In reply to comment #11)
> > > (In reply to comment #9)
> > > > I don't think that's a good outcome. If there isn't a description in your
> > > > language (in this case, pt), we shouldn't magically tell you that we've given
> > > > you a message in a different language (we don't do this for the MW messages
> > > > framework, for instance).
> > > 
> > > BTW this comment means WONTFIXing this whole bug.
> > 
> > Why?
> > 
> > This is just me saying that I don't think that instead of "Chien", if it
> > doesn't exist in French we should give users "Dog -- OMG We gave you this
> > message in English even though you asked for it in French!", which feels
> > significant over-kill.
> 
> Then I guess your point is that pt-br and pt are more similar, so falling
> back from pt to pt-br is acceptable, while fr and en are not this case.

Yes.

> However technically pt-br and pt have the same relationship as fr and en,
> or we'll have to compose some language similarity table ourselves, and manage
> to resolve many edge cases (eg. dialects).

Oh. I assumed the jQuery.i18n (or one of the other JS, MW-independent tools that the Language Engineering team have built) would have this built in. Is that not the case?
Comment 15 Tyler Romeo 2013-07-26 18:15:10 UTC
(In reply to comment #8)
> (In reply to comment #6)
> > So I'm not too Wikidata-savvy, but it seems this patch might be useful (if
> > it's not, ignore me and carry on):
> > 
> > https://gerrit.wikimedia.org/r/72867
> > 
> > Right now there is no way to determine the real language of a message from
> > the CDB cache. That patch changes this.
> 
> Not really. Labels and descriptions in TemplateData are stored in some JSON
> blob in a customized format, rather than normal messages.

Gotcha. Carry on. Sorry I couldn't help.
Comment 16 Krinkle 2013-07-26 18:24:38 UTC
For now this is up to the client side to handle, which realistically means it won't be handled (current language > en > nothing).


For the future I intend to have the templatedata API take a language code parameter and resolve it on the server side, for the following reasons:

* On wikis where more than one language is commonly used, more than one language will be defined. (Those wikis are the whole point of this bug; if there is only one language, the wiki author can just specify { "description": "Text." } without lang-codes.) This will result in a large blob of JSON being transferred to e.g. VisualEditor for each template, which is quite a lot of data.

* Even so, it would then still require the client side to have knowledge of all of this and process it. That involves a lot of language data being sent to the client, a lot of translations being sent to the client, and then the client having to do all the computation for it. We can solve this the same way we solved it in ResourceLoader: we'll still cache it, but fragment it by language code based on request context.
Comment 17 Krinkle 2013-07-26 18:26:38 UTC
Also, this way we can provide good values for languages that don't exactly fall back but use a language converter. That is also something that could potentially be done client-side, but I don't see that happening just yet.
Comment 18 Liangent 2013-07-27 10:29:35 UTC
(In reply to comment #17)
> Also, this way we can provide good values for languages that don't exactly
> fallback but use a language converter. Which is also something that could
> potentially be done client side, but I don't see that happening just yet.

Which isn't really doable currently I guess; or it requires delivery of huge conversion tables (for Chinese).
Comment 19 Liangent 2013-07-27 10:56:36 UTC
(In reply to comment #14)
> Oh. I assumed the jQuery.i18n (or one of the other JS, MW-independent tools
> that the Language Engineering team have built) would have this built in. Is
> that not the case?

I can't say there isn't one but I've never heard of this.

BTW I also want a similar one on server side.
Comment 20 Gerrit Notification Bot 2013-10-06 16:48:57 UTC
Change 87724 had a related patch set uploaded by Krinkle:
Implement getInterfaceTextInLanguage and use API and Parser

https://gerrit.wikimedia.org/r/87724
Comment 21 Gerrit Notification Bot 2013-10-07 18:55:01 UTC
Change 87724 merged by jenkins-bot:
Implement getInterfaceTextInLanguage and use API and Parser

https://gerrit.wikimedia.org/r/87724
