Last modified: 2014-02-12 23:38:07 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T43345, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 41345 - add info if langlink is stored at repository or local
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataClient
Version: unspecified
Hardware: All All
Importance: High enhancement with 1 vote
Assigned To: Wikidata bugs
URL: https://www.wikidata.org/wiki/Wikidat...
Depends on: 45534
Reported: 2012-10-24 10:58 UTC by merl
Modified: 2014-02-12 23:38 UTC (History)


Description merl 2012-10-24 10:58:20 UTC
Add info on whether a langlink is stored at the repository or locally to langlinks/ll in api.php?action=query&prop=langlinks&titles=...

Bots need this info because they currently have to search for the source of a langlink on local wiki pages. If they cannot find the source on the main page, they start searching for the langlink on included pages (mostly in the template namespace, where langlinks are included from a subpage). This costs many page source requests and processing time for the parsers of bot frameworks.

If bots knew that langlinks are already stored at Wikidata, they would not have to request the source code of many local pages.
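
To illustrate the cost, here is a rough sketch of the search a bot has to do today. It is not taken from any real bot framework: the in-memory PAGES store, the page titles and the two regular expressions are simplified assumptions, and each dictionary lookup stands in for one page source request.

```python
import re

# Toy wiki: title -> wikitext. Titles and contents are invented for illustration.
PAGES = {
    "Vorlage:Example": "Template body {{Vorlage:Example/Doku}}",
    "Vorlage:Example/Doku": "Docs\n[[ar:قالب:Example]]",
}

# Simplified patterns; real wikitext parsing is considerably more involved.
LANGLINK = re.compile(r"\[\[([a-z-]+):([^\]|]+)\]\]")
TRANSCLUSION = re.compile(r"\{\{([^}|]+)")


def find_langlink_source(title, lang, pages):
    """Return (title of the page holding the link or None, number of fetches)."""
    queue, seen, fetches = [title], set(), 0
    while queue:
        current = queue.pop(0)
        if current in seen or current not in pages:
            continue
        seen.add(current)
        text = pages[current]
        fetches += 1  # one page source request per visited page
        if any(m.group(1) == lang for m in LANGLINK.finditer(text)):
            return current, fetches
        # Not found here: continue with every page included from this one.
        queue.extend(name.strip() for name in TRANSCLUSION.findall(text))
    return None, fetches
```

In this toy setup, a link that lives only at the repository is reported as not found, but only after every included page has been fetched. That is exactly the work the requested info would save.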

Example:
http://de.wikipedia.org/w/api.php?action=query&prop=langlinks&titles=Vorlage:!

currently returns
<api>
  <query>
    <pages>
      <page pageid="5327033" ns="10" title="Vorlage:!">
        <langlinks>
          <ll lang="ace" xml:space="preserve">Pola:!</ll>
          <ll lang="ar" xml:space="preserve">قالب:!</ll>
          <ll lang="as" xml:space="preserve">সাঁচ:!</ll>
        </langlinks>
      </page>
    </pages>
  </query>
</api>

maybe this can be extended to
<api>
  <query>
    <pages>
      <page pageid="5327033" ns="10" title="Vorlage:!">
        <langlinks>
          <ll lang="ace" storage="repository" xml:space="preserve">Pola:!</ll>
          <ll lang="ar" storage="local" xml:space="preserve">قالب:!</ll>
          <ll lang="as" storage="repository" xml:space="preserve">সাঁচ:!</ll>
        </langlinks>
      </page>
    </pages>
  </query>
</api>
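
A bot could then consume the extended output roughly like this. This is only a sketch: the storage attribute is the proposal from this report and does not exist in the API, and split_by_storage is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# The proposed response from this report; "storage" is NOT a real API attribute.
SAMPLE = """<api>
  <query>
    <pages>
      <page pageid="5327033" ns="10" title="Vorlage:!">
        <langlinks>
          <ll lang="ace" storage="repository" xml:space="preserve">Pola:!</ll>
          <ll lang="ar" storage="local" xml:space="preserve">قالب:!</ll>
          <ll lang="as" storage="repository" xml:space="preserve">সাঁচ:!</ll>
        </langlinks>
      </page>
    </pages>
  </query>
</api>"""


def split_by_storage(xml_text):
    """Group (lang, title) pairs by the proposed storage attribute."""
    root = ET.fromstring(xml_text)
    result = {"local": [], "repository": [], "unknown": []}
    for ll in root.iter("ll"):
        # Responses without the attribute (today's output) end up in "unknown".
        storage = ll.get("storage", "unknown")
        result.setdefault(storage, []).append((ll.get("lang"), ll.text))
    return result
```

With this, only the links listed under "local" would require a search through local page sources.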

If querying this info takes a lot of resources, an extra parameter should be added (like llurl for the fullurl extra info) and the info should only be shown if requested.
Comment 1 jeblad 2012-12-01 14:31:13 UTC
This bug came up in a thread on Project chat (http://www.wikidata.org/wiki/Wikidata:Project_chat#Prioritizing_Hungarian_articles) and it could be important to fix it. That is, it has load issues, but it will not impact us very much, as it is only one bot for now.
Comment 2 Nemo 2013-01-31 09:31:03 UTC
(In reply to comment #1)
> This bug come up in a thread on Project chat
> (http://www.wikidata.org/wiki/Wikidata:Project_chat#Prioritizing_Hungarian_articles)
> and it could be important to fix it. That is it has load issues, but will not
> impact us very much as it is only one bot for now.

Is this bug still current? «i hope that bugzilla:41345 will be available before client extension goes live. Merlissimo (talk) 16:25, 30 November 2012 (UTC)»
That has already happened, and it seems bots have found another way?
Comment 3 merl 2013-01-31 15:53:28 UTC
This is still open. There is no real solution. Because only the article namespace is imported at the moment, bots simply assume that langlinks are on Wikidata if they are not found in the main source. Handling langlinks from included subpages, as in the template namespace, will be impossible if this bug is not resolved.
Comment 4 Yuri Astrakhan 2013-02-17 01:13:11 UTC
To solve this bug, could someone comment on what creates the langlinks table entries in the client DB? I might be mistaken, but it seems that the langlinks are not pulled dynamically from the repo, but rather copied in the background or on null edits. If this is the case, we might have to modify the langlinks table to include an extra column for the "source".
Comment 5 Daniel Kinzler 2013-02-21 16:15:22 UTC
re #4: Langlinks are pulled directly from the repo, but only when the page is re-rendered. When an item changes on wikidata.org, a background process (dispatchChanges.php) is used to invalidate the respective pages, so they get re-rendered. This may take a few minutes.
Comment 6 Daniel Kinzler 2013-02-21 16:17:39 UTC
re #3: I currently see no easy way to do this. There is just no place to store this info on the client, and schema changes to large tables (like adding a field to the langlinks table) are only done if absolutely necessary.

We could add a separate table to track this, but that has additional implications, needs more thought and is not trivial to code either. I'm actually quite happy that we can manage without *any* changes to the client database.
Comment 7 Brad Jorsch 2013-02-22 19:53:50 UTC
(In reply to comment #4)
> To solve this bug, could someone comment on who creates langlinks table
> entries in the client DB? I might be mistaken, but it seems that the
> langlinks are not pulled dynamically from the repo, but rather copied in the
> background or on null edits.

When the page is parsed and a langlink is found, it calls addLanguageLink() on the ParserOutput object. The Wikidata client code hooks into the ParserAfterParse hook and does the same for all the additional language links it wants to add. The accumulated list of language links in the ParserOutput (eventually) gets saved to the langlinks table.

> If this is the case, we might have to modify langlinks table to
> include an extra column for the "source".

Seems that way to me. ParserOutput and whatever does the actual updating of langlinks would also have to be changed to handle the extra field.
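
As a toy model of the flow described in this comment (not MediaWiki code): ToyParserOutput, the source tag and the four-column row are all hypothetical, and letting the first source win is an assumption made only for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyParserOutput:
    """Stand-in for ParserOutput that only tracks language links."""
    # Maps "lang:title" to the source that added it (hypothetical extra field).
    language_links: dict = field(default_factory=dict)

    def add_language_link(self, link, source="local"):
        # Assumption for this sketch: the first source to add a link wins.
        self.language_links.setdefault(link, source)


def parse_page(wikitext_links, repo_links):
    out = ToyParserOutput()
    for link in wikitext_links:   # links found while parsing the page itself
        out.add_language_link(link, "local")
    for link in repo_links:       # links added afterwards by the client hook
        out.add_language_link(link, "repository")
    return out


def langlinks_rows(page_id, out):
    """Rows for a langlinks table with a hypothetical extra source column."""
    rows = []
    for link, source in out.language_links.items():
        lang, title = link.split(":", 1)
        rows.append((page_id, lang, title, source))
    return rows
```

The sketch shows where the extra field would have to travel: from the place a link is added, through the accumulated list, into whatever finally writes the rows.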
Comment 8 Daniel Kinzler 2013-02-22 23:21:14 UTC
It just occurred to me that we could stuff the list of "local" links, without the ones from Wikidata, into the page_props table. It would be serialized data, so we couldn't directly compare it to what's in the langlinks table, but when asking for the langlinks of a specific page, it would be sufficient to tell which link comes from where.
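
A minimal sketch of how that lookup could work for a single page. The JSON encoding, the function name and the rule that a missing property means "no local links" are assumptions for illustration; nothing here reflects an actual page_props format.

```python
import json


def annotate_langlinks(langlinks, local_links_prop):
    """Tag each langlink of one page as "local" or "repository".

    langlinks: list of (lang, title) pairs as read from the langlinks table.
    local_links_prop: JSON string holding the page's local links, standing in
    for a hypothetical serialized page_props value; None if the page has none.
    """
    if local_links_prop:
        local = {tuple(pair) for pair in json.loads(local_links_prop)}
    else:
        local = set()  # assumption: no property means no local links
    return [
        (lang, title, "local" if (lang, title) in local else "repository")
        for lang, title in langlinks
    ]
```

Everything not listed in the serialized value is attributed to the repository, so no change to the langlinks table itself would be needed.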
Comment 9 Betacommand 2013-02-24 17:49:48 UTC
Polluting page_props is just an ugly hack. The best idea would be to add a new column to langlinks, or to not store Wikidata links in langlinks.
Comment 10 Ricordisamoa 2013-12-09 22:54:56 UTC
(In reply to comment #0)
> Add info on whether a langlink is stored at the repository or locally to
> langlinks/ll in api.php?action=query&prop=langlinks&titles=...
> 
> Bots need this info because they currently have to search for the source of
> a langlink on local wiki pages. If they cannot find the source on the main
> page, they start searching for the langlink on included pages (mostly in the
> template namespace, where langlinks are included from a subpage). This costs
> many page source requests and processing time for the parsers of bot
> frameworks.
> 
> If bots knew that langlinks are already stored at Wikidata, they would not
> have to request the source code of many local pages.
> 
> Example:
> http://de.wikipedia.org/w/api.php?action=query&prop=langlinks&titles=Vorlage:!
> 
> currently returns
> <api>
>   <query>
>     <pages>
>       <page pageid="5327033" ns="10" title="Vorlage:!">
>         <langlinks>
>           <ll lang="ace" xml:space="preserve">Pola:!</ll>
>           <ll lang="ar" xml:space="preserve">قالب:!</ll>
>           <ll lang="as" xml:space="preserve">সাঁচ:!</ll>
>         </langlinks>
>       </page>
>     </pages>
>   </query>
> </api>
> 
> maybe this can be extended to
> <api>
>   <query>
>     <pages>
>       <page pageid="5327033" ns="10" title="Vorlage:!">
>         <langlinks>
>           <ll lang="ace" storage="repository" xml:space="preserve">Pola:!</ll>
>           <ll lang="ar" storage="local" xml:space="preserve">قالب:!</ll>
>           <ll lang="as" storage="repository" xml:space="preserve">সাঁচ:!</ll>
>         </langlinks>
>       </page>
>     </pages>
>   </query>
> </api>
> 
> If querying this info takes a lot of resources, an extra parameter should be
> added (like llurl for the fullurl extra info) and the info should only be
> shown if requested.

I'd rather suggest something like:

<api>
  <query>
    <pages>
      <page pageid="5327033" ns="10" title="Vorlage:!">
        <langlinks>
          <ll lang="ace" shared="" xml:space="preserve">Pola:!</ll>
          <ll lang="ar" xml:space="preserve">قالب:!</ll>
          <ll lang="as" shared="" xml:space="preserve">সাঁচ:!</ll>
        </langlinks>
      </page>
    </pages>
  </query>
</api>
