Last modified: 2013-07-03 13:39:17 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T49125, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 47125 - Improve performance of dispatchChanges::getPendingChanges
Status: VERIFIED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: unspecified
Hardware: All (OS: All)
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Wikidata bugs
Keywords: performance
Depends on:
Blocks:
Reported: 2013-04-11 14:38 UTC by Daniel Kinzler
Modified: 2013-07-03 13:39 UTC
CC List: 3 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Daniel Kinzler 2013-04-11 14:38:42 UTC
dispatchChanges has performance issues. One major bottleneck is the getPendingChanges() function. It works by loading a block of changes, then, for the item of each change, loading the sitelinks and checking whether the target wiki is mentioned in them. This means one extra database query for each change (by default, 1000 per batch). This is far too slow.

One solution would be to join the wb_changes table against the wb_items_per_site table directly. However, this would no longer work once we have client-side usage tracking. Also, wb_changes uses a single field for the prefixed ID of the entity, while wb_items_per_site uses one field for the entity type and one for the numeric ID. This makes joining inefficient and inconvenient.
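One way around the ID mismatch, without a join, is to split each prefixed ID in application code before querying, so that the lookup can use the numeric column directly. The Python sketch below assumes that items use the prefix "q" and properties "p" (case-insensitive); the prefix table and the helper names are hypothetical, not part of Wikibase.

```python
import re
from typing import NamedTuple

# Hypothetical prefix table (Wikidata writes items as Q42, properties as P31).
PREFIX_TO_TYPE = {"q": "item", "p": "property"}
_ID_PATTERN = re.compile(r"([a-z]+)([1-9][0-9]*)", re.IGNORECASE)

class EntityKey(NamedTuple):
    entity_type: str
    numeric_id: int

def split_prefixed_id(prefixed: str) -> EntityKey:
    """Split a prefixed ID such as 'q42' into ('item', 42)."""
    match = _ID_PATTERN.fullmatch(prefixed)
    if match is None:
        raise ValueError(f"not a prefixed entity ID: {prefixed!r}")
    prefix, number = match.groups()
    entity_type = PREFIX_TO_TYPE.get(prefix.lower())
    if entity_type is None:
        raise ValueError(f"unknown entity prefix: {prefix!r}")
    return EntityKey(entity_type, int(number))

def numeric_item_ids(prefixed_ids):
    """Numeric IDs of the items among `prefixed_ids`, deduplicated and sorted,
    suitable for an IN (...) list on a numeric item ID column."""
    keys = (split_prefixed_id(p) for p in prefixed_ids)
    return sorted({k.numeric_id for k in keys if k.entity_type == "item"})

print(split_prefixed_id("q42"))                       # EntityKey(entity_type='item', numeric_id=42)
print(numeric_item_ids(["q42", "Q7", "p31", "q42"]))  # [7, 42]
```

Doing the conversion once per block keeps string manipulation out of the SQL and leaves the query free to use an index on the numeric column.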

An alternative solution would be to provide a storage layer service for
a) checking which items from a given list are used on a given client wiki;
b) providing all pages on a given client wiki that use one of a list of items.
Using the first method, we could filter a given block of changes using a single query.
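The proposed service can be sketched as an interface with the two operations a) and b), plus a filter that handles a whole block of changes with one call to a). In this Python sketch, `ItemUsageLookup`, its method names, the `Change` record and the in-memory backend are all hypothetical illustrations, not the Wikibase API.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class Change:
    change_id: int
    item_id: int

class ItemUsageLookup(ABC):
    """Hypothetical storage-layer service with the two proposed operations."""

    @abstractmethod
    def get_used_items(self, wiki: str, item_ids: Iterable[int]) -> set[int]:
        """a) Which of the given items are used on the given client wiki."""

    @abstractmethod
    def get_pages_using_items(self, wiki: str, item_ids: Iterable[int]) -> set[str]:
        """b) All pages on the given client wiki that use any of the given items."""

class InMemoryItemUsageLookup(ItemUsageLookup):
    """Toy backend mapping (wiki, item_id) to the pages that use the item."""

    def __init__(self, usages: dict[tuple[str, int], set[str]]):
        self._usages = usages

    def get_used_items(self, wiki, item_ids):
        return {i for i in item_ids if self._usages.get((wiki, i))}

    def get_pages_using_items(self, wiki, item_ids):
        pages = set()
        for i in item_ids:
            pages |= self._usages.get((wiki, i), set())
        return pages

def filter_changes(changes: list[Change], wiki: str, lookup: ItemUsageLookup) -> list[Change]:
    """Filter a whole block of changes with a single call to operation a)."""
    used = lookup.get_used_items(wiki, {c.item_id for c in changes})
    return [c for c in changes if c.item_id in used]

lookup = InMemoryItemUsageLookup({
    ("enwiki", 42): {"Douglas Adams"},
    ("enwiki", 64): {"Berlin"},
    ("dewiki", 64): {"Berlin"},
})
block = [Change(1, 42), Change(2, 7), Change(3, 64)]
print([c.change_id for c in filter_changes(block, "enwiki", lookup)])  # [1, 3]
print(sorted(lookup.get_pages_using_items("dewiki", [42, 64])))        # ['Berlin']
```

Because callers depend only on the interface, a database-backed implementation (sitelinks today, client-side usage tracking later) could be swapped in behind it, with each method mapping to one query per block.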
Comment 1 Daniel Kinzler 2013-04-15 20:04:55 UTC
My original description of the problem is incorrect insofar as the filterChanges() function used by getPendingChanges() already queries the sitelinks table only once, not once per change.

However, it remains true that getPendingChanges() is a bottleneck. Possible improvements include caching and optimized code flow.
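"Caching" is left open here. One possible form, sketched in Python under the assumption that the same items recur across batches for a given client wiki: remember per (wiki, item) whether the item is used there, and send only cache misses to the backend in one batched call. The `CachingUsageFilter` class and its `fetch_used` callback are hypothetical; a real cache would also need invalidation when sitelinks change, shown here only as a simple `invalidate` method.

```python
from typing import Callable, Iterable

class CachingUsageFilter:
    """Remembers, per (wiki, item), whether the item is used on that wiki;
    only cache misses are sent to the backend, in one batched call."""

    def __init__(self, fetch_used: Callable[[str, list], set]):
        self._fetch_used = fetch_used          # hypothetical batched backend lookup
        self._known: dict = {}                 # (wiki, item_id) -> bool
        self.backend_calls = 0

    def used_items(self, wiki: str, item_ids: Iterable[int]) -> set:
        ids = set(item_ids)
        misses = sorted(i for i in ids if (wiki, i) not in self._known)
        if misses:
            self.backend_calls += 1
            found = self._fetch_used(wiki, misses)
            for i in misses:
                self._known[(wiki, i)] = i in found
        return {i for i in ids if self._known[(wiki, i)]}

    def invalidate(self, item_id: int) -> None:
        """Drop cached entries for an item, e.g. after its sitelinks changed."""
        for key in [k for k in self._known if k[1] == item_id]:
            del self._known[key]

linked = {("enwiki", 42), ("enwiki", 64)}
cache = CachingUsageFilter(lambda wiki, ids: {i for i in ids if (wiki, i) in linked})
print(sorted(cache.used_items("enwiki", [42, 7, 64])))  # [42, 64]
print(sorted(cache.used_items("enwiki", [42, 64])))     # [42, 64]
print(cache.backend_calls)                              # 1
```

The second lookup is served entirely from the cache, so the backend is queried once for both batches; negative results (item 7) are cached as well.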
Comment 2 Gerrit Notification Bot 2013-04-15 20:07:05 UTC
Related URL: https://gerrit.wikimedia.org/r/59188 (Gerrit Change Idc7def15a5bd113b2cf38f8140f26098848bc1a7)
Comment 3 Gerrit Notification Bot 2013-04-17 20:18:59 UTC
https://gerrit.wikimedia.org/r/59188 (Gerrit Change Idc7def15a5bd113b2cf38f8140f26098848bc1a7) | change APPROVED and MERGED [by Aude]
Comment 4 Gerrit Notification Bot 2013-04-23 08:54:30 UTC
https://gerrit.wikimedia.org/r/59388 (Gerrit Change I677d5fe46fcd7cf565443aa581f69e73c28fa940) | change APPROVED and MERGED [by Aude]
