Last modified: 2013-01-30 10:06:42 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T41519, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 39519 - Pathological amount of DB queries, and redundant DB queries, on some custom templates
Pathological amount of DB queries, and redundant DB queries, on some custom t...
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Semantic MediaWiki (Other open bugs)
unspecified
All All
: Unprioritized normal (vote)
: ---
Assigned To: Nischay Nahata
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-08-21 03:45 UTC by Patrick Lauer
Modified: 2013-01-30 10:06 UTC (History)
4 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
SQL debug log of a single page view, compressed (10.5MB raw) (193.86 KB, application/octet-stream)
2012-08-21 03:45 UTC, Patrick Lauer
Details

Description Patrick Lauer 2012-08-21 03:45:25 UTC
Created attachment 10988 [details]
SQL debug log of a single page view, compressed (10.5MB raw)

On some pages we're seeing exquisitely bad performance (page generation time over 30 seconds on a fast machine) for some views.

Debug info points at DB usage as one bottleneck: One page generates just over 42k (yes, forty two thousand) DB queries. See attachment.

Last DB query is:
Query wiki (42053) (slave): COMMIT

Looking at the queries there appears to be lots of duplication, for example:

Query wiki (84) (slave): SELECT /* SMWSQLStore2::getSMWPageIDandSort Patrick.lauer */  smw_id,smw_iw,smw_sortkey  FROM `smw_ids`  WHERE smw_title='P078272' AND smw_namespace='0' AND (smw_iw='' OR smw_iw=':smw-redi') AND smw_subobject=''  LIMIT 1  
Query wiki (85) (slave): SELECT /* SMWSQLStore2::getSMWPageIDandSort Patrick.lauer */  smw_id,smw_iw,smw_sortkey  FROM `smw_ids`  WHERE smw_title='P078272' AND smw_namespace='0' AND (smw_iw='' OR smw_iw=':smw-redi') AND smw_subobject=''  LIMIT 1  
Query wiki (86) (slave): SELECT /* SMWSQLStore2::getSMWPageIDandSort Patrick.lauer */  smw_id,smw_iw,smw_sortkey  FROM `smw_ids`  WHERE smw_title='P078272' AND smw_namespace='0' AND (smw_iw='' OR smw_iw=':smw-redi') AND smw_subobject=''  LIMIT 1  
Query wiki (87) (slave): SELECT /* SMWSQLStore2::getSMWPageIDandSort Patrick.lauer */  smw_id,smw_iw,smw_sortkey  FROM `smw_ids`  WHERE smw_title='P078272' AND smw_namespace='0' AND (smw_iw='' OR smw_iw=':smw-redi') AND smw_subobject=''  LIMIT 1  
Query wiki (88) (slave): SELECT /* SMWSQLStore2::getSMWPageIDandSort Patrick.lauer */  smw_id,smw_iw,smw_sortkey  FROM `smw_ids`  WHERE smw_title='P078272' AND smw_namespace='0' AND (smw_iw='' OR smw_iw=':smw-redi') AND smw_subobject=''  LIMIT 1  
Query wiki (89) (slave): SELECT /* SMWSQLStore2::getSMWPageIDandSort Patrick.lauer */  smw_id,smw_iw,smw_sortkey  FROM `smw_ids`  WHERE smw_title='P078272' AND smw_namespace='0' AND (smw_iw='' OR smw_iw=':smw-redi') AND smw_subobject=''  LIMIT 1  
Query wiki (90) (slave): SELECT /* SMWSQLStore2::getSMWPageIDandSort Patrick.lauer */  smw_id,smw_iw,smw_sortkey  FROM `smw_ids`  WHERE smw_title='P078272' AND smw_namespace='0' AND (smw_iw='' OR smw_iw=':smw-redi') AND smw_subobject=''  LIMIT 1  
Query wiki (91) (slave): SELECT /* SMWSQLStore2::getSMWPageIDandSort Patrick.lauer */  smw_id,smw_iw,smw_sortkey  FROM `smw_ids`  WHERE smw_title='P078272' AND smw_namespace='0' AND (smw_iw='' OR smw_iw=':smw-redi') AND smw_subobject=''  LIMIT 1  
Query wiki (92) (slave): SELECT /* SMWSQLStore2::getSMWPageIDandSort Patrick.lauer */  smw_id,smw_iw,smw_sortkey  FROM `smw_ids`  WHERE smw_title='P078271' AND smw_namespace='0' AND (smw_iw='' OR smw_iw=':smw-redi') AND smw_subobject=''  LIMIT 1  

But even if that redundancy were removed it'd still be doing too much work - the page view is just a list of all pages with certain attributes.

I can provide further info and template details if needed.
Comment 1 Markus Krötzsch 2012-08-21 07:56:32 UTC
It is known that there can be large numbers of small SQL queries in some configurations. We are working on improving the in-memory caching to significantly reduce this (it is also known that getSMWPageIDandSort is the main reason for this problem).

Please note that the queries that are run on one page view may not belong to one page only. MediaWiki has a system of background jobs that refresh other pages in small batches when you view one. This can explain queries that appear completely unrelated to the page that you view. To get reliable query logs for one particular page, you should first empty your job queue (can be processed via a command line script of MW) or (better yet) temporarily disable jobs to be run while pages are viewed.

Overall, however, I have only seen such large query logs on sites that used exceedingly complex template structures in combination with #ask or #show. Typical setups included "all you can know" templates that check for the presence of all kinds of data that they will then show. The extreme case of this design is to have one generic infobox template for all, using queries and parser functions to customise its look for various types of pages. Such a design is rarely a good idea; besides the efficiency issues, it also leads to extremely complex templates that are hard to maintain. Symptomatic for this semantic wiki anti-pattern are also auxiliary templates that are used as macros in many places, and which are using many queries themselves (e.g., a template that creates a table line for a property value, but only if the property has some value, and possibly picking different formats depending on the number of values that the property has). Such sub-templates are then often overused in places where it is already clear how many values a property should have. Another performance killer is template-based result formatting in #ask, esp. if the template itself uses queries. This particular design leverages SMW itself to create large numbers of queries and templates on one page. One can improve performance by revisiting such designs and reworking template structures to be more streamlined. I don't know if any of this applies to your site, of course.

Which version of SMW exactly are you using? Do you have special settings for $smwgDefaultStore in your LocalSettings? Please also provide a link to the page that creates the problems for you, or give a detailed description of its contents, including the templates that it uses.
Comment 2 Patrick Lauer 2012-08-21 08:21:12 UTC
>Which version of SMW exactly are you using?
1.5.6 in production, 1.7.1 for testing. Both have roughly the same performance characteristics.

>Do you have special settings for $smwgDefaultStore in your LocalSettings?
No, it's not set in the config.

The wiki is an internal system and definitely not public. There are some slow pages, the one I mentioned above contains this query at the core:

{{ #ask: [[Category:Open projects]] [[Project Manager here::{{{1}}}]]
| ?Regarding
| ?Customer
| ?Customer's project name
| ?Project name
| ?Remark
| ?Purchase order number
| ?Project Manager customer
| ?Order date#ISO
| ?Deadline#ISO
| ?Delivery date#ISO
| order=desc
| limit=2000
| mainlabel=Project
| format=template
| introtemplate=Project overview table intro
| template=Project overview table row
| outrotemplate=Project overview table outro
}}

We use SMW as a project management tool, we currently have about 11k project pages. This query shows between 10 (fast enough) and >500 projects in an overview table, the slowdown seems to be roughly linear in the number of projects.

(Sidenote: Why the bleeep does SMW cut off after 10k items without error? That is really ... rude, and has caused us lots of grief)
Comment 3 Markus Krötzsch 2012-08-21 10:26:33 UTC
It is remarkable that this works at all for big query results. Independently of the SMW SQL queries, we found it very problematic to create a page with many hundreds or thousands of MediaWiki template calls. This usually makes parsing very slow, even if the query results are found quickly. With 2000 results, you should get pages that border MW's capacity. One way to improve this is to store the template-based formatted output with the pages, instead of creating it in the query. You can use a Text type property for this. Then the query only needs to display this text rather than calling templates again. To store data from one page into a template on the same page, the Variable extension is handy.

We will be looking into reducing the overall amount of queries soon, but this is part of the SMW Storage optimisation project and there are still some other things to fix there. Do you plan to attend SMWCon in Cologne this October? We might have some more info by then.
Comment 4 Markus Krötzsch 2012-10-16 12:49:08 UTC
The original problem should be solved by SQLStore3 since commit https://gerrit.wikimedia.org/r/#/c/28057/

There is now a much more effective caching layer that largely prevents duplicate queries to the smw_ids table (the cache might still be too small for your site to prevent duplicate queries altogether, this could maybe be fixed by increagin cache size.
Comment 5 Patrick Lauer 2013-01-30 10:06:42 UTC
With SMW 1.8 + SQLStore 3 there's a minor improvement, but the pathological queries are still exquisitely slow.

Amount of queries goes down from ~40k to ~17k for the "worst" test case, and the runtime goes down from ~60 seconds to ~35 seconds. 

Good, but not good enough ;)

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links