Last modified: 2012-10-17 16:27:59 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T40136, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 38136 - [SMW] Patch: SMW_refreshData.php delay time * 100 causes high server load
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: Semantic MediaWiki
Version: unspecified
Hardware: All All
Importance: Unprioritized critical
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: patch
Depends on:
Blocks:

Reported: 2012-07-03 04:56 UTC by badon
Modified: 2012-10-17 16:27 UTC (History)

Attachments
Patch resolving the "bug". (1.63 KB, patch)
2012-07-03 04:56 UTC, badon

Description badon 2012-07-03 04:56:24 UTC
Created attachment 10816
Patch resolving the "bug".

SMW_refreshData.php takes a delay parameter that is supposed to slow the progress of the refresh after each SMW ID. However, for some reason the delay time is multiplied by 100 and then applied only once per block of 100 IDs. That causes 100% server load while the 100 IDs are being processed, even on an otherwise idle server. In most cases we tested, it made otherwise idle servers sluggish to respond to even a single HTTP request.
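
The difference between the two throttling schemes described above can be sketched in a few lines of Python. This is an illustrative model only, not the code from SMW_refreshData.php or from the attached patch: the function names, the fixed per-ID processing time, and the sample numbers are all assumptions made for the example. It shows that both schemes sleep for the same total time, but the batched scheme leaves a much longer uninterrupted stretch of work between sleeps.

```python
def busy_stretches(num_ids, work_s, batch_size):
    """Return the uninterrupted busy stretches (in seconds) between sleeps.

    Hypothetical model: every ID takes work_s seconds to process, and the
    script sleeps once after each group of batch_size IDs.
    batch_size=100 models the reported behaviour, batch_size=1 the patched one.
    """
    stretches = []
    current = 0.0
    for i in range(1, num_ids + 1):
        current += work_s
        if i % batch_size == 0:
            # A sleep happens here, so the busy stretch ends.
            stretches.append(current)
            current = 0.0
    if current > 0:
        stretches.append(current)  # trailing partial batch, no sleep after it
    return stretches


def total_sleep(num_ids, delay_s, batch_size):
    """Total sleep time when each sleep lasts delay_s * batch_size seconds."""
    return (num_ids // batch_size) * delay_s * batch_size


# Assumed sample values: 300 IDs, 1 s of work per ID, 10 s nominal delay.
batched = busy_stretches(300, 1.0, batch_size=100)
per_id = busy_stretches(300, 1.0, batch_size=1)

print("longest busy stretch, batched:", max(batched), "s")
print("longest busy stretch, per ID: ", max(per_id), "s")
print("total sleep, batched:", total_sleep(300, 10.0, 100), "s")
print("total sleep, per ID: ", total_sleep(300, 10.0, 1), "s")
```

Under these assumptions the total run time is identical in both cases; only the distribution of the pauses changes, which is what determines how long other requests have to compete with the refresh.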

If other refresh operations are in progress, or the MediaWiki job queue is running, the problem can compound until the server grinds to a halt and the refresh of the block of 100 IDs never completes. At that point the server won't even respond to SSH connection attempts for a reboot command, and it must be manually power-cycled. We saw this on a variety of servers, including a 2.5 GHz dual-core server with 8 GB RAM, before we realized it was a flaw in SMW, not the servers.

I don't know why the code prevents the delay from applying to each SMW ID, and the comments do not explain. The fix was simple, and I have attached a patch. This is my first patch, so go easy on me if something isn't correct. It appears my text editor removed some superfluous white space too, but that shouldn't matter.

I tested my changes on SMW 1.7.0.2, but supplied a patch for the version of SMW_refreshData.php that is included in SMW 1.7.1. We haven't upgraded to test 1.7.1 yet, but these changes are so trivial, I doubt they would make any difference. It appears SMW_refreshData.php has no changes in SMW 1.7.1 that would produce different results from what we tested.
Comment 1 Markus Krötzsch 2012-07-06 11:40:35 UTC
Thanks for the patch. We should integrate this (I did not do this yet, but I wanted to give you a quick reply at least).

The original reason for using batches of 100 was that individual pages are usually so quick to process that a delay after each one seemed unnecessary. We can change this. However, if you have problems with 100 pages, you might already have problems with 1 page (often, there are many more short/simple pages than "slow" pages). Maybe try running your update script with a larger nice value to keep it from blocking more important processes. This only has an effect if the problem is not in the database system (which applies the same priority to all queries). If your problems persist, especially if they are due to the MySQL part of the processing, then it would be nice to know why exactly your pages need so much CPU for refreshing. We are currently looking into storage optimizations and are interested in testing their efficacy on sites that have one performance issue or another.
Comment 2 badon 2012-07-08 01:20:22 UTC
The refresh script testing we did was all done at a nice value of 19. It only takes a few seconds at most to process each page, with the typical time being about 0.5 to 1.5 seconds. You may be correct that the problem is at the database level, but I'm not sure about that.

I don't think any one SMW feature or SMW page is responsible. The MediaWiki job queue will have similar effects, but it provides a --maxtime option that ends the run if it takes too long. I typically use --maxtime to give 1/3 to 2/3 of each minute (20 to 40 seconds) to the job queue (if there are jobs to run), depending on the visitor traffic load. For a casual SMW refresh, I am using a 10 to 20 second delay after processing each ID, so the load is negligible over the 1 or 2 weeks that it runs.
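
As a rough check on the figures in this comment, the share of time the refresh spends working can be estimated from the per-page processing time and the per-ID delay. The sketch below is only back-of-the-envelope arithmetic: the helper names are hypothetical, and it assumes that load is proportional to the fraction of wall-clock time spent processing, which ignores database contention and other effects.

```python
WEEK_S = 7 * 24 * 3600  # seconds in one week


def duty_cycle(work_s, delay_s):
    """Fraction of wall-clock time spent processing rather than sleeping."""
    return work_s / (work_s + delay_s)


def ids_processed(total_s, work_s, delay_s):
    """Number of IDs that fit into total_s seconds, one sleep after each ID."""
    return int(total_s // (work_s + delay_s))


# Figures quoted in the thread: 0.5 to 1.5 s per page, 10 to 20 s delay per ID.
lightest = duty_cycle(0.5, 20.0)   # quickest pages, longest delay
heaviest = duty_cycle(1.5, 10.0)   # slowest typical pages, shortest delay

print(f"duty cycle between {lightest:.1%} and {heaviest:.1%}")
print("IDs per week at 1 s work + 10 s delay:", ids_processed(WEEK_S, 1.0, 10.0))
```

With these inputs the refresh is busy for only a small fraction of each cycle, which is consistent with the description of the load as negligible when the delay is applied after every ID.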

Let me know if there's anything I could do better with my patch, for the next time I do one. I like commenting code in detail so it is self-documenting. Would it be alright if some of my patches are just code comments?
Comment 3 Andre Klapper 2012-09-12 15:50:02 UTC
Comment on attachment 10816
Patch resolving the "bug".

[Correcting MIME Type and setting patch flag]
Comment 4 Markus Krötzsch 2012-10-17 16:27:59 UTC
Fixed in https://gerrit.wikimedia.org/r/#/c/28370/ as suggested.
