Last modified: 2013-12-12 16:05:31 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the pages that display bug reports and their history, links might be broken. See T60299, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 58299 - /tmp filled up on deployment-jobrunner08.pmtpa.wmflabs during testing
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: GWToolset
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Assigned To: Nobody
Reported: 2013-12-10 23:39 UTC by Bryan Davis
Modified: 2013-12-12 16:05 UTC (History)

Attachments
Directory listing of /tmp before purge (59.62 KB, text/plain)
2013-12-10 23:41 UTC, Bryan Davis
wgDebugLogFile output for a job that leaked /tmp/URLRfQIph (33.36 KB, text/plain)
2013-12-11 13:29 UTC, Antoine "hashar" Musso (WMF)
wgDebugLogFile output for a job that leaked /tmp/URL1GQNyL (43.78 KB, application/octet-stream)
2013-12-12 01:15 UTC, Bryan Davis

Description Bryan Davis 2013-12-10 23:39:38 UTC
We started seeing a lot of "Error writing temporary file" errors during testing this afternoon. I eventually figured out that the root disk on deployment-jobrunner08.pmtpa.wmflabs was at 100% used.

  deployment-jobrunner08:~
  bd808$ df -h
  Filesystem                                   Size  Used Avail Use% Mounted on
  /dev/vda1                                    9.9G  9.4G     0 100% /

The culprit seemed to be /tmp, with ~1300 files consuming 4320568 KB (about 4.1G).

Deleting the files in /tmp freed ~4.2G of space and the temp file errors went away.
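
For anyone wanting to reproduce this kind of measurement, here is a minimal Python sketch that counts the regular files directly inside a directory and sums their sizes. It is not the command used above (which is not shown); `directory_usage` is a hypothetical helper name, and it reports apparent file sizes rather than allocated disk blocks.

```python
import os


def directory_usage(path):
    """Return (file_count, total_bytes) for regular files directly inside path.

    Hypothetical helper for illustration; subdirectories are not descended into.
    """
    count = 0
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # Only count plain files; skip directories and symlinks.
            if entry.is_file(follow_symlinks=False):
                count += 1
                total += entry.stat(follow_symlinks=False).st_size
    return count, total
```

Calling `directory_usage("/tmp")` on the affected host would give a file count and byte total comparable to the figures quoted above.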
Comment 1 Bryan Davis 2013-12-10 23:41:03 UTC
Created attachment 14051 [details]
Directory listing of /tmp before purge
Comment 2 Antoine "hashar" Musso (WMF) 2013-12-11 13:29:38 UTC
Created attachment 14057 [details]
wgDebugLogFile output for a job that leaked /tmp/URLRfQIph

Using bd808's directory listing, I looked through the log for the leaked file /tmp/URLRfQIph.

The corresponding image is a 3062396-byte JPEG: http://commons.wikimedia.beta.wmflabs.org/wiki/File:Rhododendron_lutescens_Franch.-E_-_Royal_Botanic_Garden_Living_Collection_-_19913373.jpeg


I have attached the debug log.

Several files are being processed in that log, but I am not sure what triggers the leak.
Comment 3 Gerrit Notification Bot 2013-12-11 23:41:08 UTC
Change 100928 had a related patch set uploaded by BryanDavis:
Hack: cron job to clean up files orphaned by UploadFromUrl

https://gerrit.wikimedia.org/r/100928
Comment 4 Bryan Davis 2013-12-12 00:18:55 UTC
There is a patch, but it won't fix this bug; at best it will mask some of the worst symptoms.
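
To illustrate the kind of stopgap being described, here is a rough Python sketch of a periodic cleanup that removes stale files by name prefix and age. This is not the content of change 100928, which is not reproduced on this page; the function name, the `URL` prefix default (taken from the leaked file names in this report), and the one-hour threshold are all assumptions.

```python
import os
import time


def purge_stale_files(directory, prefix="URL", max_age_seconds=3600, now=None):
    """Delete regular files in directory that start with prefix and whose
    modification time is older than max_age_seconds.

    Hypothetical helper; returns the sorted names of the files it removed.
    """
    now = time.time() if now is None else now
    # Snapshot the directory first so we do not delete while iterating.
    with os.scandir(directory) as it:
        entries = list(it)

    removed = []
    for entry in entries:
        if not entry.name.startswith(prefix):
            continue
        if not entry.is_file(follow_symlinks=False):
            continue
        try:
            age = now - entry.stat(follow_symlinks=False).st_mtime
            if age > max_age_seconds:
                os.remove(entry.path)
                removed.append(entry.name)
        except FileNotFoundError:
            # The owning job (or another cleaner) removed the file first.
            continue
    return sorted(removed)
```

The age threshold matters: a file that is still being downloaded by a running job must not be deleted, so the threshold has to exceed the longest expected job. As the comment above says, this only hides the leak.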
Comment 5 Bryan Davis 2013-12-12 00:32:13 UTC
Dan uploaded a batch of 10169 records via GWToolset. At the end there were these files left in /tmp:

    -rw------- 1 apache apache 15014788 Dec 11 16:20 URL1GQNyL
    -rw-r--r-- 1 apache apache 15014788 Dec 11 00:50 URL4du6bo
    -rw------- 1 apache apache 12421680 Dec 11 17:50 URL5dcBXC
    -rw-r--r-- 1 apache apache 15014788 Dec 11 02:17 URL7FUmuR
    -rw------- 1 apache apache      403 Dec 11 23:17 URL7OBhEq
    -rw------- 1 apache apache      403 Dec 11 22:48 URLFy7pUH
    -rw------- 1 apache apache      403 Dec 11 21:06 URLhr7ajj
    -rw------- 1 apache apache 15014788 Dec 11 15:07 URLmpkXjt
    -rw------- 1 apache apache 12421680 Dec 11 16:23 URLq1SPzU
    -rw------- 1 apache apache      403 Dec 11 20:29 URLQbBwYs
    -rw------- 1 apache apache      403 Dec 11 23:10 URLr32Ys6
    -rw------- 1 apache apache      403 Dec 11 23:30 URLsIz8tM
    -rw------- 1 apache apache      403 Dec 11 23:38 URLWObPsW
    -rw------- 1 apache apache      403 Dec 11 22:25 URLWOzMFN
    -rw-r--r-- 1 apache apache 12421680 Dec 11 02:17 URLYnHHMG
    -rw------- 1 apache apache      403 Dec 11 23:52 URLYul1UL
    -rw------- 1 apache apache      403 Dec 11 20:43 URLzm8Zob
    -rw-r--r-- 1 apache apache 12421680 Dec 11 00:56 URLzoyegb
    -rw------- 1 apache apache      403 Dec 11 20:28 URLzpfdZ5
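
The listing above contains only a few distinct file sizes, which is easier to see when the entries are grouped. The following Python sketch tallies files per size from `ls -l` style text; `sizes_from_listing` is a hypothetical helper, and it assumes the size is the fifth whitespace-separated field, as in the listing above.

```python
from collections import Counter

# Three lines copied from the listing above, used as sample input.
SAMPLE = """
-rw------- 1 apache apache 15014788 Dec 11 16:20 URL1GQNyL
-rw------- 1 apache apache      403 Dec 11 23:17 URL7OBhEq
-rw------- 1 apache apache      403 Dec 11 22:48 URLFy7pUH
"""


def sizes_from_listing(listing):
    """Count files per size from `ls -l` style lines (size is the fifth field)."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # skip blank lines and the "total N" header
        counts[int(fields[4])] += 1
    return counts


size_counts = sizes_from_listing(SAMPLE)
```

Applied to all 19 lines of the listing, this gives four files of 15014788 bytes, four of 12421680 bytes, and eleven of 403 bytes.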
Comment 6 Bryan Davis 2013-12-12 01:15:37 UTC
Created attachment 14067 [details]
wgDebugLogFile output for a job that leaked /tmp/URL1GQNyL

I searched the logs for /tmp/URL1GQNyL and found that it was processed in the CLI execution labeled 'commonswiki-d380fe2b'.
Comment 7 Bryan Davis 2013-12-12 01:34:12 UTC
There is a leak at https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FGWToolset/7b2cfa5edfd2cee682112e4efe661fbec14ee635/includes%2FHandlers%2FUploadHandler.php#L569

The application code should be calling $Upload->cleanupTempFile() before exiting to ensure that the temp file is destroyed. In the happy path the temp file is cleaned up by $Upload->performUpload().

An associated question to ask here is whether UploadBase should call cleanupTempFile() in a destructor to stop this sort of silliness.
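
The pattern described in this comment (clean up on every exit path, not only the happy path) can be sketched in Python as follows. This is an analogy, not the GWToolset or MediaWiki code: `FakeUpload`, `verify`, and `handle_upload` are hypothetical names, and only `perform_upload` and `cleanup_temp_file` mirror the method names mentioned above.

```python
import os
import tempfile


class FakeUpload:
    """Hypothetical stand-in for an upload object that owns a temp file."""

    def __init__(self, payload):
        fd, self.temp_path = tempfile.mkstemp(prefix="URL")
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)

    def verify(self):
        # Placeholder validation: reject empty payloads.
        if os.path.getsize(self.temp_path) == 0:
            raise ValueError("empty upload")

    def perform_upload(self):
        # Happy path: the upload consumes the file and removes the temp copy.
        self.cleanup_temp_file()

    def cleanup_temp_file(self):
        # Idempotent, so calling it again after perform_upload() is harmless.
        if self.temp_path and os.path.exists(self.temp_path):
            os.remove(self.temp_path)
        self.temp_path = None


def handle_upload(payload):
    """Return True on success, False on a rejected upload.

    The finally clause removes the temp file on every exit path.
    """
    upload = FakeUpload(payload)
    try:
        upload.verify()
        upload.perform_upload()
        return True
    except ValueError:
        return False
    finally:
        upload.cleanup_temp_file()
```

In this sketch the destructor idea would correspond to a `__del__` method. Python does not guarantee that `__del__` runs for objects still alive at interpreter exit, so the explicit `finally` is the safer choice there; PHP's destructor semantics differ and would need to be checked separately.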
Comment 8 Gerrit Notification Bot 2013-12-12 01:59:57 UTC
Change 100946 had a related patch set uploaded by Dan-nl:
bug-58299

https://gerrit.wikimedia.org/r/100946
Comment 9 Gerrit Notification Bot 2013-12-12 02:03:38 UTC
Change 100946 merged by jenkins-bot:
Call UploadBase::cleanupTempFile before exiting

https://gerrit.wikimedia.org/r/100946
Comment 10 Gerrit Notification Bot 2013-12-12 02:05:55 UTC
Change 100928 abandoned by BryanDavis:
Hack: cron job to clean up files orphaned by UploadFromUrl

Reason:
Dan and I think we have found and fixed the temp file leak.

https://gerrit.wikimedia.org/r/100928
Comment 11 Bryan Davis 2013-12-12 02:10:58 UTC
Dan's going to run another large import batch against a GLAM data source that had known issues to verify the fix. At Thu Dec 12 02:10:24 UTC 2013 deployment-jobrunner08:/tmp has no URL* files in it.
Comment 12 dan 2013-12-12 11:45:38 UTC
Change https://gerrit.wikimedia.org/r/100946 seems to have taken care of the /tmp folder orphans. 228 out of 275 maps were uploaded, and there were no orphaned URLxxx files in the /tmp folder after the bulk upload completed. http://commons.wikimedia.beta.wmflabs.org/w/index.php?title=Category:National_Library_of_Latvia
