Last modified: 2014-10-10 16:32:53 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T71760, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 69760 - backend-fail-internal error while deleting files
Status: REOPENED
Product: Wikimedia
Classification: Unclassified
Component: Media storage
Version: wmf-deployment
Hardware: All All
Importance: High major with 4 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Whiteboard: aklapper-moreinfo
Duplicates: 69717 69875
Depends on:
Blocks:
Reported: 2014-08-19 20:54 UTC by Rainer Rillke @commons.wikimedia
Modified: 2014-10-10 16:32 UTC
CC List: 23 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Rainer Rillke @commons.wikimedia 2014-08-19 20:54:57 UTC
error code: backend-fail-internal
error info: An unknown error occurred in storage backend "local-swift-eqiad"

Reported at https://commons.wikimedia.org/wiki/Commons:Administrators%27_noticeboard#Serious_deletion_error_issue
Comment 1 INeverCry 2014-08-20 02:19:46 UTC
This bug is causing about 1/3 of my attempts to delete files to fail. I then have to refresh my browser before I can finally get the files to delete, especially in mass DRs or nukes.  

INeverCry
Comment 2 Robin Krahl 2014-08-20 10:17:29 UTC
The same error occurs during file deletions on the German Wikipedia, see [0].  Error message (translated from German):
  Error deleting file: An unknown error occurred in storage backend 
  "local-swift-eqiad".

[0]  <https://de.wikipedia.org/wiki/Wikipedia:Administratoren/Anfragen#Probleme_beim_L.C3.B6schen_von_Dateien>
Comment 4 Denniss 2014-08-20 10:33:59 UTC
This is actually an urgent issue; it also affects uploads, where images or file description pages get corrupted.
Is nobody on the tech team alerted by (hopefully existing) automatic error messages?
Comment 5 Steinsplitter 2014-08-20 10:37:20 UTC
There are unresolved high-priority bugs in the "Media storage" component. Swift is vital to the projects' ability to show images and other media, and the large number of open bugs against it causes serious ongoing issues, not only on Commons but everywhere.
Comment 6 Yellowcard 2014-08-20 10:47:45 UTC
See the screenshot on the German Wikipedia: https://de.wikipedia.org/wiki/Datei:Screenshot_Fehler_im_Speicher-Backend.png

This is an urgent issue.
Comment 7 Jesús Martínez Novo (Ciencia Al Poder) 2014-08-20 11:25:34 UTC
*** Bug 69717 has been marked as a duplicate of this bug. ***
Comment 9 Andre Klapper 2014-08-20 13:50:15 UTC
<godog> it is running a bit hot on bandwidth from/to the upload caches but shouldn't be too bad, not sure exactly what mw does when talking to swift
<godog> all that load comes artificially from ms-be1003 having xfs in a funny state
 !log reboot ms-be1003, xfs errors/panics
Comment 10 Filippo Giunchedi 2014-08-20 14:24:27 UTC
that (rebooting ms-be1003) did it, the proxy mentioned ERRORS and timeouts towards ms-be1003 while attempting to DELETE, which would explain the symptoms.

can you try again and see if it works? thanks!
Comment 11 INeverCry 2014-08-20 17:45:59 UTC
Still getting a bunch of these same errors as I try deletions here on Commons:

API request failed (backend-fail-internal): An unknown error occurred in storage backend "local-swift-eqiad". at Wed, 20 Aug 2014 17:42:39 GMT, served by mw1119
Comment 12 Denniss 2014-08-20 20:45:08 UTC
Observed the same at Commons, no improvement seen.
Comment 13 Steinsplitter 2014-08-20 21:27:24 UTC
API request failed (backend-fail-internal): An unknown error occurred in storage backend "local-swift-eqiad". at Wed, 20 Aug 2014 21:26:31 GMT, served by mw1132
Comment 14 INeverCry 2014-08-21 03:34:00 UTC
I just deleted 200+ files from Commons with no errors.
Comment 15 Fastily 2014-08-21 06:23:08 UTC
Not sure if this is related, but uploads have been failing with a similar message:

{"error":{"0":["backend-fail-internal","local-swift-eqiad"],"code":"internal-error","info":"An internal error occurred"},"servedby":"mw1202"}
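For anyone collecting these reports, the response body above can be picked apart mechanically. The following is only a rough sketch in Python: the function name `summarize_upload_error` is made up for illustration, and it assumes the body always has the shape of the one sample pasted here.

```python
import json

# Response body exactly as pasted in this comment.
RAW = (
    '{"error":{"0":["backend-fail-internal","local-swift-eqiad"],'
    '"code":"internal-error","info":"An internal error occurred"},'
    '"servedby":"mw1202"}'
)

def summarize_upload_error(raw):
    """Return (api_code, backend_code, backend_name, served_by).

    Hypothetical helper: the field layout is inferred from the single
    sample above, so missing keys are reported as None.
    """
    body = json.loads(raw)
    error = body.get("error", {})
    # The "0" entry holds [message key, backend name] in the sample;
    # pad it so short or absent lists do not raise.
    detail = list(error.get("0") or []) + [None, None]
    return error.get("code"), detail[0], detail[1], body.get("servedby")

print(summarize_upload_error(RAW))
```

The point of pulling out `servedby` as well is that it names the application server that handled the request, which helps when matching a report against server-side logs.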
Comment 16 jeremyb 2014-08-21 06:34:08 UTC
The problem in comment 9 is clearly visible in ganglia. Don't see any obvious more recent issues on the same ganglia graphs.

https://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&m=cpu_report&s=by+name&c=Swift+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&hide-hf=false&sh=1&z=small&hc=4 (may need to adjust time period at the top depending on when you click the link)
Comment 17 jeremyb 2014-08-21 06:38:16 UTC
(In reply to Fastily from comment #15)
> Not sure if this is related, but uploads have been failing with a similar
> message:

btw, please provide timestamps for when the errors happened if you have them! (e.g. comments 13/15)
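The error strings pasted in comments 11 and 13 already carry a timestamp and the serving host, so they can be extracted rather than retyped. A minimal Python sketch follows; the helper name `extract_report_fields` and the regular expression are assumptions based only on those two pasted messages, not on any documented message format.

```python
import re
from email.utils import parsedate_to_datetime

# Error text as pasted in comment 13, including its <i>/<u> markup.
MESSAGE = (
    'API request failed (backend-fail-internal): An unknown error occurred in '
    'storage backend "local-swift-eqiad". <i>at Wed, 20 Aug 2014 21:26:31 GMT</i> '
    '<u>served by mw1132</u>'
)

# Pattern guessed from the pasted samples: "(code): ... at <date> served by <host>".
PATTERN = re.compile(
    r"\((?P<code>[\w-]+)\):.*?"
    r"at (?P<when>\w{3}, \d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2} GMT)"
    r"\W*served by (?P<host>\w+)"
)

def extract_report_fields(message):
    """Return (error_code, ISO timestamp, host), or None if nothing matches."""
    # Drop any HTML tags that came along with the paste.
    plain = re.sub(r"<[^>]+>", "", message)
    match = PATTERN.search(plain)
    if match is None:
        return None
    when = parsedate_to_datetime(match.group("when"))
    return match.group("code"), when.isoformat(), match.group("host")

print(extract_report_fields(MESSAGE))
```

Messages without the timestamp suffix (such as the one in comment 15) simply return `None`, which is the case where the reporter has to supply the time by hand.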
Comment 18 Filippo Giunchedi 2014-08-21 13:12:53 UTC
same here, I can't see any obvious issues with swift after rebooting the machine that was causing the high load yesterday.

we are doing some tuning to the nagios alerts we get for swift to detect recurrence (and a root cause/fix too!)
Comment 19 Pierre-Selim 2014-08-21 13:59:43 UTC
2014-08-21T13:57Z the bug strikes back!

API request failed (backend-fail-delete): Could not delete file "mwstore://local-swift-eqiad/local-public/c/ce/Крушение_поезда_в_московском_метро_15.07.2014.jpg"
Comment 20 Nick Birse 2014-08-21 14:05:28 UTC
(In reply to Pierre-Selim from comment #19)
> 2014-08-21T13:57Z the bug strikes back!
> 
> API request failed (backend-fail-delete): Could not delete file
> "mwstore://local-swift-eqiad/local-public/c/ce/
> Крушение_поезда_в_московском_метро_15.07.2014.jpg"

Same file, slightly different error message. 

Error deleting file: Could not delete file "mwstore://local-swift-eqiad/local-public/c/ce/Крушение_поезда_в_московском_метро_15.07.2014.jpg".
Comment 21 Filippo Giunchedi 2014-08-21 20:54:55 UTC
there were further errors found with swift talking to memcached. I've pushed https://gerrit.wikimedia.org/r/#/c/155629/ to bump that limit; the timeouts are now greatly reduced, though not completely eliminated yet, so the impact should be a lot smaller
Comment 22 Andre Klapper 2014-08-21 21:45:56 UTC
*** Bug 69875 has been marked as a duplicate of this bug. ***
Comment 23 Fastily 2014-08-22 06:33:58 UTC
(In reply to jeremyb from comment #17)
> (In reply to Fastily from comment #15)
> > Not sure if this is related, but uploads have been failing with a similar
> > message:
> 
> btw, please provide timestamps for when the errors happened if you have
> them! (e.g. comments 13/15)

Unfortunately I don't have an exact timestamp, but I do know this was happening during the same time deletions were failing.  I haven't tried uploading anything since.  Will definitely try again sometime this weekend.
Comment 24 Fastily 2014-08-23 22:40:39 UTC
So I've done quite a number of uploads and deletions since I last posted here, and have not experienced a 'backend-fail-internal' error since.  I'm going to go ahead and close this as resolved for now.  If anyone else is still experiencing errors, please don't hesitate to reopen! :)
Comment 25 Dereckson 2014-09-01 14:45:52 UTC
Issue reappeared on [[commons:File:Pheliperodrigues.jpg]]

Error deleting file: Could not delete file "mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg".
Comment 26 Filippo Giunchedi 2014-09-02 07:23:14 UTC
misc data points: I'm seeing some attempts in filebackend-ops.log:

2014-09-01 13:42:52 mw1210 commonswiki: MoveFileOp failed (batch #750loigffakv97vzttctb06d3xb1nf6): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":false,"failedAction":"attempt"}
2014-09-01 13:43:20 mw1198 commonswiki: MoveFileOp failed (batch #750loighcfplahx48bnr125t45twh4z): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":true,"failedAction":"attempt"}
2014-09-01 13:45:49 mw1104 commonswiki: MoveFileOp failed (batch #750loignpnz38ysz6rjotgwq7h5i1os): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":true,"failedAction":"attempt"}
2014-09-01 13:45:50 mw1119 commonswiki: MoveFileOp failed (batch #750loignq7eycehvi3b13ykg1ji76su): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":true,"failedAction":"attempt"}
2014-09-01 13:46:10 mw1187 commonswiki: MoveFileOp failed (batch #750loigpdo6255p9lg4z3w3826strm2): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":true,"failedAction":"attempt"}
2014-09-01 13:47:00 mw1150 commonswiki: MoveFileOp failed (batch #750loigrwlc8cwz29zgmbns0fe5df9l): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":true,"failedAction":"attempt"}
2014-09-01 13:48:07 mw1175 commonswiki: MoveFileOp failed (batch #750loiguvhuma8fk8395oakjvgzq66o): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":true,"failedAction":"attempt"}
2014-09-01 13:52:58 mw1183 commonswiki: MoveFileOp failed (batch #750loih7eo5eju3z0i7mm7gdued1lxp): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":true,"failedAction":"attempt"}
2014-09-01 13:53:04 mw1073 commonswiki: MoveFileOp failed (batch #750loih8ndhfheqxmjb5qpvokp2p452): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":true,"failedAction":"attempt"}

and the hashed file seems to be already there:

# swift list wikipedia-commons-local-deleted.q5 | grep q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg
q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg
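The filebackend-ops.log entries above can be reduced to the fields that differ between attempts. This is a sketch only, in Python: the regular expression and the name `parse_failed_op` are assumptions derived from the lines quoted in this comment, and only the first two of those lines are used as sample input.

```python
import json
import re

# Layout inferred from the quoted lines:
# "<date> <time> <host> <wiki>: <Op> failed (batch #<id>): <json params>"
LINE_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) (?P<wiki>\S+): "
    r"(?P<op>\w+) failed \(batch #(?P<batch>\w+)\): (?P<params>\{.*\})$"
)

SRC = "mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg"
DST = "mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg"

# First two log lines from this comment, rebuilt from their shared parts.
SAMPLE = [
    '2014-09-01 13:42:52 mw1210 commonswiki: MoveFileOp failed '
    '(batch #750loigffakv97vzttctb06d3xb1nf6): {"src":"' + SRC + '","dst":"' + DST + '",'
    '"overwriteSame":true,"dstExists":false,"failedAction":"attempt"}',
    '2014-09-01 13:43:20 mw1198 commonswiki: MoveFileOp failed '
    '(batch #750loighcfplahx48bnr125t45twh4z): {"src":"' + SRC + '","dst":"' + DST + '",'
    '"overwriteSame":true,"dstExists":true,"failedAction":"attempt"}',
]

def parse_failed_op(line):
    """Split one log line into a dict; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["params"] = json.loads(record["params"])
    return record

for line in SAMPLE:
    record = parse_failed_op(line)
    print(record["time"], record["host"], "dstExists =", record["params"]["dstExists"])
```

Run over the two sample lines, this shows `dstExists` as false on the first attempt and true on the second, which fits the `swift list` output above where the hashed file is already present in the deleted container.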
Comment 27 Filippo Giunchedi 2014-09-02 07:31:07 UTC
though no match for that file in swift-backend.log:

$ zgrep -i Pheliperodrigues.jpg swift-backend.log archive/swift-backend.log-20140901.gz archive/swift-backend.log-201408*
$

seemingly a different (but related?) issue
Comment 28 Pierre-Selim 2014-09-02 07:44:51 UTC
Looks like INeverCry finally succeeded in deleting that file.
Comment 29 Andre Klapper 2014-09-22 10:49:13 UTC
Is the problem described in comment 25 to comment 27 still seen?
Comment 30 Andre Klapper 2014-10-10 16:32:53 UTC
Is the problem described in comment 25 to comment 27 still seen?
