
Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links may be broken. See T54593, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 52593 - Increase chunked upload size limit to support longer videos
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: wmf-deployment
Hardware: All
OS: All
Priority: Low
Severity: enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: shell
Depends on:
Blocks:

Reported: 2013-08-07 02:40 UTC by Greg Grossmeier
Modified: 2013-11-19 02:03 UTC
CC: 18 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---

Attachments: none

Description Greg Grossmeier 2013-08-07 02:40:42 UTC
Per request from users, especially those in the GLAM community.
Comment 1 Kelson [Emmanuel Engelhart] 2013-11-01 20:16:52 UTC
Chapters organize conferences with presentations, discussions and workshops which are from time to time longer than 90 minutes. In that case, it's almost impossible to upload video recordings of these events to Commons. For this reason, chapters often use commercial platforms (YouTube/Vimeo/Dailymotion/...) instead of Commons to share their videos. The most determined ones manage to find someone with shell access on Commons, but this is really not user friendly. What are the technical reasons to limit chunked uploads to 500 MB (and not 1 GB, for example)?
Comment 2 Erik Moeller 2013-11-01 22:30:26 UTC
This is more of an ops question, so adding Mark, Faidon & Ken. Recap: the limit for chunked uploads (which require enabling an experimental feature in user preferences) is 500MB; without chunked uploads it's 100MB.

If we increased it from 500MB to 1GB while still keeping that feature obscure, it would probably have manageable impact. Mark/Faidon, can you give us a sense of whether this would be problematic given current storage capacity? Beyond total capacity, would an increase in the number of objects at 500MB-1GB size be a problem?
Comment 3 Faidon Liambotis 2013-11-04 12:09:32 UTC
TL;DR: increasing the limit to at least 1GiB is fine from an ops perspective.

We're currently at 63.6T out of 96T (* 3 replicas * 2 for pmtpa/eqiad = 576T raw). Individual disks show as much as 70% full. About 5.5T of this is temp data that hasn't been pruned because of #56401 and friends, so we'll regain some capacity from there. The thumb RFC can potentially shave off as much as 15.5T of thumbs (or some number in between, depending on the solution we end up choosing).

Even at the current trend, estimates place us at 75-80% (max of our comfort zone, to be able to provide redundancy + lead time to procure hardware) by April/May:
http://ganglia.wikimedia.org/latest/graph.php?r=year&z=xlarge&c=Swift+pmtpa&h=Swift+pmtpa+prod&jr=&js=&v=63597099281113&m=swift_bytes_count&vl=total+bytes&trend=1
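
As a quick sanity check of the capacity arithmetic above, a minimal sketch in Python (all figures are copied from this comment; nothing here is measured independently):

# Back-of-the-envelope check of the Swift capacity figures quoted above.
used_tb = 63.6      # data currently stored (one logical copy)
usable_tb = 96.0    # usable capacity (one logical copy)
replicas = 3        # Swift replica count per datacenter
sites = 2           # pmtpa + eqiad

raw_tb = usable_tb * replicas * sites
print(f"raw capacity: {raw_tb:.0f}T")                      # 576T, as stated above
print(f"usable space filled: {used_tb / usable_tb:.0%}")   # ~66%, trending toward 75-80%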

There are some ideas about increasing the capacity much earlier than that by moving pmtpa hardware to eqiad at the end of the year, but nothing's decided yet. I can say with certainty that we're not going to keep 6 replicas of everything with the new datacenter, but will instead use Swift 1.9's georeplication features to lower this to, likely, 4.

Varnish's caches are much smaller, obviously, but they're LRU, so unless we have tons of very popular large files, it shouldn't affect them much.

Large files aren't a big deal for Swift or its underlying filesystem (XFS) -- at least up to (the default of) 5G; after that, we'd need to explore segmented files in Swift itself ("large object support"). Large files are actually *much* more efficient to handle than really small files (filesystem overheads etc.).

Now, a large number of large files could potentially throw off our planning, especially if you account for a multiplication factor from transcoding and from keeping multiple versions of the same video file in Swift in different formats & resolutions.

However, I don't think it's even remotely plausible that this would happen. All of our transcoded files account for a mere 1.1T. Additionally, the 21,251,977 objects in Commons (originals; does *not* include thumbs/transcoded) are distributed in size as follows:

0 bytes  - 4.0KiB   = 368841
4.0KiB   - 8.0KiB   = 275486
8.0KiB   - 16.0KiB  = 596394
16.0KiB  - 32.0KiB  = 972185
32.0KiB  - 64.0KiB  = 1528037
64.0KiB  - 128.0KiB = 2466817
128.0KiB - 256.0KiB = 2294701
256.0KiB - 512.0KiB = 2247147
512.0KiB - 1.0MiB   = 2453605
1.0MiB   - 2.0MiB   = 2746332
2.0MiB   - 4.0MiB   = 2931704
4.0MiB   - 8.0MiB   = 1832701
8.0MiB   - 16.0MiB  = 410738
16.0MiB  - 32.0MiB  = 88009
32.0MiB  - 64.0MiB  = 24599
64.0MiB  - 128.0MiB = 13504
128.0MiB - 256.0MiB = 933
256.0MiB - 512.0MiB = 192
512.0MiB - 1.0GiB   = 52

Files over 64MiB are a mere 0.06% of the total file count and account for under 2T in total. Files over 128MiB number less than one tenth of the files between 64MiB and 128MiB. I think it's safe to assume that files in the 512MiB-1.0GiB range will stay well below 1TiB in total in the mid-term, which is more than fine given our current media storage pool.
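
A minimal sketch to reproduce the order of magnitude of those figures from the histogram above (the counts are copied from the table; using each bucket's upper bound makes the size figure an upper estimate):

# Reproducing the rough large-file numbers from the size histogram above.
total_objects = 21_251_977

# Buckets of 64 MiB and larger: (bucket upper bound in MiB, object count)
large_buckets = [(128, 13_504), (256, 933), (512, 192), (1024, 52)]

large_count = sum(count for _, count in large_buckets)
share = large_count / total_objects

# Upper estimate: pretend every object sits at its bucket's upper bound.
upper_bound_tib = sum(mib * count for mib, count in large_buckets) / (1024 * 1024)

print(f"objects >= 64 MiB: {large_count} ({share:.2%} of all originals)")
print(f"combined size: at most ~{upper_bound_tib:.1f} TiB")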

Finally, a factor that should be considered is the resources needed from the videoscaler (TMH) infrastructure. Jan Gerber is the expert here, but I don't think going to 1GiB is going to make any big difference. Maybe silly things such as cgroup limits would need to be adjusted but it's not a pressing matter anyway as the process is asynchronous and we can course-correct as we go forward.
Comment 4 Gerrit Notification Bot 2013-11-06 00:26:07 UTC
Change 93900 had a related patch set uploaded by Eloquence:
Increase upload size limit for chunked and URL uploads to 1000MB.

https://gerrit.wikimedia.org/r/93900
Comment 5 Erik Moeller 2013-11-06 00:28:38 UTC
Adding Jan per above in case there's anything to be done from the TMH perspective.
Comment 6 Kelson [Emmanuel Engelhart] 2013-11-06 11:00:37 UTC
This is a great move. 
Thank you so much for this.
Comment 7 Gerrit Notification Bot 2013-11-07 18:52:38 UTC
Change 93900 merged by jenkins-bot:
Increase upload size limit for chunked and URL uploads to 1000MB.

https://gerrit.wikimedia.org/r/93900
Comment 8 Marco 2013-11-09 11:02:12 UTC
Change got merged. Thus marking as resolved.
Comment 9 Tilman Bayer 2013-11-11 12:02:09 UTC
This is great news. However, I just tried to upload an 814MB file ("WMF Monthly Metrics Meeting November 7, 2013.ogv") without success. After starting the upload, the following status messages appear: 

'Uploading...' --> 'Queued...' (for several minutes) --> 'Unknown error: "unknown"' 

(using UploadWizard, chunked uploads enabled, tried in Chromium and Firefox. Just noting this here for the moment, might file a separate bug later)
Comment 10 Tilman Bayer 2013-11-11 12:04:03 UTC
(In reply to comment #9)
> 'Uploading...' --> 'Queued...' (for several minutes) --> 'Unknown error:
> "unknown"'

PS: Clicking 'Retry failed uploads' results in 'Unknown error: "stasherror"'.
Comment 11 Bawolff (Brian Wolff) 2013-11-11 15:18:41 UTC
(In reply to comment #9)
> 'Uploading...' --> 'Queued...' (for several minutes) --> 'Unknown error:
> "unknown"'

There are various reports about stashed upload being unreliable, and that unreliability increasing with number of chunks. See bug 3658
Comment 12 Bawolff (Brian Wolff) 2013-11-11 15:48:10 UTC
(In reply to comment #11)
> There are various reports about stashed upload being unreliable, and that
> unreliability increasing with number of chunks. See bug 3658

I mean bug 36587
Comment 13 Fastily 2013-11-12 02:20:36 UTC
Chunked uploads (using 4MB chunks) are no better over the API. The server consistently returns a 500 error when trying to upload 120MB files. I find that this tends to be closely correlated with upload speed. For example, uploading a 120MB file at 50 Mbps (on a corporate network) fails completely about 70% (7/10) of the time, whereas uploading at 2 Mbps (on a typical home network) fails 100% (10/10) of the time.
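
For context, a minimal sketch of the chunked-upload loop being described, using MediaWiki's action=upload API with 4MB chunks (assumptions: a requests session that is already logged in to the Commons API and a valid CSRF token; retries, error handling and the final publish step are omitted):

import os
import requests

API = "https://commons.wikimedia.org/w/api.php"
CHUNK_SIZE = 4 * 1024 * 1024  # 4MB chunks, as in the report above

def stash_in_chunks(session: requests.Session, path: str, filename: str, token: str) -> str:
    """Upload `path` to the stash in 4MB chunks and return the resulting filekey.

    Sketch of MediaWiki's chunked upload protocol; `session` is assumed to be
    logged in and `token` to be a valid CSRF token.
    """
    filesize = os.path.getsize(path)
    offset, filekey = 0, None
    with open(path, "rb") as f:
        while offset < filesize:
            chunk = f.read(CHUNK_SIZE)
            data = {
                "action": "upload", "format": "json", "stash": 1,
                "filename": filename, "filesize": filesize,
                "offset": offset, "token": token,
            }
            if filekey:
                data["filekey"] = filekey
            r = session.post(API, data=data,
                             files={"chunk": (filename, chunk)}).json()
            upload = r["upload"]  # KeyError here if the API returned an error
            filekey = upload["filekey"]
            offset += len(chunk)
    # The stashed file still has to be published with a final action=upload
    # call that passes this filekey plus filename/comment.
    return filekey

With 4MB chunks a ~120MB file needs roughly 30 of these round trips, so there are many opportunities for an individual chunk request to fail.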
Comment 14 Rainer Rillke @commons.wikimedia 2013-11-18 23:21:12 UTC
(In reply to comment #9)
> 'Uploading...' --> 'Queued...' (for several minutes) --> 'Unknown error:
> "unknown"' 

Does the file end up in [[Special:UploadStash]] after some time?
Comment 15 Fastily 2013-11-18 23:27:53 UTC
(In reply to comment #14)
> Does the file end up in [[Special:UploadStash]] after some time?

I didn't know about that special page :o  I'm going to check it out asap.  Thanks for sharing!
Comment 16 Tilman Bayer 2013-11-18 23:31:18 UTC
(In reply to comment #14)
> Does the file end up in [[Special:UploadStash]] after some time?

https://commons.wikimedia.org/wiki/Special:UploadStash currently tells me "You have no stashed files". I didn't check earlier (the error occurred on November 11).
Comment 17 Fastily 2013-11-19 02:03:38 UTC
(In reply to comment #14)
> Does the file end up in [[Special:UploadStash]] after some time?

I did a few test uploads, and it looks like the failed uploads do end up in [[Special:UploadStash]], but I'm unable to download & verify the contents of those files because the system says "Cannot serve a file larger than 1048576 bytes" (i.e. 1 MiB). Given this, it's hard to say what kind of issue this is (e.g. maybe the uploaded file is corrupt, i.e. the file was not assembled properly server-side?).
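
One way to inspect a stashed file without downloading it is MediaWiki's prop=stashimageinfo query (a sketch, assuming a logged-in requests session and the filekey shown on Special:UploadStash; comparing the reported size and SHA-1 against the local original would show whether the chunks were assembled completely server-side):

import requests

API = "https://commons.wikimedia.org/w/api.php"

def stashed_file_info(session: requests.Session, filekey: str) -> dict:
    """Return size/sha1/mime metadata for a stashed upload without fetching the file."""
    r = session.get(API, params={
        "action": "query", "format": "json",
        "prop": "stashimageinfo",
        "siifilekey": filekey,
        "siiprop": "size|sha1|mime",
    }).json()
    return r["query"]["stashimageinfo"][0]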
