Last modified: 2014-06-26 19:23:27 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T65864, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 63864 - GWToolset processes just three files and then nothing
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: GWToolset
Version: unspecified
Hardware: All All
Importance: High major
Target Milestone: ---
Assigned To: dan
Depends on:
Blocks:
Reported: 2014-04-12 19:09 UTC by Jean-Fred
Modified: 2014-06-26 19:23 UTC (History)
CC List: 5 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
7 records for Musée Départemental Albert Demard (14.41 KB, text/xml)
2014-04-13 03:08 UTC, dan
json mapping used during test upload (680 bytes, application/json)
2014-04-13 03:11 UTC, dan
json mapping based on the one i found on the beta cluster for this dataset (1.20 KB, application/json)
2014-04-13 14:41 UTC, dan

Description Jean-Fred 2014-04-12 19:09:54 UTC
Tounoki launched the GWToolset with an XML file holding a few dozen records. The GWToolset claims it created a background task to process the batch, but only the first three files were processed and uploaded.

This occurred twice.

See Special:ListFiles/Tounoki [1] for these two attempts.


[1] <https://commons.wikimedia.org/wiki/Special:ListFiles/Tounoki>
Comment 1 Bawolff (Brian Wolff) 2014-04-12 19:14:51 UTC
Could you attach the xml file to the bug (or link to it)?

The next step for this bug would probably be to get someone with access to the job queue log to see what happened to those jobs.
Comment 2 tounoki 2014-04-12 21:55:54 UTC
First try, April 11: 3 files loaded from 10:50 to 10:58, see https://commons.wikimedia.org/wiki/Special:ListFiles/Tounoki

Second try, April 12, between 7:47 and 8:07.

https://commons.wikimedia.org/wiki/File:Sabot-M0354002647.tif is the first file of this second try, and it ended on a timeout page.

I restarted the transfer by refreshing the page; the two following records with the same comment "support DV5_M0354_2006_7 complet" belong to the second try, which used the same xml file as source.

In both tries, only 3 records were loaded each time. The GWToolset reported that there was a background task, but nothing happened after that.
Comment 3 tounoki 2014-04-12 22:10:49 UTC
For the second try, I used the same xml file with 3 fewer records (the three that were already loaded).
Comment 4 dan 2014-04-13 03:08:01 UTC
Created attachment 15092 [details]
7 records for Musée Départemental Albert Demard
Comment 5 dan 2014-04-13 03:11:02 UTC
Created attachment 15093 [details]
json mapping used during test upload
Comment 6 dan 2014-04-13 04:04:55 UTC
localhost test
--------------
i ran a test upload with the attached xml based on the 7 items that did make it 
up to commons and the attached json mapping.

• the mediafile file sizes are larger than earlier uploads; > 60mb vs < 3mb 
  based on the information on this page: 
  https://commons.wikimedia.org/wiki/Special:ListFiles/Tounoki.
• based on what i currently understand about upload limits, commons should 
  accept up to 100mb for form uploads and up to 1000mb for 
  background job downloads.
• it took approximately 9 minutes to upload the first 3 items as a preview.
• the background job took approximately 12 minutes to complete the remaining
  4 items.


wikilabs
--------
http://gwtoolset.wmflabs.org/wiki/Category:Mus%C3%A9es_d%C3%A9partementaux_de_la_Haute-Sa%C3%B4ne

• the preview upload took approximately 1 minute
• the remaining items took approximately 3 minutes


beta cluster test
-----------------
http://commons.wikimedia.beta.wmflabs.org/wiki/Category:Mus%C3%A9es_d%C3%A9partementaux_de_la_Haute-Sa%C3%B4ne

• the preview failed with the following message, so i couldn’t process the 
  batch, but the first 3 items did upload as can be seen in the link above.
  
Our servers are currently experiencing a technical problem. This is probably 
temporary and should be fixed soon. Please try again in a few minutes.

If you report this error to the Wikimedia System Administrators, please include 
the details below.
Request: POST http://commons.wikimedia.beta.wmflabs.org/wiki/Special:GWToolset, 
from 127.0.0.1 via deployment-cache-text02 deployment-cache-text02 
([127.0.0.1]:3128), Varnish XID 2001837297
Forwarded for: 84.85.134.252, 127.0.0.1
Error: 503, Service Unavailable at Sun, 13 Apr 2014 03:37:12 GMT


moving forward
--------------
• is there a way to find out if there’s a timeout limit on:
  • form uploads
  • each job queue job
• how can we alter those timeouts for the toolset?
Comment 7 Bawolff (Brian Wolff) 2014-04-13 04:38:26 UTC
> • it took approximately 9 minutes to upload the first 3 items as a preview.

That's a lot of time. Maybe the preview should change so that if the first upload takes, say, > 45 seconds, it only does 1 file for the preview.
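
A minimal sketch of that idea, written in Python for illustration only; it is not GWToolset code. The names `preview_items`, `upload`, `max_preview` and `slow_threshold_s` are hypothetical, and the 45 second threshold is simply the figure suggested above.

```python
import time

def preview_items(records, upload, max_preview=3, slow_threshold_s=45.0,
                  clock=time.monotonic):
    """Upload up to max_preview records, but stop after the first one
    if that first upload took longer than slow_threshold_s seconds."""
    uploaded = []
    for index, record in enumerate(records):
        if index >= max_preview:
            break
        started = clock()
        upload(record)  # hypothetical callable that uploads one record
        elapsed = clock() - started
        uploaded.append(record)
        # Only the first upload is timed against the threshold.
        if index == 0 and elapsed > slow_threshold_s:
            break
    return uploaded
```

The clock is passed in as a parameter so the behaviour can be checked without waiting for real uploads.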


> • the background job took approximately 12 minutes to complete the remaining
>   4 items.
> 
> 
> wikilabs
> --------
> http://gwtoolset.wmflabs.org/wiki/Category:
> Mus%C3%A9es_d%C3%A9partementaux_de_la_Haute-Sa%C3%B4ne
> 
> • the preview upload took approximately 1 minute
> • the remaining items took approximately 3 minutes
> 
> 
> beta cluster test
> -----------------
> http://commons.wikimedia.beta.wmflabs.org/wiki/Category:
> Mus%C3%A9es_d%C3%A9partementaux_de_la_Haute-Sa%C3%B4ne
> 
> • the preview failed with the following message, so i couldn’t process the 
>   batch, but the first 3 items did upload as can be seen in the link above.
>   
> Our servers are currently experiencing a technical problem. This is probably 
> temporary and should be fixed soon. Please try again in a few minutes.
> 
> If you report this error to the Wikimedia System Administrators, please
> include 
> the details below.
> Request: POST
> http://commons.wikimedia.beta.wmflabs.org/wiki/Special:GWToolset, 
> from 127.0.0.1 via deployment-cache-text02 deployment-cache-text02 
> ([127.0.0.1]:3128), Varnish XID 2001837297
> Forwarded for: 84.85.134.252, 127.0.0.1
> Error: 503, Service Unavailable at Sun, 13 Apr 2014 03:37:12 GMT
> 
> 
> moving forward
> --------------
> • is there a way to find out if there’s a timeout limit on:
>   • form uploads

All web requests have a timeout. PHP itself may have an execution time limit (although to be honest I don't usually hear about that limit; I think it's higher than the other limits). Varnish has a time limit (that's the error you got on beta, although it should be noted that beta is configured a bit differently from commons). I believe the ssl proxy servers also have a timeout (which sounds like the error described in bug 63818 comment 4).

I'm not sure what the timeout is, but I think it's on the order of 120 seconds.

Timeouts are complicated by the fact that the file code tries to extend the PHP timeout while uploading a file (I think).
>   • each job queue jobs
I believe each job has a timeout, but it's much more liberal, more on the order of an hour. (Really not sure on this part.)
> • how can we alter those timeouts for the toolset?

I'm not sure if altering the Varnish timeouts is an option. (There are, however, things that can be done to get around it if it turns out to be a big issue, e.g. splitting up the operation among multiple requests and using JS to make it look like one, pushing stuff to jobs like upWiz does, etc.)
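
To illustrate the "split the operation among multiple requests" workaround, here is a rough Python sketch (not GWToolset or MediaWiki code). Each call does as much work as fits in a time budget and returns the position to resume from, so a caller could issue one request per chunk. `process_chunk`, `handle` and `budget_s` are hypothetical names, and the 60 second default is an assumed value chosen to stay below the roughly 120 second timeout guessed at above.

```python
import time

def process_chunk(items, handle, start=0, budget_s=60.0, clock=time.monotonic):
    """Process items[start:] until the time budget runs out.

    Returns the index of the first unprocessed item, which the caller
    passes back as `start` on the next request."""
    deadline = clock() + budget_s
    position = start
    # The budget is checked before each item, so an item that has been
    # started is always finished, even if it overruns the deadline.
    while position < len(items) and clock() < deadline:
        handle(items[position])
        position += 1
    return position
```

A caller would loop until the returned position equals `len(items)`. Because the budget is only checked between items, a single very large file can still exceed it.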
Comment 8 tounoki 2014-04-13 11:19:31 UTC
(In reply to Bawolff (Brian Wolff) from comment #1)
> Could you attach the xml file to the bug (or link to it).
> 

The xml file originally contained 36 records.
Comment 9 dan 2014-04-13 14:41:01 UTC
Created attachment 15095 [details]
json mapping based on the one i found on the beta cluster for this dataset
Comment 10 dan 2014-04-13 14:52:41 UTC
i’m getting the impression that we need to alter the preview step so that it 
can deal with large mediafiles; e.g., > 3mb.

at the moment, i think it might be best to eliminate the upload of the mediafile
and only upload the metadata and display a preview of that. it would also test the 
url at this step and make sure it’s valid and reachable by the toolset, and give 
any error feedback to the user in case the domain name needs to be added to
the whitelist or something else. then, once the process batch job button is
clicked, allow the background job to actually download the mediafile to the
wiki.
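
The url check described above could look roughly like the following Python sketch. It is an illustration under assumptions, not the code from the eventual patch: `check_media_url`, `http_head_status` and `allowed_domains` are hypothetical names, and using an HTTP HEAD request to test reachability without downloading the file is an assumed technique.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen

def http_head_status(url, timeout_s=10.0):
    """Return the HTTP status of a HEAD request (no body is downloaded)."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout_s) as response:
        return response.status

def check_media_url(url, allowed_domains, head=http_head_status):
    """Return None if the url looks usable, else a short error message
    that can be shown to the user during the preview step."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return "not a valid http(s) url"
    if parts.hostname not in allowed_domains:
        return f"domain {parts.hostname} is not on the whitelist"
    try:
        status = head(url)
    except OSError as error:  # covers URLError, HTTPError and timeouts
        return f"could not reach the file: {error}"
    if status != 200:
        return f"unexpected HTTP status {status}"
    return None
```

The `head` parameter can be swapped for a stub, so the whitelist and error handling can be exercised without network access.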
Comment 11 tounoki 2014-04-13 15:02:21 UTC
(In reply to dan from comment #9)
> Created attachment 15095 [details]
> json mapping based on the one i found on the beta cluster for this dataset

I used this : https://commons.wikimedia.org/wiki/GWToolset:Metadata_Mappings/Tounoki/JOCONDE_M0354-CHAMPLITTE.json
Comment 12 tounoki 2014-04-13 15:09:00 UTC
Focusing the preview on the data mapping and only testing the accessibility of the files (instead of uploading them) seems like a good approach to me.
In fact, seeing the pictures isn't the most important thing in this kind of work.

If you can fix it, I'm ready to test it on Monday afternoon.
Comment 13 tounoki 2014-04-13 15:11:54 UTC
Or maybe the preview could be optional? (with the possibility of skipping it for large files)
Comment 14 Gerrit Notification Bot 2014-04-22 00:19:29 UTC
Change 127839 had a related patch set uploaded by Dan-nl:
preview without upload

https://gerrit.wikimedia.org/r/127839
Comment 15 dan 2014-04-22 01:06:34 UTC
tounoki,

if you have access to http://gwtoolset.wmflabs.org/wiki/GWToolset you can test the above patch on that server. if you don’t, create an account and i will grant you access.

i tested the patch on that server with the files attached to this bug, and the upload of the mediafiles to the wiki succeeded; it took about 20 minutes. you can see the results here: http://gwtoolset.wmflabs.org/wiki/Category:Mus%C3%A9e_D%C3%A9partemental_Albert_Demard.

after a quick glance, the only “issue” i noticed, is that the wiki doesn’t create a thumbnail for the tif. it looks like i would need to adjust http://www.mediawiki.org/wiki/Manual:$wgTiffThumbnailType. would that help with testing? if so, do you know if that’s the correct and only value i need to change? which values would be best to use in that array?

in any case, let me know if you think this patch will take care of this bug. also, feel free to +2 it or let me know if you think it needs adjustment.
Comment 16 Bawolff (Brian Wolff) 2014-04-22 02:32:19 UTC
(In reply to dan from comment #15)
> tounoki,
> 
> if you have access to http://gwtoolset.wmflabs.org/wiki/GWToolset you can
> test the above patch on that server. if you don’t, create an account and i
> will grant you access.
> 
> i tested the patch on that server with the files attached to this bug, and
> the upload of the mediafiles to the wiki succeeded; it took about 20
> minutes. you can see the results here:
> http://gwtoolset.wmflabs.org/wiki/Category:
> Mus%C3%A9e_D%C3%A9partemental_Albert_Demard.
> 
> after a quick glance, the only “issue” i noticed, is that the wiki doesn’t
> create a thumbnail for the tif. it looks like i would need to adjust
> http://www.mediawiki.org/wiki/Manual:$wgTiffThumbnailType. would that help
> with testing? if so, do you know if that’s the correct and only value i need
> to change? which values would be best to use in that array?
> 
> in any case, let me know if you think this patch will take care of this bug.
> also, feel free to +2 it or let me know if you think it needs adjustment.


We use PagedTiffHandler extension on wmf servers instead of mediawiki built in tiff handling.
Comment 17 dan 2014-04-22 07:17:14 UTC
(In reply to Bawolff (Brian Wolff) from comment #16)
> We use PagedTiffHandler extension on wmf servers instead of mediawiki built
> in tiff handling.

thanks. installed it and it created the thumbnails.
Comment 18 dan 2014-04-28 21:20:10 UTC
steps to reproduce
==================
login
-----
1. http://commons.wikimedia.beta.wmflabs.org/wiki/Special:GWToolset
2. once logged in and at Step 1: Metadata detection

step 1
------
1. nothing to add
2. select Art_Photo
3. GWToolset:Metadata_Mappings/Tounoki/temp_JOCONDE_CHAMPLITTE.json
4. nothing to add
5. choose the attached sample dataset, 
   7 records for Musée Départemental Albert Demard
6. click Submit

step 2
------
1. click the "Preview batch" button
Comment 19 Gerrit Notification Bot 2014-05-08 10:13:26 UTC
Change 127839 merged by jenkins-bot:
preview without upload

https://gerrit.wikimedia.org/r/127839
Comment 20 dan 2014-05-08 18:13:06 UTC
tounoki and jean fred,

you should be able to test this patch on the beta cluster, 
http://commons.wikimedia.beta.wmflabs.org/wiki/Special:GWToolset. 

please let me know if it resolves this bug and bug 63818.
Comment 21 tounoki 2014-05-18 22:22:00 UTC
Yes it works :D thanks
http://commons.wikimedia.beta.wmflabs.org/wiki/Special:ListFiles/Tounoki

When do you think you can commit it to the main WC server?

BR
Comment 22 dan 2014-05-20 02:25:32 UTC
it should be part of today's deploy, so it should be available on the production server tomorrow.

== concern ==
one concern though is that the image scalers are having issues with creating thumbnails for large tiffs when several are submitted in sequence, which is what GWToolset does. the multimedia team has a potential solution, but i haven’t heard of any definite fix yet ( see bug 65217 ).

when you do upload, please coordinate the upload with the wmf operations team in irc, #wikimedia-operations.

thanks!
Comment 23 dan 2014-06-13 09:47:00 UTC
this has been deployed to production. tounoki, jean-fred, are you okay with resolving the ticket as fixed now?
Comment 24 tounoki 2014-06-26 17:24:46 UTC
(In reply to dan from comment #23)
> this has been deployed to production. tounoki, jean-fred, are you okay with
> resolving the ticket as fixed now?

Ok to mark it as fixed
Comment 25 Andre Klapper 2014-06-26 19:23:27 UTC
(In reply to tounoki from comment #24)
> Ok to mark it as fixed

Doing so.
