Last modified: 2014-08-07 18:53:33 UTC
GWT should prevent the upload of duplicates.
What would be *great* is if GWT skipped duplicates but still completed the requested run, and then reported back on the SHA-1 duplicates. It could supply an xml exceptions file (of <records>) containing only the duplicates, preferably with the filename of the already-existing duplicate file(s) in an extra field. The user could then have the option of setting a flag to force the creation of duplicates at that point, using the xml exceptions file. That way they would be wholly responsible for their actions, could add a "(duplicate check needed)" backlog category as appropriate, and should expect to deal with the duplicates themselves, rather than putting this on other random volunteers.
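A minimal sketch of the skip-and-report behaviour described above, in Python rather than GWToolset's own code. Everything here is an assumption made for illustration: the function names (`split_batch`, `exceptions_xml`), the `duplicate_of` field name, and the idea that the hashes of existing files are available as a simple dictionary. It is not how GWToolset or the patch in this bug actually works.

```python
import hashlib
import xml.etree.ElementTree as ET


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of the file contents as a hex string."""
    return hashlib.sha1(data).hexdigest()


def split_batch(batch, existing):
    """Split a batch into accepted titles and duplicate records.

    batch:    iterable of (title, file_bytes) pairs (hypothetical input shape)
    existing: dict mapping SHA-1 hex digest -> title of a file already uploaded
    """
    seen = dict(existing)  # copy, so duplicates inside the batch are caught too
    accepted, duplicates = [], []
    for title, data in batch:
        digest = sha1_hex(data)
        if digest in seen:
            # Skip the upload but remember which existing file it matches.
            duplicates.append(
                {"title": title, "sha1": digest, "duplicate_of": seen[digest]}
            )
        else:
            seen[digest] = title
            accepted.append(title)
    return accepted, duplicates


def exceptions_xml(duplicates) -> str:
    """Build a <records> document that contains only the skipped duplicates."""
    root = ET.Element("records")
    for dup in duplicates:
        record = ET.SubElement(root, "record")
        for key, value in dup.items():
            ET.SubElement(record, key).text = value
    return ET.tostring(root, encoding="unicode")
```

The run completes for every non-duplicate item, and the returned XML could serve as the input for a later, explicitly forced re-run.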
Change 132751 had a related patch set uploaded by Siebrand: Don’t allow upload of duplicate mediafiles https://gerrit.wikimedia.org/r/132751
Created attachment 15350: test metadataset
steps to reproduce
==================

notice current item
-------------------
1. notice how many mediafiles are present for this item and take note as to whether or not they are the same:
   http://commons.wikimedia.beta.wmflabs.org/wiki/File:Een_vrouw_brengt_een_offer_aan_Priapus_()-Sc%C3%A8nes_uit_Vergilius_dichtbundel_Bucolica_(serietitel)-RP-P-1992-80-RM0001.COLLECT.70.jpeg

login
-----
1. http://commons.wikimedia.beta.wmflabs.org/wiki/Special:GWToolset
2. once logged in and at Step 1: Metadata detection

step 1
------
1. nothing to add
2. select Artwork
3. GWToolset:Metadata Mappings/Dan-nl/Rijksmuseum.json
4. nothing to add
5. choose the attached “test metadataset”
6. click Submit

step 2
------
1. check “Re-upload media from URL”
2. click the "Preview batch" button

step 3
------
click the “Process batch” button

note the item change
--------------------
1. there should be yet another copy of the same mediafile
   http://commons.wikimedia.beta.wmflabs.org/wiki/File:Een_vrouw_brengt_een_offer_aan_Priapus_()-Sc%C3%A8nes_uit_Vergilius_dichtbundel_Bucolica_(serietitel)-RP-P-1992-80-RM0001.COLLECT.70.jpeg
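The "notice how many mediafiles are present" check in the steps above could also be done against the MediaWiki API instead of by eye. The following is only a sketch: the `/w/api.php` path on the beta host is an assumption, and the helper names `history_query_url` and `repeated_hashes` are made up for this example. It builds a `prop=imageinfo` query for the file's upload history and counts the versions that share a SHA-1.

```python
from collections import Counter
from urllib.parse import urlencode

# Assumed API endpoint for the beta wiki used in the steps above.
API = "http://commons.wikimedia.beta.wmflabs.org/w/api.php"


def history_query_url(title: str) -> str:
    """Build a query URL listing every uploaded version of a file with its SHA-1."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        "titles": title,
        "iiprop": "timestamp|sha1",
        "iilimit": "max",
    }
    return API + "?" + urlencode(params)


def repeated_hashes(response: dict) -> dict:
    """Return {sha1: count} for hashes that occur more than once in the history."""
    counts = Counter()
    for page in response.get("query", {}).get("pages", {}).values():
        for version in page.get("imageinfo", []):
            counts[version["sha1"]] += 1
    return {sha1: n for sha1, n in counts.items() if n > 1}
```

Fetching the URL before and after step 3 and comparing the two `repeated_hashes` results would show whether the batch added another identical copy.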
steinsplitter, this has been deployed to production. are you okay with marking it as resolved fixed?
steinsplitter, a patch has been deployed to production that addresses this issue. are you okay with closing this bug now?
Thank you