Last modified: 2014-06-11 17:33:24 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from pages displaying bug reports and their history, links might be broken. See T60993, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 58993 - Make limited information from filearchive available to everyone
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: API
Version: 1.23.0
Hardware: All All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2013-12-27 15:31 UTC by Rainer Rillke @commons.wikimedia
Modified: 2014-06-11 17:33 UTC (History)
CC List: 12 users


Description Rainer Rillke @commons.wikimedia 2013-12-27 15:31:46 UTC
Original bug title:
Make limited information from filearchive available to everyone

Reasoning:
When it comes to identifying copyright violations and [[WP:Sock puppetry]], it is very helpful to be able to check whether a file has been uploaded before without having to upload the file to the stash yourself.

Request:
Title and size, filterable by SHA-1
(fasha1=HEXHASH&faprop=title|size)
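
A minimal sketch of how such a request could be put together, assuming the existing list=filearchive query module and the Commons API endpoint. The helper names `sha1_hex` and `build_filearchive_query` are hypothetical, and the `faprop` values are copied from this report rather than checked against the API documentation. The code only builds the URL and sends nothing.

```python
import hashlib
from urllib.parse import urlencode

# Assumption: the Commons endpoint; any MediaWiki api.php would work the same way.
API_URL = "https://commons.wikimedia.org/w/api.php"


def sha1_hex(data: bytes) -> str:
    """Return the hex SHA-1 digest to be used as the fasha1 value."""
    return hashlib.sha1(data).hexdigest()


def build_filearchive_query(hex_hash: str) -> str:
    """Build a filearchive lookup URL (hypothetical helper, no network access)."""
    params = {
        "action": "query",
        "list": "filearchive",
        "format": "json",
        "fasha1": hex_hash,
        # Props as requested in this report; not verified against the live API.
        "faprop": "title|size",
    }
    return API_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_filearchive_query(sha1_hex(b"example file contents")))
```

Hashing locally and passing only the digest is what would make the lookup possible without uploading the file to the stash first.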

What about privacy?
Not an issue. If you upload a file to the stash, you are able to obtain this information anyway.
Comment 1 Luis Villa (WMF Legal) 2013-12-30 21:59:09 UTC
Dumb question: what's the use case for this? (Ideally I'd also like to understand the use case for the existing functionality as well, but one thing at a time...)
Comment 2 Rainer Rillke @commons.wikimedia 2013-12-31 02:43:50 UTC
(In reply to comment #1)
> Dumb question: what's the use case for this?
See Reasoning. Let me give you three examples:

User uploads copyright violation. Patroller marks file for deletion. Admin deletes file. User uploads same file again. Patroller can now do a SHA-1 lookup at https://commons.wikimedia.org/w/index.php?title=Commons:User_scripts/File_Analyzer&withJS=MediaWiki:FileAnalyzer.js to check whether an identical file existed before and identify the user(s) who uploaded it.

Bot coder and bot are not administrators. Bot uploads a batch of very large files, but some were previously deleted and should not be uploaded again. Bot could check the SHA-1 before uploading to save bandwidth.

File is marked for transfer from en.wikipedia to Commons. Bot/Tool could check whether this file was previously deleted at Commons and refuse the transfer.

...
Please let me know if this was convincing enough or whether you would like to get more feedback from Commons users. Or are you asking for a technical explanation of SHA1 and that kind of stuff? Sorry, here on Bugzilla it's always a bit difficult to get it right because I never know to whom I am talking without googling.
Comment 3 Andre Klapper 2013-12-31 10:32:03 UTC
(In reply to comment #2)
> I never know to whom I am talking without googling.

I've bookmarked https://wikimediafoundation.org/wiki/Staff?showall=1 for that :)
Comment 4 Luis Villa (WMF Legal) 2013-12-31 18:46:37 UTC
Examples were perfect, thanks - understand the use case much better now.

I'm fine with this from a privacy perspective, as long as it respects suppression of titles (which should also be respected if you do a full file upload - I understand that isn't currently the case, have filed bug 59167 for that.)

[Also, I've tweaked my settings to say a little bit about who I am, hope that helps (though I suppose that might make you *more* likely to explain SHA1, which I definitely don't need!) ]
Comment 5 2014-05-06 10:53:10 UTC
From a (non-sysop) bot writing perspective, it would be great to be able to get an array of previous deletions for a queried SHA-1. At the moment pywikipediabot passes back the name of one matching file, but not all matches.

I suggest that the deleted file names are passed back (incredibly useful info when these contain reference numbers from the original source, such as Flickr photo ids) *unless* there were a reason to suppress the filename from the deletion log. Other basic information (dates, uploader, editors) would be great for a bot to take action on, or make decisions about. Scenarios include a bot taking different actions based on whether it sees its name as a past uploader or whether upload dates fall within the dates of a recent batch upload project.

There may be privacy issues on some data elements (such as listing all past editors or uploaders), however I think we should expect to be able to automatically distinguish between ordinary deleted material (such as copyvios) and files which were deleted due to respect/privacy concerns.
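
The scenarios in this comment could look roughly like the sketch below. Everything in it is hypothetical: the `DeletedMatch` record, its fields and the action names are made up for illustration and do not describe an existing API or pywikipediabot behaviour.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class DeletedMatch:
    """One previously deleted file with the queried SHA-1 (hypothetical shape)."""
    title: Optional[str]  # None when the file name was suppressed
    uploader: str
    uploaded: datetime


def choose_action(matches: List[DeletedMatch], bot_name: str,
                  batch_start: datetime, batch_end: datetime) -> str:
    """Pick an illustrative action from all deleted matches for one hash."""
    if not matches:
        return "upload"
    # The bot sees its own name among past uploaders.
    if any(m.uploader == bot_name for m in matches):
        return "skip-own-upload"
    # An earlier upload falls within the dates of the recent batch project.
    if any(batch_start <= m.uploaded <= batch_end for m in matches):
        return "review-batch"
    return "review"


if __name__ == "__main__":
    matches = [DeletedMatch("File:Example.jpg", "ExampleBot", datetime(2014, 4, 1))]
    print(choose_action(matches, "ExampleBot",
                        datetime(2014, 3, 1), datetime(2014, 5, 1)))
```

A decision like this needs every match for the hash, not just one file name, which is the point of asking for an array.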
Comment 6 Silke Meyer (WMDE) 2014-06-11 17:33:24 UTC
Not related to 58791, removing dependency.
