Last modified: 2014-01-03 15:46:40 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T61320, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 59320 - DBQ-65 Find most common hashes for Commons' deleted images
DBQ-65 Find most common hashes for Commons' deleted images
Status: RESOLVED FIXED
Product: Tool Labs tools
Classification: Unclassified
Database Queries (Other open bugs)
unspecified
All All
: Unprioritized major
: ---
Assigned To: Bugzilla Bug Importer (valhallasw)
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2014-01-03 15:46 UTC by Bugzilla Bug Importer (valhallasw)
Modified: 2014-01-03 15:46 UTC (History)
0 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Bugzilla Bug Importer (valhallasw) 2014-01-03 15:46:35 UTC
This issue was converted from https://jira.toolserver.org/browse/DBQ-65.
Summary: Find most common hashes for Commons' deleted images
Issue type: Task - A task that needs to be done.
Priority: Major
Status: Done
Assignee: Mr.Z-man <mrzmanwikimail@gmail.com>

-------------------------------------------------------------------------------
From: Mike.lifeguard <mike.lifeguard+ts@gmail.com>
Date: Tue, 16 Jun 2009 18:31:09
-------------------------------------------------------------------------------

I'd like to have a list of the most common sha1 hashes for Commons' deleted images. Specifically, I'm interested in vandalistic images, so the query could be limited to  
a) those with \b(?:vandal(?:ism)?|goatse|shock image|...)\b in the delete reason and/or  
b) the last ~10,000 deletions or something (if that'd actually make a difference)

For output, I'd like a list of the top 100 sha1 hashes and a file name for each hash so I can find the file.

So far as I know, the hashes aren't stored for deleted files.
Comment 1 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:46:37 UTC
-------------------------------------------------------------------------------
From: DaB. <dab@ts.wikimedia.org>
Date: Fri, 19 Jun 2009 20:49:33
-------------------------------------------------------------------------------

You can find a list at http://toolserver.org/~dab/queries/zman1.txt, the query was 

SELECT CONCAT("# ", fa_storage_key, "[[:image:",fa_name,"]] "), COUNT(fa_storage_key) as anzahl FROM filearchive WHERE fa_archive_name IS NULL AND fa_deleted_timestamp>"20080000000000" AND fa_storage_key!="" GROUP BY fa_storage_key ORDER BY anzahl DESC LIMIT 100;

It listed the storage_key (that's a sha1-hash in a mediawikiformat), a image-name of a deleted picture with this key, and the count for all pictures that was delete after 1.1.2008.

I hope it help you.
Comment 2 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:46:39 UTC
-------------------------------------------------------------------------------
From: Mr.Z-man <mrzmanwikimail@gmail.com>
Date: Fri, 19 Jun 2009 20:55:35
-------------------------------------------------------------------------------

The hashes are stored for deleted files, concatenated with the file extension as fa_storage_key in the filearchive table.

#### Most common of all deleted files:

Query: 
    
    SELECT SUBSTRING(fa_storage_key, 1, 31) AS hash, COUNT(*), fa_name FROM filearchive 
    WHERE fa_storage_key IS NOT NULL AND fa_storage_key != '' 
    GROUP BY hash ORDER BY COUNT(*) DESC LIMIT 100;

  
Results: http://toolserver.org/~alexz/DBQ-65/all.txt

#### From the last 6 months:

Query: 
    
    SELECT SUBSTRING(fa_storage_key, 1, 31) AS hash, COUNT(*), fa_name FROM filearchive 
    WHERE fa_storage_key IS NOT NULL AND fa_storage_key != '' AND fa_deleted_timestamp > 20090119000000 
    GROUP BY hash ORDER BY COUNT(*) DESC LIMIT 100;

  
Results: http://toolserver.org/~alexz/DBQ-65/recent.txt

#### Vandalism noted in deletion reason:

Query: 
    
    SELECT SUBSTRING(fa_storage_key, 1, 31) AS hash, COUNT(*), fa_name FROM filearchive 
    WHERE fa_storage_key IS NOT NULL AND fa_storage_key != '' AND fa_deleted_reason RLIKE "[[:<:]](vandal(ism)?|goatse|shock)[[:>:]]"
    GROUP BY hash ORDER BY COUNT(*) DESC LIMIT 100;

  
Resuts: http://toolserver.org/~alexz/DBQ-65/vandal.txt

#### Recent vandalism deletions:

Query: 
    
    SELECT SUBSTRING(fa_storage_key, 1, 31) AS hash, COUNT(*), fa_name FROM filearchive 
    WHERE fa_storage_key IS NOT NULL AND fa_storage_key != '' AND fa_deleted_timestamp > 20090119000000 AND fa_deleted_reason RLIKE "[[:<:]](vandal(ism)?|goatse|shock)[[:>:]]" 
    GROUP BY hash ORDER BY COUNT(*) DESC LIMIT 100;

  
Results: http://toolserver.org/~alexz/DBQ-65/recentvandal.txt
Comment 3 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:46:40 UTC
This bug was imported as RESOLVED. The original assignee has therefore not been
set, and the original reporters/responders have not been added as CC, to
prevent bugspam.

If you re-open this bug, please consider adding these people to the CC list:
Original assignee: (none)
CC list: mike.lifeguard+bugs@gmail.com, wikimedia-bugzilla@dabpunkt.eu

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links