Last modified: 2013-07-25 10:31:45 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T39455, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 37455 - Implement image tracking in the monuments database
Status: NEW
Product: Wiki Loves Monuments
Classification: Unclassified
Component: Database
Version: unspecified
Hardware: All All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2012-06-10 19:50 UTC by Maarten Dammers
Modified: 2013-07-25 10:31 UTC (History)

Description Maarten Dammers 2012-06-10 19:50:58 UTC
The monuments database should contain a table with all the images at Commons which have a valid identifier.

The code to extract the identifiers is already available in the unused images bot (https://fisheye.toolserver.org/browse/erfgoed/erfgoedbot/unused_monument_images.py?hb=true).

The bot should get the valid template/tracker categories from the configuration (https://fisheye.toolserver.org/browse/erfgoed/erfgoedbot/monuments_config.py?hb=true).

It should then loop over these and, for each source, get all the images plus their metadata.
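
The loop described above could be sketched roughly as follows. This is only an illustration: the config layout (`SOURCES`, the `trackerCategory` key), the placeholder category names and the `fake_fetch` stand-in are all assumptions, not the actual contents of monuments_config.py or of the unused images bot.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

# Hypothetical config layout: (country, language) -> settings.
# The real monuments_config.py may be organised differently.
SOURCES: Dict[Tuple[str, str], Dict[str, str]] = {
    ("at", "de"): {"trackerCategory": "Placeholder tracker category (at)"},
    ("pl", "pl"): {"trackerCategory": "Placeholder tracker category (pl)"},
    ("xx", "en"): {},  # a source without a tracker category is skipped
}

# A fetcher takes a category name and yields (filename, monument_id) pairs.
Fetcher = Callable[[str], Iterable[Tuple[str, str]]]


def collect_images(
    sources: Dict[Tuple[str, str], Dict[str, str]], fetch: Fetcher
) -> Iterator[Tuple[str, str, str]]:
    """Yield (country, monument_id, filename) for every configured source."""
    for (country, _lang), settings in sources.items():
        category = settings.get("trackerCategory")
        if not category:
            continue  # nothing to track for this source
        for filename, monument_id in fetch(category):
            yield country, monument_id, filename


# Stand-in for a real Commons lookup, using made-up data.
FAKE_COMMONS = {
    "Placeholder tracker category (at)": [("Example A.jpg", "28985")],
    "Placeholder tracker category (pl)": [
        ("Example B.jpg", "MA/A-1028"),
        ("Example C.jpg", "MA/A-1028"),
    ],
}


def fake_fetch(category: str) -> Iterable[Tuple[str, str]]:
    return FAKE_COMMONS.get(category, [])


rows = list(collect_images(SOURCES, fake_fetch))
```

Passing the fetcher in as a parameter keeps the loop independent of how the images are actually looked up (database replica, API, or test data).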
Comment 1 AleXXw 2012-06-10 20:40:47 UTC
The database should at least contain:
* Filename
* Monuments ID as in templates
* Uploader
* Upload date
* Whether {{Wiki Loves Monuments yyyy}} is present, and if so the year
Optional:
* coordinates
* categories
* image resolution
* file size
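
As an illustration only, the field list above might map onto a record type like the one below. The class name, attribute names and types are assumptions made for this sketch; nothing in the report fixes them.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class ImageRecord:
    """One row per (image, monument ID) pair; all names are hypothetical."""

    # Fields the table should contain at least
    filename: str
    monument_id: str  # the ID as written in the template
    uploader: str
    upload_date: datetime
    # Year from {{Wiki Loves Monuments yyyy}}, or None if the template is absent
    wlm_year: Optional[int] = None

    # Optional fields
    coordinates: Optional[Tuple[float, float]] = None  # (lat, lon)
    categories: List[str] = field(default_factory=list)
    resolution: Optional[Tuple[int, int]] = None  # (width, height) in pixels
    file_size: Optional[int] = None  # bytes

    @property
    def in_wlm(self) -> bool:
        """True if the image carries a Wiki Loves Monuments template."""
        return self.wlm_year is not None


# Made-up example values
record = ImageRecord(
    filename="Example.jpg",
    monument_id="28985",
    uploader="ExampleUser",
    upload_date=datetime(2012, 9, 1, 12, 0, 0),
    wlm_year=2012,
)
```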
Comment 2 Maarten Dammers 2012-10-01 20:08:14 UTC
Probably best to start with a minimal implementation where wikitext parsing is not needed. Existing tools can be converted to make use of this table. Later this information can be extended.
Comment 3 Maarten Dammers 2012-12-01 21:42:58 UTC
Did a first implementation. 

CREATE TABLE `image` (
  `country` varchar(10) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '',
  `id` varchar(25) NOT NULL DEFAULT '0',
  `img_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '',
  PRIMARY KEY (`country`,`id`,`img_name`),
  KEY `country_id` (`country`,`id`),
  KEY `img_name` (`img_name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

mysql> SELECT COUNT(*) FROM image;
+----------+
| COUNT(*) |
+----------+
|   878926 |
+----------+
1 row in set (0.00 sec)

I'm playing around with an API at http://toolserver.org/~multichill/monapi/api.php?action=images&country=pl&id=MA/A-1028&format=json&width=2000&limit=999999
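
For anyone scripting against a URL of this shape, a small helper along these lines could build the query string. It only constructs the URL and makes no request; the function name and the default values are assumptions taken from the example URL above, and the meaning of each parameter is inferred from its name.

```python
from urllib.parse import urlencode

API_BASE = "http://toolserver.org/~multichill/monapi/api.php"


def build_images_url(country: str, monument_id: str,
                     width: int = 2000, limit: int = 999999,
                     fmt: str = "json") -> str:
    """Build an action=images URL; hypothetical helper, not part of the API."""
    params = {
        "action": "images",
        "country": country,
        "id": monument_id,  # may contain '/', which urlencode percent-encodes
        "format": fmt,
        "width": width,
        "limit": limit,
    }
    return API_BASE + "?" + urlencode(params)


url = build_images_url("pl", "MA/A-1028")
```

Note that `urlencode` escapes the slash in the ID as `%2F`, whereas the URL in the comment passes it unescaped; whether the API treated both forms identically is not stated here.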
Comment 4 AleXXw 2012-12-02 11:29:53 UTC
Great start, thx Maarten!

Besides the additional requested fields, I noticed some errors; most should be no problem for you ;)

* IDs are padded with zeroes (28985 appears as 00028985)
* IDs are converted to uppercase (ArD-9-006 appears as ARD-9-006)
* some IDs are the uppercase image name (e.g. 'WIGANDG 29.JPG')
* Pictures with more than one ID template appear just once in the table, but should appear once per template (e.g. http://commons.wikimedia.org/wiki/File:Murtalbahnbr%C3%BCcke_1.JPG)
Comment 5 Maarten Dammers 2012-12-02 20:18:50 UTC
I think I've already fixed the zero problem; I just haven't updated the database yet. The same goes for the lowercase/uppercase issue. I'm struggling a bit with how to handle the padding on Commons. Currently IDs are padded with '0', but this causes problems for the USA: the NRHP uses the first two characters for the year, so all nominations from 2000 (00xxxx) get their two zeros chopped off.
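
The padding problem can be shown with a small sketch. The function names and the `KEEP_LEADING_ZEROS` set are hypothetical, and "001234" is a made-up ID in the 00xxxx pattern; this is one possible workaround, not necessarily the fix that was applied.

```python
def strip_padding(monument_id: str) -> str:
    """Naively remove leading zeros (keeps a single '0' for all-zero IDs)."""
    return monument_id.lstrip("0") or "0"


# Hypothetical: countries whose IDs carry meaningful leading zeros.
KEEP_LEADING_ZEROS = {"us"}


def normalize_id(country: str, monument_id: str) -> str:
    """Strip zero padding unless the country's IDs must keep their zeros."""
    if country in KEEP_LEADING_ZEROS:
        return monument_id
    return strip_padding(monument_id)


padded_at = strip_padding("00028985")     # padding removed as intended
broken_us = strip_padding("001234")       # year prefix "00" is lost
fixed_us = normalize_id("us", "001234")   # zeros preserved
```

A per-country exception is the simplest option; an alternative would be to store the expected ID length per source and strip only down to that length.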

I use the categorylinks table for the IDs. Multiple templates adding the same category give just one entry, so that's what I'm using.
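
Why several templates collapse into one entry can be modelled with a table keyed on the (page, category) pair. The sketch below assumes the monument ID travels as the category sort key, and it lets the later template overwrite the earlier one; which sort key actually survives in MediaWiki is not modelled here. The category name and IDs are made up.

```python
from typing import Dict, Iterable, Tuple

Addition = Tuple[str, str, str]  # (page, category, sort key carrying the ID)


def build_categorylinks(additions: Iterable[Addition]) -> Dict[Tuple[str, str], str]:
    """Model a table with one row per (page, category) pair."""
    table: Dict[Tuple[str, str], str] = {}
    for page, category, sortkey in additions:
        # Same key as an existing row: the row is replaced, not duplicated.
        table[(page, category)] = sortkey
    return table


# One file with two ID templates that both add the same tracker category.
additions = [
    ("File:Example bridge.jpg", "Placeholder tracker category", "11111"),
    ("File:Example bridge.jpg", "Placeholder tracker category", "22222"),
]
links = build_categorylinks(additions)
```

Recovering both IDs would therefore need another source, such as parsing the templates in the wikitext.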
