Last modified: 2013-01-03 11:28:47 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and, except for displaying bug reports and their history, links might be broken. See T44548, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 42548 - Autocategorization by UploadWizard makes [[:commons:Category:Uploaded with UploadWizard]] uselessly large
Status: RESOLVED WONTFIX
Product: MediaWiki extensions
Classification: Unclassified
Component: UploadWizard
Version: master
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://commons.wikimedia.org/wiki/Cat...
Reported: 2012-11-29 18:49 UTC by Russavia
Modified: 2013-01-03 11:28 UTC (History)

Description Russavia 2012-11-29 18:49:18 UTC
Currently, all uploads to Commons using UploadWizard are categorised into:

http://commons.wikimedia.org/wiki/Category:Uploaded_with_UploadWizard

There are now some 2 million files in this category, which makes it almost impossible to browse due to the sheer number of files. It would be better if uploads were categorised by the month and year they were uploaded. e.g. "Category:Uploaded with UploadWizard (November 2012)" to make browsing of files easier.

Can we get UploadWizard amended so that this can occur in future? Current uploads can be re-categorised by one of our friendly Commons bot operators.
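
[Editorial sketch] The naming scheme requested above could look roughly like this. It is an illustration in Python only, not UploadWizard code: the helper name `monthly_category` is hypothetical, and the title format is copied from the example in the description.

```python
from datetime import datetime, timezone

# Month names are spelled out explicitly so the result does not depend on the
# locale of the machine running the sketch.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def monthly_category(uploaded_at: datetime) -> str:
    """Return a per-month category title for a timezone-aware upload timestamp.

    Hypothetical helper: the title format follows the example in the bug
    description and is not an existing UploadWizard setting.
    """
    utc = uploaded_at.astimezone(timezone.utc)
    return f"Category:Uploaded with UploadWizard ({MONTHS[utc.month - 1]} {utc.year})"


# Timestamp of the original report, used here as sample input.
print(monthly_category(datetime(2012, 11, 29, 18, 49, 18, tzinfo=timezone.utc)))
# prints: Category:Uploaded with UploadWizard (November 2012)
```

Converting to UTC first is a design assumption, so that an upload near midnight lands in the same month regardless of the uploader's time zone.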
Comment 1 Rainer Rillke @commons.wikimedia 2012-11-29 21:26:43 UTC
The category is no longer of any use, at least not in its current form (without a date). I think it was used for statistics, but now it only proves that MediaWiki can handle very large categories, which is outside Wikimedia Commons' scope.

I guess that month categories would require changing UploadWizard's code rather than just LocalSettings.php, so I won't retarget this bug.
Comment 2 Nischay Nahata 2012-11-30 09:58:20 UTC
(In reply to comment #0)
Please enter reasonable summaries for bugs; a URL doesn't explain much. I have edited it now.

UW can be customized to upload to different categories by using autoCategories in UploadWizard.config.php, so I guess this has been handled manually. Also, it doesn't make much sense for UW to support month-wise categorization.

Maybe a Commons admin can shed more light on what is needed.
Comment 3 Maarten Dammers 2012-12-14 21:17:18 UTC
First the Commons community should decide what it wants:
* Get rid of the category
* Keep the category split up by date
* Empty tag template
* (maybe another option)

Discussion should probably happen at https://commons.wikimedia.org/wiki/Category_talk:Uploaded_with_UploadWizard and you should advertise it at the village pump.

When you have consensus you should probably come back to Bugzilla to have it implemented. In the meantime I would close this bug.
Comment 4 Bawolff (Brian Wolff) 2012-12-14 21:23:21 UTC
(In reply to comment #3)
> First the Commons community should decide what it wants:
> * Get rid of the category
> * Keep the category split up by date
> * Empty tag template
> * (maybe another option)

Logically, using change tags for upload actions by UploadWizard would make sense. Users could then browse by date in Special:Log/upload, filtered to just UploadWizard uploads. Which approach is best depends on what the point of adding the category was, but I thought I should mention this as another possibility (or something to do in addition to the category).
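
[Editorial sketch] If uploads carried such a change tag, the filtered upload log could be fetched through the MediaWiki API as well as through Special:Log. The Python sketch below only builds the request URL and makes no network call. The tag name "uploadwizard" is an assumption for illustration (the thread does not say such a tag exists), and `upload_log_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def upload_log_url(tag: str, start: str, end: str, limit: int = 50) -> str:
    """Build a list=logevents query for upload log entries carrying a change tag."""
    params = {
        "action": "query",
        "list": "logevents",
        "letype": "upload",   # only entries from the upload log
        "letag": tag,         # only entries marked with this change tag
        "lestart": start,     # with ledir=newer, enumeration starts here
        "leend": end,         # ... and stops here
        "ledir": "newer",
        "lelimit": limit,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


# "uploadwizard" is a hypothetical tag name, used only for illustration.
print(upload_log_url("uploadwizard", "2012-11-01T00:00:00Z", "2012-12-01T00:00:00Z"))
```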
Comment 5 Ryan Kaldari 2012-12-14 21:28:45 UTC
I would favor getting rid of the category entirely. One of the reasons the category was created was actually to find bugs in UploadWizard when it was first being developed. For example, if a few image pages had a weird formatting problem and they all happened to be among the 10,000 images in the 'Uploaded with UploadWizard' category, we knew the problem was an UploadWizard bug. Now the category is fairly useless for debugging purposes (and there are other ways we can see if an image is uploaded via UploadWizard anyway).
Comment 6 Russavia 2012-12-16 09:16:17 UTC
I have raised the issue for discussion at https://commons.wikimedia.org/wiki/Commons:Categories_for_discussion/2012/12/Category:Uploaded_with_UploadWizard -- I wouldn't expect consensus to take forever, so we could leave this report open for the time being?
Comment 7 Erik Moeller 2012-12-17 18:14:51 UTC
This category is still used for statistical purposes. For example, the "upload activity levels" table at http://stats.wikimedia.org/wikispecial/EN/TablesWikipediaCOMMONS.htm relies on the existence of this category. I don't see why the benefit of either removing or subdividing this category outweighs the cost.
Comment 8 Rainer Rillke @commons.wikimedia 2012-12-17 20:19:56 UTC
(In reply to comment #7)
I don't know why you still need to track this. You're at 70% now. This indicates that UpWiz is now usable :-) I still think it was wrong to make a buggy tool the default.

For anyone else who didn't recognize it, Erik's comment means WONTFIX.
Comment 9 Erik Moeller 2012-12-17 20:24:27 UTC
Actually, we were at 46.3% in the pre-WLM month, and then at 69.6% during WLM. It'll probably drop again to ~50% in October/November as the influx of new uploaders goes down. Tracking those types of changes over time, and seeing how they may be influenced e.g. by the integration of new features like the Flickr upload feature or improved WP integration, is precisely the point.
Comment 10 Rainer Rillke @commons.wikimedia 2012-12-17 21:37:21 UTC
(In reply to comment #7)
>used for statistical purposes
For statistical purposes, you can use the upload summary. Surely there is no need for the category to remain on the page forever? Can I build automated removal of that category into our clean-up regexps?
Comment 11 Erik Moeller 2012-12-17 22:00:08 UTC
My understanding is that Erik Zachte's scripts run each time against the whole dump, so if you removed the category after a while, the counts would change. We should give Erik time to clarify that.

It's possible that these could be switched over to parse the edit summary, provided the software adds and has always added the summary in the same way it's added the category. But tradeoffs and issues with that approach need to be analyzed first.

I still fail to see what's accomplished by doing this, as we also have other "Uploaded with/to" categories that have already become very large (Commonist is at >100K files, images from WLM2012 is at >300K files). What's the problem we're trying to solve by removing or restructuring these categories? What, if any, operation is currently slowed down by these categories?
Comment 12 Platonides 2012-12-17 22:12:26 UTC
Ryan, what are those other ways to see if an image is uploaded via UploadWizard?

The category Category:Uploaded_with_UploadWizard is not suitable for navigation, but I don't see a need to remove it, either. Its purpose is to log the upload tool.
Comment 13 Erik Zachte 2012-12-18 11:41:57 UTC
Indeed, all dump stats are regenerated from scratch each time. This is on purpose. Provided the integrity of the dumps is maintained, and I think we're good there, that gives us most consistent metrics over time, as fixed bugs (few) and new metrics (occasional) yield improved or new data for all history. But it brings the caveat Erik described above.

If the decision is taken to remove the category, I can switch to the comment field.

An alternative (e.g. if the comment was not consistently set from the beginning) would be to have a script replace [[..uploadwizard..]] with an in-article comment:
<!--uploadwizard (do not remove this comment, for stats purposes)-->
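
[Editorial sketch] A replacement script of the kind described above might do something like the following. This is a minimal Python sketch, not an existing bot: the regular expression and the `replace_category` helper are assumptions, the sample wikitext is invented, and only the marker text is taken from the comment above.

```python
import re

# Marker text quoted from the comment above.
MARKER = "<!--uploadwizard (do not remove this comment, for stats purposes)-->"

# Matches the category link with spaces or underscores in the title and an
# optional sort key. Localized namespace aliases are not handled.
CATEGORY_LINK = re.compile(
    r"\[\[\s*Category\s*:\s*Uploaded[ _]with[ _]UploadWizard\s*(?:\|[^\]]*)?\]\]"
)


def replace_category(wikitext: str) -> str:
    """Swap the UploadWizard category link for the in-article stats comment."""
    return CATEGORY_LINK.sub(MARKER, wikitext)


# Invented sample page text, for illustration only.
sample = (
    "== Summary ==\n"
    "{{Information}}\n"
    "[[Category:Uploaded with UploadWizard]]\n"
    "[[Category:Moscow]]\n"
)
print(replace_category(sample))
```

Other categories on the page are left untouched, so the marker stays available for statistics after the category itself is gone.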
Comment 14 Rainer Rillke @commons.wikimedia 2012-12-18 12:02:06 UTC
(In reply to comment #13)
Thanks for the reply. I guess at some point you'll stop gathering statistics. Just let us know when, so we can remove this category when other stuff is done anyway (e.g. categorizing or i18n replacements). I don't think there is a pressing need to create more work now [there were times when the comment field was totally empty in UpWiz uploads; and now it is also not perfect], so I suggest "resolved later".
Comment 15 Andre Klapper 2012-12-18 13:32:39 UTC
[Removing RESOLVED LATER as that is deprecated.]
Comment 16 Rainer Rillke @commons.wikimedia 2012-12-19 09:24:55 UTC
(In reply to comment #15)
> [Removing RESOLVED LATER as that is deprecated.]
There should be a message telling me this when selecting.

(In reply to comment #13)
> all dump stats are regenerated from scratch each time
This way your statistics are wrong. The cumulative risk of each single file being deleted grows over time. If you then generate statistics from the surviving files (uploaded with the wizard), it will always look as though the number of Upload Wizard uploads is growing, even if it remains constant. This effect is small, but it's there. However, the percentage of Upload Wizard uploads may remain constant over time if you also compute the "total" numbers from non-deleted files. But there is no proof that Upload Wizard uploads have the same chance of being deleted as any other uploaded file. Using the upload log, starting from when Upload Wizard began using a special "upload summary", is more reliable.
Finally, I think the sources for these statistics and how they were computed should be given next to each table/figure, e.g. using footnotes. This is often both important and interesting.
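
[Editorial sketch] The deletion effect described in this comment can be illustrated with a small calculation. The Python sketch below assumes a constant upload rate and a fixed monthly deletion risk; both figures are invented for illustration and are not Commons data.

```python
UPLOADS_PER_MONTH = 1000      # assumed constant upload rate
MONTHLY_DELETION_RISK = 0.02  # assumed chance per month that a surviving file is deleted
MONTHS = 12


def surviving_counts(months: int, uploads: int, risk: float) -> list[float]:
    """Expected files per upload month still present in a dump taken after the last month."""
    counts = []
    for month in range(months):
        age = months - 1 - month  # full months the files were exposed to deletion
        counts.append(uploads * (1 - risk) ** age)
    return counts


counts = surviving_counts(MONTHS, UPLOADS_PER_MONTH, MONTHLY_DELETION_RISK)
for month, count in enumerate(counts, start=1):
    print(f"month {month:2d}: {count:7.1f} surviving uploads")
```

Although exactly the same number of files is uploaded every month, the surviving counts rise from the oldest month to the newest, which is the apparent growth the comment warns about.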
Comment 17 Ryan Kaldari 2013-01-03 01:24:23 UTC
@Platonides: The file upload comment says "User created page with UploadWizard", although this isn't as easy to exploit for statistical queries.
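
[Editorial sketch] Counting by upload comment, as mentioned above, could be sketched like this in Python. The `wizard_share_by_month` helper and the sample log entries are hypothetical; only the comment string is quoted from this thread. As noted in comment 14, the comment field was not always set, so an exact match like this would undercount older uploads.

```python
from collections import Counter

# Upload comment quoted in the comment above.
WIZARD_COMMENT = "User created page with UploadWizard"


def wizard_share_by_month(log_entries):
    """Return the fraction of uploads per month whose comment matches WIZARD_COMMENT.

    log_entries: iterable of (ISO 8601 timestamp, upload comment) pairs.
    """
    totals = Counter()
    wizard = Counter()
    for timestamp, comment in log_entries:
        month = timestamp[:7]  # "YYYY-MM" prefix of the timestamp
        totals[month] += 1
        if comment == WIZARD_COMMENT:
            wizard[month] += 1
    return {month: wizard[month] / totals[month] for month in sorted(totals)}


# Invented sample entries, for illustration only.
sample_log = [
    ("2012-11-29T18:49:18Z", "User created page with UploadWizard"),
    ("2012-11-30T09:58:20Z", "own work"),
    ("2012-12-14T21:17:18Z", "User created page with UploadWizard"),
]
print(wizard_share_by_month(sample_log))
# prints: {'2012-11': 0.5, '2012-12': 1.0}
```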

Looks like the opinion at the Commons discussion is split (as well as in the bug comments here). I'm going to go ahead and close this as WONTFIX. If stats.wikimedia.org stops relying on it or a consensus develops on Commons to delete it, feel free to reopen.
Comment 18 Bawolff (Brian Wolff) 2013-01-03 11:28:47 UTC
(In reply to comment #17)
> @Platonides: The file upload comment says "User created page with
> UploadWizard", although this isn't as easy to exploit for statistical
> queries.
>

Hopefully we don't start using that for stats. It's questionable whether it's appropriate to have the img_comment be the same for all files uploaded with UploadWizard (it kind of defeats the point of having an img_comment field). It's not unimaginable that later versions of UploadWizard could change that behaviour (and earlier versions of UploadWizard didn't even have that behaviour). </offtopic rant>
