Last modified: 2014-11-15 00:47:07 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T31640, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 29640 - need an easy way to see if a filename is in use via the API (including InstantCommons)


Summary:	need an easy way to see if a filename is in use via the API (including Instan...

Status:	NEW

Product:	MediaWiki
Classification:	Unclassified
Component:	API (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Lowest enhancement (vote)
Target Milestone:	---
Assigned To:	Roan Kattouw

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2011-06-29 06:26 UTC by Ryan Kaldari
Modified:	2014-11-15 00:47 UTC (History)
CC List:	10 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Description Ryan Kaldari 2011-06-29 06:26:41 UTC

Right now, it seems that the only way to check if a file name is already in use on Wikipedia is to do api.php?action=query&titles=<imageTitle>&prop=imageinfo

Then you have to do the following awkward test on the results:
if ( !data.query.pages[-1] || data.query.pages[-1].imageinfo) { // image exists

The reason for this is that the API returns a page id for a local image, or -1 for no image. However, it also returns -1 if it finds an image through InstantCommons.

It would be nice if there was a simpler way to do this. Maybe prop=filenameinuse that just returns true or false. There are two uses for this: 1) Making sure that an image someone wants to use for something on-wiki actually exists before they use it. 2) Making sure that uploading scripts don't upload over other files.

Comment 1 Roan Kattouw 2011-06-29 09:05:52 UTC

I don't fully understand the feature request here. According to your own comment checking for image existence is kind of counter-intuitive, but it really only requires two tests. In fact, you can do it with one:

* append &indexpageids=1 to the URL
* Use something like if ( 'imageinfo' in data.query.pages[data.query.pageids[0]] )

The reason is that the image can have either its real page ID, if the description page exists, or -1 if the description page doesn't exist. It's perfectly possible (but atypical) to have an orphaned local image with no description page, in which case you'll also get the -1 behavior.

The &indexpageids=1 trick adds data.query.pageids as an array of page IDs used (specifically for the benefit of JS users), which allows you to be agnostic as to whether you get a real or a fake page ID. It will always be present in the result (even if you passed an invalid title) and it will always contain exactly one element if you ask for exactly one page.
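The single test described above can be sketched as a small function. This is only an illustration: the name fileExistsInResponse is hypothetical, and the sample responses are written by hand to mimic the shape of a prop=imageinfo reply with indexpageids=1, not captured from a live wiki.

```javascript
// Hypothetical helper (not part of MediaWiki): decides whether a single-title
// prop=imageinfo response, requested with indexpageids=1, describes an existing file.
function fileExistsInResponse(data) {
  if (!data || !data.query || !data.query.pageids || !data.query.pages) {
    return false;
  }
  // pageids lists the keys of query.pages, so the real-vs-fake page ID no longer matters.
  var id = data.query.pageids[0];
  var page = data.query.pages[id];
  return Boolean(page) && 'imageinfo' in page;
}

// Hand-written sample: no local description page, but image info was found.
var foundWithoutLocalPage = {
  query: {
    pageids: ['-1'],
    pages: { '-1': { ns: 6, title: 'File:Example.png', missing: '', imageinfo: [{}] } }
  }
};

// Hand-written sample: nothing found anywhere.
var notFound = {
  query: {
    pageids: ['-1'],
    pages: { '-1': { ns: 6, title: 'File:Nope.png', missing: '' } }
  }
};

console.log(fileExistsInResponse(foundWithoutLocalPage)); // true
console.log(fileExistsInResponse(notFound)); // false
```

Both samples use the page ID -1, which is exactly the case that a hardcoded pages[-1] check cannot tell apart.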

IMO this is a WONTFIX.

Comment 2 Brion Vibber 2011-06-29 19:09:52 UTC

I think the issue here is that imageinfo results shouldn't really be returned associated with a *page id* to begin with, as there may or may not be a local wiki page associated with the file.

When querying info on multiple images at once, you seem to end up with a series of negative indexes -1, -2, etc:

http://en.wikipedia.org/w/api.php?action=query&titles=File:North_Caucasus_topographic_map-fr.svg|File:Caucasus_Region_26-08-08.PNG&prop=imageinfo&format=jsonfm

Now there may just not be a good alternate way to fit this into the output format of the highly page-centric query actions, but it feels kinda awkward.

The advantage of it is that you *can* extend it easily from querying one file to querying multiple files since you can iterate over the same structure; but you need to know that the returned page ids will often be useless (for instance you can't save that -2 page id and use it in another lookup to get other info about the image, you need to use the title).

What I usually do rather than hardcoding a -1 check (unsafe!) is to simply iterate over the collection, knowing it may contain either 0 or 1 items, and that the 1 item may or may not actually have image info to return:

var imageinfo = false;
if (data && data.query && data.query.pages) {
  // The pages collection lives under data.query in the API's JSON output
  $.each(data.query.pages, function(i, page) {
    if ('imageinfo' in page) {
      imageinfo = page.imageinfo;
    }
  });
}
if (imageinfo) {
  // do something with it
} else {
  // didn't find a matching image
}

This is then fairly easy to extend to handle multiple lookups; as you go through each row check its title to know which one you're dealing with.
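A rough sketch of the multiple-lookup variant, keyed by title rather than by page ID. The function name imageInfoByTitle and the sample data are made up for illustration, plain Object.keys is used instead of $.each so the snippet runs without jQuery, and the pages are assumed to sit under data.query.pages as in the API's JSON output.

```javascript
// Hypothetical helper: maps each returned title to whether image info came back.
// Page IDs such as -1 and -2 are ignored because they cannot be reused in later lookups.
function imageInfoByTitle(data) {
  var result = {};
  var pages = (data && data.query && data.query.pages) || {};
  Object.keys(pages).forEach(function (id) {
    var page = pages[id];
    result[page.title] = 'imageinfo' in page;
  });
  return result;
}

// Hand-written sample mimicking a two-title query; not real API output.
var sample = {
  query: {
    pages: {
      '-1': { ns: 6, title: 'File:First example.svg', missing: '', imageinfo: [{}] },
      '-2': { ns: 6, title: 'File:Second example.png', missing: '' }
    }
  }
};

var exists = imageInfoByTitle(sample);
console.log(exists['File:First example.svg']); // true
console.log(exists['File:Second example.png']); // false
```

One caveat: the API may normalize titles (for example, underscores become spaces), so the keys in the result are the titles as returned, which may differ from the titles as requested.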

Comment 3 Ryan Kaldari 2011-06-29 20:00:42 UTC

Yeah, the problem is that action=query is really designed to work with local pages, not transcluded files. I agree that there are ways to accomplish the goal, but none of them take less than half an hour to figure out (unless you happen to be Roan or Brion). If a volunteer is just coding something for fun, they may not have the patience to work it out. Most likely, they will write a test that doesn't cover all of the cases and they'll end up with a buggy tool.

While researching this, I learned that a lot of people still use an old pre-API AJAX method to accomplish this task:
sajax_do_call( 'SpecialUpload::ajaxGetExistsWarning', $filename, function() );

It would be nice if these old ajax.js methods were rewritten into API calls that were just as easy to use. Is it even safe to keep using the ajax.js methods or will they be deprecated at some point?

Comment 4 Bryan Tong Minh 2011-06-29 20:26:13 UTC

No, you should not rely on the continuing existence of action=ajax. I think ajaxGetExistsWarning is the last thing that prevents us from killing it altogether.

Comment 5 Chad H. 2011-06-29 20:30:58 UTC

(In reply to comment #4)
> No, you should not rely on the continuing existence of action=ajax. I think
> ajaxGetExistsWarning is the last thing that prevents us from killing it
> altogether.

Nothing in core still uses action=ajax at all...we're only hanging on to it due to the plethora of extensions that still use it. However you're right....under NO CIRCUMSTANCES WHATSOEVER should you continue to use action=ajax.

Comment 6 Ryan Kaldari 2011-06-29 21:35:21 UTC

The beauty of the ajax.js method is that it gives you a simple yes or no
answer. There are many other cases as well in which the API user simply wants a
true or false result rather than a tree of complex data to parse. This is one
reason why so many Bot writers use 3rd party API frameworks to interact with
MediaWiki rather than using our API directly. Perhaps one day we could
implement an action=check API method that is just for doing simple checks that
return boolean results.

action=check&prop=filenameexists...
action=check&prop=filehashexists...
action=check&prop=useremailable...

...especially since most of the action=query methods have a weird way of
expressing boolean values (empty string for true, undefined for false).
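To illustrate why the flag convention is easy to get wrong in JavaScript: an empty string is falsy, so a plain truthiness test reads a flag that is present as false. A minimal sketch of a normalizing helper follows; the name apiFlag is hypothetical and the sample page object is written by hand.

```javascript
// Hypothetical helper: a flag counts as true when the key is present at all,
// even though its value is the empty string.
function apiFlag(obj, name) {
  return Object.prototype.hasOwnProperty.call(obj, name);
}

// Hand-written sample page entry using the empty-string convention.
var page = { ns: 6, title: 'File:Nope.png', missing: '' };

console.log(Boolean(page.missing)); // false, the naive test is misleading
console.log(apiFlag(page, 'missing')); // true
console.log(apiFlag(page, 'invalid')); // false
```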

Comment 7 Quim Gil 2014-04-01 20:20:57 UTC

Setting as "lowest", although looking at the discussion it might even be a WONTFIX? Unless something different has happened in the past three years.

Comment 8 Umherirrender 2014-11-14 18:41:16 UTC

Is this already fixed? With prop=imageinfo you get back an "imagerepository" property, which contains the name of the repo or an empty string. Set iiprop to an empty string to avoid too much extra data.

Usually the values for WMF wikis are "local" or "shared", because the config for WMF wikis calls Commons "shared" [1]. With InstantCommons the value is "wikimediacommons" for foreign images. With JSON you still have the issue of the negative page IDs, but with indexpageids= you get back an array of the page IDs in numeric order, which you can use to look up the values (see the comments above for the implementation).

[1] https://en.wikipedia.org/w/api.php?action=query&meta=filerepoinfo
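A sketch of how the imagerepository approach might look from a script, assuming the property behaves as described in this comment (a repo name such as "local", "shared" or "wikimediacommons", or an empty string when nothing is found). The function names, the api.php path passed in, and the sample responses are all hypothetical.

```javascript
// Hypothetical helper: builds the query URL with an empty iiprop and indexpageids set.
function buildExistenceQuery(apiUrl, filename) {
  var params = new URLSearchParams({
    action: 'query',
    titles: 'File:' + filename,
    prop: 'imageinfo',
    iiprop: '',
    indexpageids: '1',
    format: 'json'
  });
  return apiUrl + '?' + params.toString();
}

// Hypothetical helper: returns the repo name for the first page, or '' if none.
function repositoryOfFirstPage(data) {
  var query = data && data.query;
  if (!query || !query.pageids || !query.pages) {
    return '';
  }
  var page = query.pages[query.pageids[0]];
  return (page && page.imagerepository) || '';
}

// Hand-written samples; the values follow the description in this comment.
var inSharedRepo = {
  query: {
    pageids: ['-1'],
    pages: { '-1': { ns: 6, title: 'File:Example.png', missing: '', imagerepository: 'shared' } }
  }
};
var nowhere = {
  query: {
    pageids: ['-1'],
    pages: { '-1': { ns: 6, title: 'File:Nope.png', missing: '', imagerepository: '' } }
  }
};

console.log(buildExistenceQuery('/w/api.php', 'Example.png'));
console.log(repositoryOfFirstPage(inSharedRepo)); // shared
console.log(repositoryOfFirstPage(nowhere) !== ''); // false
```

Treating any non-empty repo name as "in use" is a design choice of this sketch; an upload script that only cares about local collisions could compare against "local" instead.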

Comment 9 Ryan Kaldari 2014-11-15 00:34:17 UTC

Umherirrender: I don't think you read the bug description. Yes, there is a way to get the information, but it is extremely convoluted.

Comment 10 Ryan Kaldari 2014-11-15 00:47:07 UTC

What I want is:
api.php?action=filenameinuse&filename=<filename>

which returns:
{ "filenameinuse": true }
...or...
{ "filenameinuse": false }

It doesn't even need to handle multiple filenames.

This is an extremely common use case for bots (and pretty much any software that uploads images to Commons), so we should make this trivially easy.
