Last modified: 2013-08-08 16:39:11 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T52948, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 50948 - Implement custom settings for image licenses used for PDF generation (currently skips images marked as "fair use")


Summary:	Implement custom settings for image licenses used for PDF generation (current...

Status:	NEW

Product:	MediaWiki extensions
Classification:	Unclassified
Component:	Collection (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2013-07-08 15:54 UTC by Betacommand
Modified:	2013-08-08 16:39 UTC (History)
CC List:	7 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Description Betacommand 2013-07-08 15:54:45 UTC

When creating PDF files, many files are missing. See https://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_%28technical%29&oldid=563383020#Infobox_images_missing_from_PDFs

Comment 1 Greg Grossmeier 2013-08-06 18:39:13 UTC

From that VPT section:
I think you'll find it has left out the "fair use" images. That's probably a deliberate design choice, though I can't immediately find it documented anywhere. -- John of Reading (talk) 15:50, 7 July 2013 (UTC)


Betacommand: is that the case? Do you only see this happening when the images are used on WP under a fair use claim? My guess is that the vast majority (99%) of images in the category referenced (Marvel Comics related) are used under a fair use claim. Can you reproduce this on another article that doesn't have any fair use images?

Comment 2 Greg Grossmeier 2013-08-06 20:09:11 UTC

Betacommand tells me on IRC that yes, this only occurs when the images are marked as Fair Use. Retitling bug as such. This may be a WONTFIX issue. I'll let the extension authors/those interested weigh in.

Comment 3 Volker Haas 2013-08-07 08:40:47 UTC

Yes, fair use images are not included in the PDFs on purpose.

Comment 4 Betacommand 2013-08-07 10:22:29 UTC

There needs to be a way to override the removal of those files.

Comment 5 Volker Haas 2013-08-07 11:52:27 UTC

It is not possible to include fair use images in PDFs due to potential copyright issues. If you want to generate a PDF for personal use, you could install the PDF rendering software (mwlib and mwlib.rl) on your local machine and disable filtering of images.

Comment 6 Betacommand 2013-08-07 12:34:12 UTC

How is it any more of a copyright issue than serving the existing article? If a user decides they want to include the files, there should be an option to override the filtering. It wouldn't affect the default process but would enable better offline access.

Comment 7 Greg Grossmeier 2013-08-07 16:59:20 UTC

I'm going to pre-emptively nip this copyright conversation in the bud, at least the aspect of having it in the bug tracker.


!!!! Please do not debate the relative merits of any interpretation of copyright law in the bug tracker. !!!!


The local wiki is the correct place to have this discussion as that is where the various lists of excluded categories/templates live (per wiki).

Comment 8 Betacommand 2013-08-07 17:03:24 UTC

This "Feature" isn't controlled at the local level. This was a <s>feature</s> Bug that was created by developers, implemented by developers, without ever asking local users. 

What would be great would be a parameter that could be passed to this process that overrode the image filters.

Comment 9 Greg Grossmeier 2013-08-07 17:20:46 UTC

Volker: can you comment on the history of this decision (to add that functionality)? Maybe pointing to any public discussion?

Comment 10 MZMcBride 2013-08-08 04:42:59 UTC

(In reply to comment #9)
> Volker: can you comment on the history of this decision (to add that
> functionality)? Maybe pointing to any public discussion?

Greg: have you tried looking up the relevant code? It would be helpful if someone could paste the code here, perhaps with the accompanying SVN revision or Git commit. :-)

Comment 11 Volker Haas 2013-08-08 09:21:03 UTC

@Betacommand: 
There is no bug, the software works as intended. The software is just configured in a way that does not suit your current need. At the moment it is not possible to pass any user-configuration for specific PDFs/collections to the rendering software. Therefore it is not possible to include fair-use images for specific collections. 

@Greg:
I can't point you to any public discussion regarding that issue. There has been lots of talk regarding image copyrights related to the Collection Extension, but I don't remember anything specific about fair use images.

@MZMcBride:
The code is not really the problem; it's the configuration that explicitly removes fair use images.

All this happens in mwlib's licensechecker https://github.com/pediapress/mwlib/blob/master/mwlib/writer/licensechecker.py
The licensechecker supports three modes:
* nofilter (include all images, adding license info where available)
* blacklist (exclude all images marked as nonfree)
* whitelist (include only images that are marked as free, thus removing images with an unknown license)
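
Illustration only: a minimal Python sketch of how the three modes listed above decide whether an image is kept. The names `FilterMode` and `include_image` are hypothetical and are not taken from mwlib's licensechecker. The sketch also assumes each image has already been classified as "free", "nonfree", or unknown (None).

```python
from enum import Enum


class FilterMode(Enum):
    # Mode names follow the list above; the enum itself is hypothetical.
    NOFILTER = "nofilter"
    BLACKLIST = "blacklist"
    WHITELIST = "whitelist"


def include_image(license_type, mode):
    """Return True if an image should be kept under the given mode.

    license_type is assumed to be "free", "nonfree", or None (unknown).
    """
    if mode is FilterMode.NOFILTER:
        # Keep everything, whatever the license.
        return True
    if mode is FilterMode.BLACKLIST:
        # Drop only images explicitly marked as nonfree.
        return license_type != "nonfree"
    if mode is FilterMode.WHITELIST:
        # Keep only images explicitly marked as free; unknown is dropped.
        return license_type == "free"
    raise ValueError(f"unsupported mode: {mode!r}")
```

Under these assumptions, a fair use image (classified as "nonfree") survives only in nofilter mode, while an image with an unknown license survives in nofilter and blacklist mode but not in whitelist mode.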

The license information is imported from a csv file ( https://github.com/pediapress/mwlib/blob/master/mwlib/writer/wplicenses.csv ). This file contains the info about fair use images:
> "Fairuse",,"nonfree",,"- Copyrighted content that may be used as ''fair-use'', but since the commons does not accept ''fair use'' content this image will need to be deleted. [[Commons:Licensing#Material under the fair use clause is not allowed on the Commons|See here for details why]]."


The PDF writer is currently configured to use blacklisting for all Wikipedia projects except the German Wikipedia. On the German Wikipedia we use whitelisting, after a community uproar about including images with an unknown license in the PDFs.
( https://github.com/pediapress/mwlib.rl/blob/master/mwlib/rl/rlwriter.py#L190 )
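
Illustration only: a rough Python sketch of the two pieces of configuration described above, a license table read from CSV and a filter mode chosen per wiki. It assumes that the first CSV column is the license template name and the third column is its type, which is a guess based on the quoted row. The sample row keeps the quoted name and type but shortens the description. The function names and the wiki labels `dewiki` and `enwiki` are hypothetical and are not mwlib settings.

```python
import csv
import io

# One row in the shape quoted above; the long description is shortened here.
SAMPLE_CSV = '"Fairuse",,"nonfree",,"(description shortened)"\n'

# Hypothetical per-wiki settings: blacklist by default, whitelist for one wiki.
DEFAULT_MODE = "blacklist"
MODE_OVERRIDES = {"dewiki": "whitelist"}


def load_license_types(csv_text):
    """Map license template names to their type.

    Assumes column 0 holds the name and column 2 the type; an empty
    type is stored as None (unknown).
    """
    types = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3:
            continue  # skip rows too short to carry a type
        types[row[0]] = row[2] or None
    return types


def mode_for_wiki(wiki):
    """Return the filter mode for a wiki, falling back to the default."""
    return MODE_OVERRIDES.get(wiki, DEFAULT_MODE)
```

With this layout, changing what a wiki gets in its PDFs is a configuration change (an entry in the override table or a row in the CSV) rather than a code change, which matches the point made above.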

So, that's pretty much all I can say about that topic.

Comment 12 Betacommand 2013-08-08 10:34:49 UTC

It should be trivial to create a method for setting nofilter mode upon command and per PDF generation. Thus allowing all images if a user wants them.

Comment 13 Volker Haas 2013-08-08 10:39:44 UTC

(In reply to comment #12)
> It should be trivial to create a method for setting nofilter mode upon command
> and per PDF generation. Thus allowing all images if a user wants them.

Unfortunately it is not trivial:
* The UI of the Collection Extension would need to be altered to allow custom settings
* The rendering software would need an interface to interpret these settings and apply them to the PDFs
* The render servers' caching mechanism would need to be updated to avoid delivering PDFs with the wrong settings
* All this would need to be tested thoroughly
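
Illustration only: the caching point in the list above can be sketched in Python as a cache key that includes the render settings, so that a PDF rendered with one filter mode is never served for a request with another. This is not how the render servers actually build their keys; `cache_key`, the `filter_mode` setting, and the collection identifier are all hypothetical.

```python
import hashlib
import json


def cache_key(collection_id, settings):
    """Derive a cache key from the collection and its render settings.

    sort_keys=True makes the key independent of dict ordering, so equal
    settings always produce the same key.
    """
    payload = json.dumps(
        {"collection": collection_id, "settings": settings},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# The same collection rendered under two modes gets two distinct keys.
key_default = cache_key("example-collection", {"filter_mode": "blacklist"})
key_custom = cache_key("example-collection", {"filter_mode": "nofilter"})
```

A key built only from the collection would let the first rendered PDF be returned for every later request, regardless of the settings that request asked for.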

And there are probably more things that I forgot.

But of course patches are always welcome!

Comment 14 Andre Klapper 2013-08-08 11:17:24 UTC

In general, I'd highly recommend never starting sentences with "It should be trivial" in bug reports if you don't know the codebase by heart. ;)
