Last modified: 2012-04-16 09:16:04 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T29641, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 27641 - purgeThumbnails should support exclusion of expensive files
purgeThumbnails should support exclusion of expensive files
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
File management (Other open bugs)
unspecified
All All
: Normal enhancement (vote)
: ---
Assigned To: Nobody - You can work on this!
:
Depends on:
Blocks: 27699
  Show dependency treegraph
 
Reported: 2011-02-22 20:19 UTC by Michael Dale
Modified: 2012-04-16 09:16 UTC (History)
4 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Exclude From Thumbnail Purge patch (939 bytes, patch)
2011-02-22 20:19 UTC, Michael Dale
Details
Exclude From Thumbnail Purge patch (1.45 KB, patch)
2011-03-05 01:50 UTC, Michael Dale
Details
patch to restore r84395,r98710. filter thumb purge in media handlers (3.56 KB, patch)
2011-11-28 08:02 UTC, Michael Dale
Details

Description Michael Dale 2011-02-22 20:19:47 UTC
Created attachment 8196 [details]
Exclude From Thumbnail Purge patch

With a transcoding extension you don't necessarily want to purge all the transcoded files every time someone visits an image page with ?action=purge, attached a quick implementation for excluding some file extension types with a configuration variable.
Comment 1 Derk-Jan Hartman 2011-02-22 22:00:58 UTC
quite so, but how WOULD you purge those files if needed ? We can't go asking for sys admin intervention everytime something goes wrong of course. That would be unmaintainable.
Comment 2 Michael Dale 2011-02-22 22:43:46 UTC
I would recommend a maintenance script. It would ideally only have to be run when you update the transcoding settings or when you update the encoders.
Comment 3 Derk-Jan Hartman 2011-02-24 21:02:10 UTC
ideally doens't exist in Wikipedia. Servers die unexpectedly. Encoders crash. Partly generated files that have to be regenerated will present themselves, it is unavoidable.
Comment 4 Michael Dale 2011-02-24 23:46:06 UTC
Error handling is important. If the encoder caches or the server dies unexpectedly the file will never be considered "encoded" since it only gets moved after a successful encode. Its certainly possible the encoder thinks the file was successfully encoded while human viewers disagree. But re-running the encode with the same settings every time someone requests ?action=purge will likely just result in the same broken file being regenerated over and over. 

It would be nice to let users register a given video has bad encode or has "playback issues", but that is a separate feature than the bug being discussed here. ( probably a phase 2 feature )

Purges should be managed via maintenance scripts so we it can be run in conjunction with new encoder deployments or deployments of changes in encoding parameters. 

Users are not completely dependent on these maintenance scripts they can "upload another version of the file" which will of result in purge of transcodes.  

If in practice we run into the situation where derivatives being purged and re-encoded with the same settings will 'make the files work', then yes I am open to bump the priority on letting some users purge transcodes somehow. 

*But the original bug is about differentiating from image purges* which I think we should do regardless.
Comment 5 Roan Kattouw 2011-03-04 10:40:39 UTC
(In reply to comment #0)
> Created attachment 8196 [details]
> Exclude From Thumbnail Purge patch
> 
$wgExcludeFromThumbnail must be defined and documented in DefaultSettings.php . Looks good to me otherwise, although I guess I'd like someone better-versed in our media handling code such as Bryan look over it, and of course Derk-Jan makes a good point about needing to be able to purge the transcoded versions somehow.
Comment 6 Bryan Tong Minh 2011-03-04 10:50:36 UTC
Why path info instead of substr?

Also, I'd rather have this based on mime type instead of extension. I can imagine for example that .ogg vorbis files are much less expensive to purge than .ogg theora files.
Comment 7 Michael Dale 2011-03-04 18:59:54 UTC
 (In reply to comment #6)
> Why path info instead of substr?
>

Its documented in PHP as a consist fast way to get a file extension without any regular expression or searching for the last period. Its coverage goes back to php4. Feel free to use substr with strrpos if you want but I think path info is less verbose.


> Also, I'd rather have this based on mime type instead of extension. I can
> imagine for example that .ogg vorbis files are much less expensive to purge
> than .ogg theora files.

First I don´t think ogg audio is very different from the rational mentioned above. ( ie we would want to explicit purge it ) In this version we don't even transcode audio, and with ogg audio files that /we/ gennerate /we/ would always use the .oga extension ( so there would be no ambiguity ) ... but assuming we want more explicit control than "extension based file types" over what we purge and do not purge ::

Mime type parsing would require some more complexity. If we just use the php mime_content_type call I believe its also file extension based per the systems magic.mime file. ( extensions are suposed to communicate mime type )

Alternatively we actually parse the file ( resource intensive since we do this in php for all our files and with ogg parsing I believe we do a linear read of the entire file to catch any concatenated streams which we would have to work around or just accept some computational delay anytime you want to purge video thumbnails )  

Or we have some lookup table per filename{transcodeKey}.ext....

Maybe the easiest change would be to rename this variable to $wgExcludePostfixKeyFromThumbnailPurge and then let it check anything after the base file name. So things like {myfile.ogg.AUDIO_64K.oga can be represented in the settings array as "AUDIO_64K.oga" ?

For version 2, its best to give every transcode its own db entry where we store the version of the encoder and all the settings used. This way we have fine grain indexed select purging capabilities for the extension maintenance scripts. And maybe we put “derivatives” in a sperate folder from “thumbnails” if we want to handle them very diffrently.
Comment 8 Bryan Tong Minh 2011-03-04 19:35:14 UTC
We already know the mime type: $file->getMimeType(). The point is that a single extension may map to multiple multiple mime types, but multiple extensions may also map to a single mime type. Mime types however are normalized within MediaWiki.
Comment 9 Bryan Tong Minh 2011-03-04 20:04:25 UTC
Oh right, $file is a file name and not a file object. It is of course possible to have thumbnails for a single file in different formats.

With your patch, do you wish to exclude certain types of source files, or certain types of thumbnails?
Comment 10 Michael Dale 2011-03-04 20:39:31 UTC
Right $file is for the parent source file.. some trickery would be needed to have it map its methods to thumbnail files. We only want to exclude certain types of "thumbnails" .. ie for a given source Ogg video file the purge request will purge the jpeg thumbnails but will not purge the transcoded webm derivative.  For other assets like svg and img, they won't be affected since we don't generate webm or ogv derivative files for them. 

The reason its oky to use the file extension, for the exclusion list is that we are only looking at files that we generate so we have a bit more control of the mapping of extension to file type.
Comment 11 Bawolff (Brian Wolff) 2011-03-04 21:02:45 UTC
If we do this, it'd be nice to have an extra parameter to say the api purge module to do a forced purge. This would stop people from accidently purging pages as you pretty much need to be a power user to use the api.
Comment 12 Michael Dale 2011-03-04 21:09:33 UTC
(In reply to comment #11)
> If we do this, it'd be nice to have an extra parameter to say the api purge
> module to do a forced purge. This would stop people from accidently purging
> pages as you pretty much need to be a power user to use the api.

Maybe require they have admin rights as well? Remember some derivatives like HD webm are not even close to real time encoded on moderately fast hardware, so they do take quite a bit of resources to re-encode.
Comment 13 Bryan Tong Minh 2011-03-04 22:16:59 UTC
(In reply to comment #10)
> The reason its oky to use the file extension, for the exclusion list is that we
> are only looking at files that we generate so we have a bit more control of the
> mapping of extension to file type.

Good one, I didn't realize originally that this patch concerned thumbnails instead of the original file.
Comment 14 Michael Dale 2011-03-05 01:50:48 UTC
Created attachment 8244 [details]
Exclude From Thumbnail Purge patch

Here is the patch with the lines that were missing from DefaultSettings.php
Comment 15 Roan Kattouw 2011-03-20 16:44:24 UTC
Patch applied in r84395.
Comment 16 Michael Dale 2011-11-28 08:02:37 UTC
Created attachment 9569 [details]
patch to restore r84395,r98710. filter thumb purge in media handlers

patch to restore r84395,r98710. filter thumb purge in media handlers

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links