Last modified: 2012-09-01 13:12:23 UTC
Using the "download as PDF" option on en.wikibooks (create a PDF from a Collection) completes successfully. The URL that is then provided to link to the generated PDF returns "file not found" error. E.g.:

1) For this book: https://en.wikibooks.org/wiki/Fundamentals_of_Transportation
2) Request collection rendering: https://en.wikibooks.org/w/index.php?title=Special:Book&bookcmd=rendering&return_to=Wikibooks%3ACollections%2FFundamentals+of+Transportation&collection_id=7c6b44dc136f2d81&writer=rl
3) Generates this download link: https://en.wikibooks.org/w/index.php?title=Special:Book&bookcmd=download&collection_id=7c6b44dc136f2d81&writer=rl&return_to=Wikibooks%3ACollections%2FFundamentals+of+Transportation
4) Returns error "The file you are trying to download does not exist: Maybe it has been deleted and needs to be regenerated."

Impacting at least two users on multiple books. Believed to have been working yesterday.
It didn't work yesterday for me; but I think it worked 2 days ago.
de.wikibooks.org is also affected.
Just confirmed by requesting a PDF of the front page of en.wikibooks: it rendered, then the download returned a 404.
https://rt.wikimedia.org/Ticket/Display.html?id=2981
Yes. This is the same at Wikiversity too.
I sent an email to the PediaPress guys to see what can be done about this.
Ok, so, this isn't actually Collection specifically at fault. It's seemingly some change in MediaWiki.

1.20wmf2 MediaWiki with 1.20wmf2 branched Collection works fine.
1.20wmf2 MediaWiki with head Collection works fine.
1.20wmf3 MediaWiki with head Collection doesn't work.
1.20wmf3 MediaWiki with 1.20wmf3 branched Collection doesn't work.
1.20wmf3 MediaWiki with 1.20wmf2 branched Collection doesn't work.

In the meantime, I have put wikibooks back to 1.20wmf2.

So the code is falling over at:

            $info = self::mwServeCommand( 'download', array(
                'collection_id' => $wgRequest->getVal( 'collection_id' ),
                'writer' => $wgRequest->getVal( 'writer' ),
            ) );
            $content_type = $info['content_type'];
            $content_length = $info['download_content_length'];
            $content_disposition = null;
        }
        if ( !$info ) {
            $wgOut->showErrorPage( 'coll-download_notfound_title', 'coll-download_notfound_text' );
            return;
        }
Ok, so the problem is this code recently added to our HTTP code:

    if ( isset( $this->respHeaders['content-length'] ) ) {
        if ( strlen( $this->content ) < $this->getResponseHeader( 'content-length' ) ) {
            $this->status->fatal( 'http-truncated-body' );
        }
    }

We're getting 0 content, but the header gives a number, so this is then classed as a fatal error.

url: http://pdf1.wikimedia.org:8080//08/084377b53fc2a31c/output.rl
2012-05-18 20:19:00 mw10 frwikibooks: content-length-header: 700197 content-length-actual: 0
url: http://pdf3.wikimedia.org:8080//f8/f84485f7ffd576cf/output.rl
2012-05-18 20:19:25 srv267 dewikibooks: content-length-header: 57811 content-length-actual: 0
url: http://pdf1.wikimedia.org:8080//ce/ce9d841ca7076699/output.rl

I've disabled that code for the moment, and now all wikis that were previously on 1.20wmf3 are back on it.

The question here is why we are apparently getting no content. In this case, at least, it doesn't seem to make any difference, as we can still download the file fine.

Is it safe/sensible to add an option for the curl downloader so that we just ignore this check? Collection seems to be the main (but not the only) offender.
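For illustration only, here is a minimal sketch of the kind of opt-out asked about above. The function name `isBodyTruncated()` and the `$ignoreLengthCheck` flag are hypothetical and are not part of MediaWiki's HTTP classes; the comparison itself mirrors the check quoted in the comment.

```php
<?php
/**
 * Hypothetical helper (not MediaWiki API): reports whether a response body
 * is shorter than its Content-Length header claims.
 *
 * @param array  $respHeaders       Lower-cased header name => value
 * @param string $content           Response body as received
 * @param bool   $ignoreLengthCheck Hypothetical opt-out a caller could set
 * @return bool True if the body looks truncated
 */
function isBodyTruncated( array $respHeaders, $content, $ignoreLengthCheck = false ) {
	// No header, or the caller opted out: nothing to compare against.
	if ( $ignoreLengthCheck || !isset( $respHeaders['content-length'] ) ) {
		return false;
	}
	// Same comparison as the quoted check: fewer bytes than advertised.
	return strlen( $content ) < (int)$respHeaders['content-length'];
}
```

With the values from the log above (header 700197, actual 0) this returns true unless the caller passes the opt-out flag.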
Too many things might use Content-Length for purposes other than the body length. For example, POSTing or PUTting to Swift/S3 will give a content-length of the data stored, not of the response. Also, HEAD requests, of course, shouldn't have this check. Really, only the caller of the Http class can determine if the headers make sense for the body in this case.
All fixed up now
(In reply to comment #9)
> Too many things might use Content-Length for purposes other than the body
> length.

You mean violating RFC 2616?
"The Content-Length entity-header field indicates the size of the entity-body, in decimal number of OCTETs, sent to the recipient or, in the case of the HEAD method, the size of the entity-body that would have been sent had the request been a GET."

> Also, HEAD requests, of course, shouldn't have this check.

It seems HEAD requests weren't skipped from the check, which is a bug. You could also argue for skipping the check for status codes 1xx, 204, and 304 (but they shouldn't have a non-zero Content-Length anyway).

> Like POSTing or PUTing to Swift/S3 will give a content-length of the
> data stored, not the response.

That's the content-length of the client request, not of the server reply, which is what we're dealing with here. If the server replies to the POST with the request size, that violates the HTTP specification. How do you know the response length, then?

> Really, only the caller of the Http class can determine if the
> headers make sense for the body in this case.

The caller shouldn't need to manually check the content.

Reedy, can you provide the server headers in the reply to the output.rl POST?
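As a rough sketch of the exemptions discussed in this comment: the helper below decides whether a response can carry a body at all, so that a length check would only run when it returns true. The name `responseMayHaveBody()` is hypothetical and not part of MediaWiki; the exempt cases (HEAD, 1xx, 204, 304) are the ones named above.

```php
<?php
/**
 * Hypothetical helper (not MediaWiki API): can a response to this request
 * carry a message body whose length is worth checking?
 *
 * @param string $method HTTP request method, e.g. 'GET' or 'HEAD'
 * @param int    $status HTTP response status code
 * @return bool
 */
function responseMayHaveBody( $method, $status ) {
	$status = (int)$status;
	// A HEAD response reports the length a GET would have had, with no body.
	if ( strtoupper( $method ) === 'HEAD' ) {
		return false;
	}
	// Informational (1xx), 204 No Content and 304 Not Modified have no body.
	if ( $status >= 100 && $status < 200 ) {
		return false;
	}
	return $status !== 204 && $status !== 304;
}
```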
(In reply to comment #11)
> > Like POSTing or PUTing to Swift/S3 will give a content-length of the
> > data stored, not the response.
> That's the content-length of the client request, not of the server reply, which
> is what we're dealing with, here.
> If the server replies the POST with the request size, that violates the HTTP
> specification. How do you know the response length, then?

No, it's the response to the POST that has this header used like this. The client knows that it gets only headers back for such things (and Swift will use statuses like 204 on success).
(In reply to comment #12)
> No it's the response to the POST that has this header used like this. The
> client knows that it gets only headers back for such things (and Swift will use
> statuses like 204 on success).

Well, HTTP status 204 means "No content", so that would make it a slightly lesser violation. Still, I see no good reason for doing it that way.

I don't see such behavior documented nor reflected in the doc samples, though:
http://docs.openstack.org/api/openstack-object-storage/1.0/content/create-update-object.html
(In reply to comment #13)
> (In reply to comment #12)
> > No it's the response to the POST that has this header used like this. The
> > client knows that it gets only headers back for such things (and Swift will use
> > statuses like 204 on success).
>
> Well, HTTP status 204 means "No content", so that would make it a slightly
> lesser violation.
> Still, I see no good reason for doing it that way.
>
> I don't see such behavior documented nor reflected in the doc samples, though:
> http://docs.openstack.org/api/openstack-object-storage/1.0/content/create-update-object.html

Right. I think I mixed this with something else.