Last modified: 2012-09-01 13:12:23 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T38950, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 36950 - Export and PDF generator returns "file not found" error on completion
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: Collection
Version: master
Hardware: All All
Importance: Normal normal with 2 votes
Target Milestone: MW 1.20 version
Assigned To: Nobody - You can work on this!
Reported: 2012-05-18 08:15 UTC by Neil Babbage
Modified: 2012-09-01 13:12 UTC (History)

Description Neil Babbage 2012-05-18 08:15:54 UTC
Using the "download as PDF" option on en.wikibooks (create a PDF from a Collection) completes successfully. The URL that is then provided to link to the generated PDF returns "file not found" error. E.g.:

1) For this book: https://en.wikibooks.org/wiki/Fundamentals_of_Transportation
2) Request collection rendering: https://en.wikibooks.org/w/index.php?title=Special:Book&bookcmd=rendering&return_to=Wikibooks%3ACollections%2FFundamentals+of+Transportation&collection_id=7c6b44dc136f2d81&writer=rl
3) Generates this download link: https://en.wikibooks.org/w/index.php?title=Special:Book&bookcmd=download&collection_id=7c6b44dc136f2d81&writer=rl&return_to=Wikibooks%3ACollections%2FFundamentals+of+Transportation
4) Returns error "The file you are trying to download does not exist: Maybe it has been deleted and needs to be regenerated. "

Impacting at least two users on multiple books. Believed to have been working yesterday.
Comment 1 Martin Kraus 2012-05-18 08:41:00 UTC
It didn't work yesterday for me; but I think it worked 2 days ago.
Comment 2 Martin Kraus 2012-05-18 09:03:46 UTC
de.wikibooks.org is also affected.
Comment 3 Mark A. Hershberger 2012-05-18 14:27:40 UTC
Just confirmed by requesting a PDF of the front page of enwikibooks. It rendered, then I got a 404.
Comment 4 Mark A. Hershberger 2012-05-18 14:46:16 UTC
https://rt.wikimedia.org/Ticket/Display.html?id=2981
Comment 5 Thuvack 2012-05-18 16:25:04 UTC
Yes. This is the same at Wikiversity too.
Comment 6 Tomasz Finc 2012-05-18 17:40:14 UTC
I sent an email to the PediaPress guys to see what can be done about this.
Comment 7 Sam Reed (reedy) 2012-05-18 18:52:25 UTC
OK, so Collection specifically isn't actually at fault. It seems to be some change in MediaWiki:

1.20wmf2 MediaWiki with 1.20wmf2 branched Collection works fine.

1.20wmf2 MediaWiki with head Collection works fine.

1.20wmf3 MediaWiki with head Collection doesn't work.

1.20wmf3 MediaWiki with 1.20wmf3 branched Collection doesn't work.

1.20wmf3 MediaWiki with 1.20wmf2 branched Collection doesn't work.

In the meantime, I have put wikibooks back on 1.20wmf2.

So the code is falling over at:

			$info = self::mwServeCommand( 'download', array(
				'collection_id' => $wgRequest->getVal( 'collection_id' ),
				'writer' => $wgRequest->getVal( 'writer' ),
			) );
			$content_type = $info['content_type'];
			$content_length = $info['download_content_length'];
			$content_disposition = null;
		}
		if ( !$info ) {
			$wgOut->showErrorPage( 'coll-download_notfound_title', 'coll-download_notfound_text' );
			return;
		}
Comment 8 Sam Reed (reedy) 2012-05-18 20:31:52 UTC
Ok, so the problem is this code recently added to our HTTP code:

		if ( isset( $this->respHeaders['content-length'] ) ) {
			if ( strlen( $this->content ) < $this->getResponseHeader( 'content-length' ) ) {
				$this->status->fatal( 'http-truncated-body' );
			}
		}


We're getting 0 bytes of content, but the header gives a non-zero length, so this is then classed as a fatal error.


url: http://pdf1.wikimedia.org:8080//08/084377b53fc2a31c/output.rl
2012-05-18 20:19:00 mw10 frwikibooks: content-length-header: 700197
content-length-actual: 0
url: http://pdf3.wikimedia.org:8080//f8/f84485f7ffd576cf/output.rl
2012-05-18 20:19:25 srv267 dewikibooks: content-length-header: 57811
content-length-actual: 0
url: http://pdf1.wikimedia.org:8080//ce/ce9d841ca7076699/output.rl


I've disabled that code for the moment, and now all wikis that were previously on 1.20wmf3 are back on it.


The question here is why we are apparently getting no content. In this case, at least, it doesn't seem to make any difference, as we can still download the file fine.

Is it safe/sensible to add an option to the curl downloader so that we just skip this check? Collection seems to be the main (but not the only) offender.
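
To make the question above concrete, here is a minimal sketch of the length check with an opt-out flag. It is written in Python rather than MediaWiki's PHP, and `FakeResponse`, `check_truncated`, and the `enforce_length` flag are hypothetical names chosen for illustration. They are not the actual HTTP class, nor an option that MediaWiki is known to have shipped.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response (not the MediaWiki class)."""
    headers: dict                      # header names assumed to be lowercased
    content: bytes = b""
    errors: list = field(default_factory=list)


def check_truncated(resp: FakeResponse, enforce_length: bool = True) -> bool:
    """Flag the response when the body is shorter than Content-Length.

    Returns True when the response is still considered OK.
    enforce_length is a hypothetical opt-out for callers such as Collection.
    """
    declared = resp.headers.get("content-length")
    if enforce_length and declared is not None:
        if len(resp.content) < int(declared):
            resp.errors.append("http-truncated-body")
    return not resp.errors


# Values from the frwikibooks log entry above: the header says 700197,
# but the body that was read is empty.
strict = FakeResponse(headers={"content-length": "700197"})
lenient = FakeResponse(headers={"content-length": "700197"})
print(check_truncated(strict))                         # strict check rejects it
print(check_truncated(lenient, enforce_length=False))  # opt-out lets it through
```

With the flag left at its default, the empty body is reported as `http-truncated-body`; with the opt-out, the caller takes responsibility for deciding whether the headers make sense for the body.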
Comment 9 Aaron Schulz 2012-05-21 22:28:31 UTC
Too many things might use Content-Length for purposes other than the body length. For example, POSTing or PUTing to Swift/S3 will give a Content-Length of the data stored, not of the response. Also, HEAD requests, of course, shouldn't have this check. Really, only the caller of the Http class can determine if the headers make sense for the body in this case.
Comment 10 Sam Reed (reedy) 2012-05-21 22:30:58 UTC
All fixed up now
Comment 11 Platonides 2012-05-22 14:09:31 UTC
(In reply to comment #9)
> Too many things might use Content-Length for purposes other than the body
> length.

You mean violating RFC 2616?

"   The Content-Length entity-header field indicates the size of the
   entity-body, in decimal number of OCTETs, sent to the recipient or,
   in the case of the HEAD method, the size of the entity-body that
   would have been sent had the request been a GET."

> Also, HEAD requests, of course, shouldn't have this check.
It seems HEAD requests weren't exempted from the check, which is a bug.

You could also argue for skipping the check for status codes 1xx, 204, and 304 (but they shouldn't have a non-zero Content-Length anyway).
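
A rough sketch of those exemptions, in Python rather than MediaWiki's PHP. The function names `response_carries_body` and `is_truncated` are hypothetical and only illustrate the rule that responses to HEAD, and responses with status 1xx, 204, or 304, carry no message body, so comparing the body against Content-Length is meaningless for them.

```python
def response_carries_body(method: str, status: int) -> bool:
    """Return False for responses that must not include a message body."""
    if method.upper() == "HEAD":
        return False
    if 100 <= status < 200 or status in (204, 304):
        return False
    return True


def is_truncated(method: str, status: int, headers: dict, body: bytes) -> bool:
    """Report truncation only when a body is expected and a length is declared.

    Header names are assumed to be lowercased already.
    """
    if not response_carries_body(method, status):
        return False
    declared = headers.get("content-length")
    if declared is None:
        return False
    return len(body) < int(declared)


# A GET with a declared length and an empty body is flagged;
# the same headers on a HEAD request or a 204 reply are not.
print(is_truncated("GET", 200, {"content-length": "57811"}, b""))
print(is_truncated("HEAD", 200, {"content-length": "57811"}, b""))
print(is_truncated("PUT", 204, {"content-length": "57811"}, b""))
```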

> Like POSTing or PUTing to Swift/S3 will give a content-length of the
> data stored, not the response. 
That's the Content-Length of the client request, not of the server reply, which is what we're dealing with here.
If the server replies to the POST with the request size, that violates the HTTP specification. How do you know the response length, then?

> Really, only the caller of the Http class can determine if the
> headers make sense for the body in this case.
The caller shouldn't need to manually check the content.


Reedy, can you provide the server headers in the reply to the output.rl POST?
Comment 12 Aaron Schulz 2012-05-22 15:39:47 UTC
(In reply to comment #11)
> > Like POSTing or PUTing to Swift/S3 will give a content-length of the
> > data stored, not the response. 
> That's the content-length of the client request, not of the server reply, which
> is what we're dealing with, here.
> If the server replies the POST with the request size, that violates the HTTP
> specification. How do you know the response length, then?

No, it's the response to the POST that uses this header like this. The client knows that it gets only headers back for such things (and Swift will use statuses like 204 on success).
Comment 13 Platonides 2012-05-23 19:57:50 UTC
(In reply to comment #12)
> No it's the response to the POST that has this header used like this. The
> client knows that it gets only headers back for such things (and Swift will use
> statuses like 204 on success).

Well, HTTP status 204 means "No content", so that would make it a slightly lesser violation.
Still, I see no good reason for doing it that way.

I don't see such behavior documented nor reflected in the doc samples, though:
http://docs.openstack.org/api/openstack-object-storage/1.0/content/create-update-object.html
Comment 14 Aaron Schulz 2012-05-23 20:36:43 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > No it's the response to the POST that has this header used like this. The
> > client knows that it gets only headers back for such things (and Swift will use
> > statuses like 204 on success).
> 
> Well, HTTP status 204 means "No content", so that would make it a slightly
> lesser violation.
> Still, I see no good reason for doing it that way.
> 
> I don't see such behavior documented nor reflected in the doc samples, though:
> http://docs.openstack.org/api/openstack-object-storage/1.0/content/create-update-object.html

Right. I think I mixed this with something else.
