Last modified: 2012-09-01 13:12:23 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T38950, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 36950 - Export and PDF generator returns "file not found" error on completion


Summary:	Export and PDF generator returns "file not found" error on completion

Status:	RESOLVED FIXED

Product:	MediaWiki extensions
Classification:	Unclassified
Component:	Collection (Other open bugs)
Version:	master
Hardware:	All All

Importance:	Normal normal with 2 votes (vote)
Target Milestone:	MW 1.20 version
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2012-05-18 08:15 UTC by Neil Babbage
Modified:	2012-09-01 13:12 UTC (History)
CC List:	9 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Neil Babbage 2012-05-18 08:15:54 UTC

The "download as PDF" option on en.wikibooks (which creates a PDF from a Collection) completes successfully, but the URL then provided for the generated PDF returns a "file not found" error. For example:

1) For this book: https://en.wikibooks.org/wiki/Fundamentals_of_Transportation
2) Request collection rendering: https://en.wikibooks.org/w/index.php?title=Special:Book&bookcmd=rendering&return_to=Wikibooks%3ACollections%2FFundamentals+of+Transportation&collection_id=7c6b44dc136f2d81&writer=rl
3) Generates this download link: https://en.wikibooks.org/w/index.php?title=Special:Book&bookcmd=download&collection_id=7c6b44dc136f2d81&writer=rl&return_to=Wikibooks%3ACollections%2FFundamentals+of+Transportation
4) Returns error "The file you are trying to download does not exist: Maybe it has been deleted and needs to be regenerated. "
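
The two Special:Book requests in the steps above differ only in their bookcmd value. As a rough illustration (not code from the Collection extension), the sketch below rebuilds the download link from its query parameters and checks that it carries the same parameters as the link reported in step 3. The helper names book_url and query_of are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

INDEX_URL = "https://en.wikibooks.org/w/index.php"


def book_url(bookcmd, collection_id, writer, return_to):
    """Build a Special:Book URL (hypothetical helper, for illustration only)."""
    params = {
        "title": "Special:Book",
        "bookcmd": bookcmd,
        "collection_id": collection_id,
        "writer": writer,
        "return_to": return_to,
    }
    return INDEX_URL + "?" + urlencode(params)


def query_of(url):
    """Return the decoded query parameters of a URL as a dict of lists."""
    return parse_qs(urlsplit(url).query)


# Download link exactly as reported in step 3.
reported = (
    "https://en.wikibooks.org/w/index.php?title=Special:Book&bookcmd=download"
    "&collection_id=7c6b44dc136f2d81&writer=rl"
    "&return_to=Wikibooks%3ACollections%2FFundamentals+of+Transportation"
)

built = book_url(
    "download",
    "7c6b44dc136f2d81",
    "rl",
    "Wikibooks:Collections/Fundamentals of Transportation",
)

# The encoding of "Special:Book" differs, but the decoded parameters match.
print(query_of(built) == query_of(reported))
```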

This affects at least two users on multiple books. It is believed to have been working yesterday.

Comment 1 Martin Kraus 2012-05-18 08:41:00 UTC

It didn't work yesterday for me; but I think it worked 2 days ago.

Comment 2 Martin Kraus 2012-05-18 09:03:46 UTC

de.wikibooks.org is also affected.

Comment 3 Mark A. Hershberger 2012-05-18 14:27:40 UTC

Just confirmed by requesting a PDF of the front page of enwikibooks: it rendered, then I got a 404.

Comment 4 Mark A. Hershberger 2012-05-18 14:46:16 UTC

https://rt.wikimedia.org/Ticket/Display.html?id=2981

Comment 5 Thuvack 2012-05-18 16:25:04 UTC

Yes. This is the same at Wikiversity too.

Comment 6 Tomasz Finc 2012-05-18 17:40:14 UTC

I sent an email to the PediaPress guys to see what can be done about this.

Comment 7 Sam Reed (reedy) 2012-05-18 18:52:25 UTC

Ok, so this isn't actually the Collection extension specifically at fault. It seems to be some change in MediaWiki.

1.20wmf2 MediaWiki with 1.20wmf2 branched Collection works fine.

1.20wmf2 MediaWiki with head Collection works fine.

1.20wmf3 MediaWiki with head Collection doesn't work.

1.20wmf3 MediaWiki with 1.20wmf3 branched Collection doesn't work.

1.20wmf3 MediaWiki with 1.20wmf2 branched Collection doesn't work.

In the meantime, I have put wikibooks back on 1.20wmf2.

So the code is falling over at:

			$info = self::mwServeCommand( 'download', array(
				'collection_id' => $wgRequest->getVal( 'collection_id' ),
				'writer' => $wgRequest->getVal( 'writer' ),
			) );
			$content_type = $info['content_type'];
			$content_length = $info['download_content_length'];
			$content_disposition = null;
		}
		if ( !$info ) {
			$wgOut->showErrorPage( 'coll-download_notfound_title', 'coll-download_notfound_text' );
			return;
		}

Comment 8 Sam Reed (reedy) 2012-05-18 20:31:52 UTC

Ok, so the problem is this code recently added to our HTTP code:

		if ( isset( $this->respHeaders['content-length'] ) ) {
			if ( strlen( $this->content ) < $this->getResponseHeader( 'content-length' ) ) {
				$this->status->fatal( 'http-truncated-body' );
			}
		}


We're getting 0 bytes of content, but the header gives a non-zero length, so this is then classed as a fatal error.


url: http://pdf1.wikimedia.org:8080//08/084377b53fc2a31c/output.rl
2012-05-18 20:19:00 mw10 frwikibooks: content-length-header: 700197
content-length-actual: 0
url: http://pdf3.wikimedia.org:8080//f8/f84485f7ffd576cf/output.rl
2012-05-18 20:19:25 srv267 dewikibooks: content-length-header: 57811
content-length-actual: 0
url: http://pdf1.wikimedia.org:8080//ce/ce9d841ca7076699/output.rl
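
To make the failure mode concrete, here is a minimal Python sketch of the same kind of check, fed with the frwikibooks values from the log above. It is an illustration only, not the MediaWiki PHP code: the HttpResult class and check_truncated_body function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class HttpResult:
    """Hypothetical stand-in for an HTTP client's response state."""
    headers: dict
    body: bytes
    errors: list = field(default_factory=list)


def check_truncated_body(result: HttpResult) -> HttpResult:
    """Flag the response when the body is shorter than Content-Length claims."""
    declared = result.headers.get("content-length")
    if declared is not None and len(result.body) < int(declared):
        result.errors.append("http-truncated-body")
    return result


# Logged case: the header declares 700197 bytes, but the body seen is empty.
failing = check_truncated_body(HttpResult({"content-length": "700197"}, b""))
# Control case: the declared length matches the body.
complete = check_truncated_body(HttpResult({"content-length": "5"}, b"hello"))

print(failing.errors)   # ['http-truncated-body']
print(complete.errors)  # []
```

With an empty body and any positive declared length, the comparison is always true, which is why every affected download was reported as failed.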


I've disabled that code for the moment, and now all wikis that were previously on 1.20wmf3 are back on it.


The question here is why we are apparently getting no content. In this case, at least, it doesn't seem to make any difference, as we can still download the file fine.

Is it safe/sensible to add an option to the curl downloader that makes it ignore this check? Collection seems to be the main (but not the only) offender.

Comment 9 Aaron Schulz 2012-05-21 22:28:31 UTC

Too many things might use Content-Length for purposes other than the body length. Like POSTing or PUTing to Swift/S3 will give a content-length of the data stored, not the response. Also, HEAD requests, of course, shouldn't have this check. Really, only the caller of the Http class can determine if the headers make sense for the body in this case.

Comment 10 Sam Reed (reedy) 2012-05-21 22:30:58 UTC

All fixed up now

Comment 11 Platonides 2012-05-22 14:09:31 UTC

(In reply to comment #9)
> Too many things might use Content-Length for purposes other than the body
> length. 

You mean violating RFC 2616?

"   The Content-Length entity-header field indicates the size of the
   entity-body, in decimal number of OCTETs, sent to the recipient or,
   in the case of the HEAD method, the size of the entity-body that
   would have been sent had the request been a GET."

> Also, HEAD requests, of course, shouldn't have this check.
It seems HEAD requests weren't excluded from the check, which is a bug.

You could also argue for skipping the check for status codes 1xx, 204, and 304 (but those shouldn't have a non-zero Content-Length anyway).

> Like POSTing or PUTing to Swift/S3 will give a content-length of the
> data stored, not the response. 
That's the content-length of the client request, not of the server reply, which is what we're dealing with, here.
If the server replies to the POST with the request size, that violates the HTTP specification. How do you know the response length, then?

> Really, only the caller of the Http class can determine if the
> headers make sense for the body in this case.
The caller shouldn't need to manually check the content.


Reedy, can you provide the server headers in the reply to the output.rl POST?
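
The exclusions suggested in comment 11 (HEAD requests and the 1xx, 204, and 304 status codes, none of which carry a message body under RFC 2616) could look roughly like the Python sketch below. This is an assumption about how such a guard might be written, not the fix that was actually deployed; expects_body and is_truncated are hypothetical names.

```python
def expects_body(method: str, status: int) -> bool:
    """Return True when an HTTP/1.1 response may carry a message body."""
    if method.upper() == "HEAD":
        return False
    if 100 <= status < 200 or status in (204, 304):
        return False
    return True


def is_truncated(method: str, status: int, headers: dict, body: bytes) -> bool:
    """Compare body length to Content-Length only when a body is expected."""
    declared = headers.get("content-length")
    if declared is None or not expects_body(method, status):
        return False
    return len(body) < int(declared)


# A GET with a declared length but an empty body is still flagged.
print(is_truncated("GET", 200, {"content-length": "57811"}, b""))
# A HEAD response legitimately has a Content-Length and no body.
print(is_truncated("HEAD", 200, {"content-length": "57811"}, b""))
```

Note that this guard alone would not have changed the Collection case, since the logged requests returned a non-zero Content-Length with an empty body on what appear to be ordinary downloads.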

Comment 12 Aaron Schulz 2012-05-22 15:39:47 UTC

(In reply to comment #11)
> > Like POSTing or PUTing to Swift/S3 will give a content-length of the
> > data stored, not the response. 
> That's the content-length of the client request, not of the server reply, which
> is what we're dealing with, here.
> If the server replies to the POST with the request size, that violates the HTTP
> specification. How do you know the response length, then?

No it's the response to the POST that has this header used like this. The client knows that it gets only headers back for such things (and Swift will use statuses like 204 on success).

Comment 13 Platonides 2012-05-23 19:57:50 UTC

(In reply to comment #12)
> No it's the response to the POST that has this header used like this. The
> client knows that it gets only headers back for such things (and Swift will use
> statuses like 204 on success).

Well, HTTP status 204 means "No content", so that would make it a slightly lesser violation.
Still, I see no good reason for doing it that way.

I don't see such behavior documented nor reflected in the doc samples, though:
http://docs.openstack.org/api/openstack-object-storage/1.0/content/create-update-object.html

Comment 14 Aaron Schulz 2012-05-23 20:36:43 UTC

(In reply to comment #13)
> (In reply to comment #12)
> > No it's the response to the POST that has this header used like this. The
> > client knows that it gets only headers back for such things (and Swift will use
> > statuses like 204 on success).
> 
> Well, HTTP status 204 means "No content", so that would make it a slightly
> lesser violation.
> Still, I see no good reason for doing it that way.
> 
> I don't see such behavior documented nor reflected in the doc samples, though:
> http://docs.openstack.org/api/openstack-object-storage/1.0/content/create-update-object.html

Right. I think I mixed this with something else.
