Last modified: 2014-04-02 14:25:31 UTC
The tail of the relevant log is:

-----8<-----Begin: log tail-----8<-----
2014-04-01 11:24:09,655 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner -
2014-04-01 11:24:12,656 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner -
2014-04-01 11:24:15,657 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner -
2014-04-01 11:24:18,658 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner -
2014-04-01 11:24:21,659 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner -
2014-04-01 11:24:24,660 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner -
2014-04-01 11:24:27,673 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner -
2014-04-01 11:24:27,730 [Thread-2] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.io.IOException: stored gzip size doesn't match decompressed size
    at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.executeTrailerState(BuiltInGzipDecompressor.java:389)
    at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.decompress(BuiltInGzipDecompressor.java:224)
    at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)
    at java.io.InputStream.read(InputStream.java:101)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:97)
    at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:239)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
2014-04-01 11:24:27,924 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 6017: Job failed! Error - NA
-----8<-----End: log tail-----8<-----

I'll investigate whether it's a random failure or something broke.
Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1505
Rerunning the job gave the same result, so it's probably not a random failure.
Mhmm ... today's uncompressed Zero files exceed 2^32 bytes for the first time, and trimming each file to below 2^32 bytes makes things work again. So our big data tooling cannot handle more than 32-bit-sized data? And it's April 1st ... epic :-D
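The 2^32 boundary matches the gzip format: per RFC 1952, the last four bytes of a gzip member (the ISIZE trailer) store the uncompressed size only modulo 2^32, which is exactly what the "stored gzip size doesn't match decompressed size" check trips over once a stream passes 4 GiB. A minimal sketch of the trimming workaround (sample.gz and LIMIT=1000 are illustrative; the real limit would sit just below 2^32):

```shell
# Trim the uncompressed stream to at most LIMIT bytes and recompress,
# keeping the result below the 32-bit boundary. LIMIT=1000 is for
# illustration only; production would use a value just below 4294967296.
LIMIT=1000
seq 1 500 | gzip -c > sample.gz           # stand-in for an oversized Zero file
zcat sample.gz | head -c "$LIMIT" | gzip -c > sample.trimmed.gz

# RFC 1952: the final 4 bytes (ISIZE) hold the uncompressed size mod 2^32,
# which is why sizes >= 2^32 break the decompressor's sanity check.
# On little-endian hosts this prints the trimmed size:
tail -c 4 sample.trimmed.gz | od -An -tu4
```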
The upstream bug seems to be https://issues.apache.org/jira/browse/HADOOP-8900 . The fix is included in Hadoop 1.2.0, but the Pig snapshot we have used up to now for Wikipedia Zero builds against Hadoop <1.2.0, and rebuilding the current Pig head from sources also picks up Hadoop <1.2.0. Cloudera picked up the upstream fix in CDH 4.2.0. However, the CDH 4.2.0 Pig jar from https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/pig/pig/0.10.0-cdh4.2.0/pig-0.10.0-cdh4.2.0.jar does not bundle its dependencies and fails with "Exception in thread "main" java.lang.NoClassDefFoundError: jline/ConsoleReaderInputStream at java.lang.Class.getDeclaredMethods0(Native Method) [...]". Adding all dependencies by hand would be heavy lifting. However, Cloudera's archive at http://archive-primary.cloudera.com/cdh4/cdh/4/pig-0.10.0-cdh4.2.0.tar.gz contains the tree as it looks after a completed build, and the pig-0.10.0-cdh4.2.0.jar in that archive does bundle all dependencies, so it can run Pig in local mode without extending the classpath by hand. Using that jar, the carrier file could get generated again. I'll do some more tests tomorrow to make sure the switch in the Pig version does not affect the numbers.
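For reference, a hedged sketch of how one might run Pig in local mode straight from the self-contained jar inside that tarball (zero_carrier.pig is a placeholder name, not the actual script):

```shell
# Unpack Cloudera's post-build tree and run Pig locally from the bundled jar.
# zero_carrier.pig is a hypothetical script name for illustration.
tar -xzf pig-0.10.0-cdh4.2.0.tar.gz
java -cp pig-0.10.0-cdh4.2.0/pig-0.10.0-cdh4.2.0.jar \
    org.apache.pig.Main -x local zero_carrier.pig
```

org.apache.pig.Main is Pig's entry point, and -x local selects local execution mode instead of mapreduce.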
I recomputed the data for a few days using the new pig.jar, and it matched the data produced by the old jar. The logs did not show any peculiarities with the new pig.jar either.
Thanks Christian -- nice work. -Toby