Last modified: 2014-11-14 15:47:48 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T75418, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 73418 - Most raw webrequest partitions for 2014-11-13T20/1H not marked successful
Status: RESOLVED FIXED
Product: Analytics
Classification: Unclassified
Component: Refinery
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 72300
Reported: 2014-11-14 14:55 UTC by christian
Modified: 2014-11-14 15:47 UTC (History)
CC List: 7 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description christian 2014-11-14 14:55:27 UTC
Three of the webrequest partitions [1] for 2014-11-13T20/1H have not
been marked successful.

What happened?


[1]
_________________________________________________________________
qchris@stat1002 // jobs: 0 // time: 14:37:13 // exit code: 0
cwd: ~
~/cluster-scripts/dump_webrequest_status.sh 
  +------------------+--------+--------+--------+--------+
  | Date             |  bits  | mobile |  text  | upload |
  +------------------+--------+--------+--------+--------+
[...]
  | 2014-11-13T18/1H |    .   |    .   |    .   |    X   |
  | 2014-11-13T19/1H |    .   |    .   |    .   |    .   |
  | 2014-11-13T20/1H |    X   |    .   |    X   |    X   |
  | 2014-11-13T21/1H |    .   |    .   |    .   |    .   |
  | 2014-11-13T22/1H |    .   |    .   |    .   |    X   |
[...]
  +------------------+--------+--------+--------+--------+


Statuses:

  . --> Partition is ok
  M --> Partition manually marked ok
  X --> Partition is not ok (duplicates, missing, or nulls)
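
For anyone who wants to pull the failing partitions out of a status dump like the one above, here is a minimal sketch. It assumes the table layout shown in the excerpt (a date column followed by bits, mobile, text, upload); the function name failed_partitions is hypothetical and not part of the cluster scripts.

```python
import re

# Column order as shown in the table header of the status dump.
SOURCES = ["bits", "mobile", "text", "upload"]

# A data row looks like:
#   | 2014-11-13T20/1H |    X   |    .   |    X   |    X   |
ROW = re.compile(r"^\s*\|\s*(\d{4}-\d{2}-\d{2}T\d{2}/1H)\s*\|(.*)\|\s*$")


def failed_partitions(dump_text):
    """Return (hour, source) pairs whose status is 'X' (not ok)."""
    failed = []
    for line in dump_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # skip borders, the header row, and "[...]" elisions
        hour, rest = match.groups()
        statuses = [cell.strip() for cell in rest.split("|")]
        for source, status in zip(SOURCES, statuses):
            if status == "X":
                failed.append((hour, source))
    return failed
```

Fed the excerpt above, this would report upload for T18 and T22, and bits, text, and upload for T20. Partitions marked "." or "M" are treated as ok and skipped.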
Comment 1 christian 2014-11-14 14:58:31 UTC
The three jobs for 2014-11-13T20/1H were in SUSPENDED state.
Some internal workflows got stuck with exceptions about RM issues [1].

This nicely matches yesterday's restart of the resourcemanager after
upgrading the JVMs.
Resuming the 3 jobs did not work, so I killed and restarted them.




[1] JA009 JA009: org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1409078537822_77051' doesn't exist in RM.
Comment 2 christian 2014-11-14 15:47:48 UTC
Now the jobs succeeded, and the partitions got marked ok.
