Last modified: 2014-10-21 10:54:41 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T73948, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 71948 - Raw webrequest partition for 'upload' for 2014-10-10T15:xx:xx not marked successful
Status: RESOLVED FIXED
Product: Analytics
Classification: Unclassified
Component: Refinery
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
QA Contact:
Depends on:
Blocks: 72299
Reported: 2014-10-11 11:19 UTC by christian
Modified: 2014-10-21 10:54 UTC
CC List: 8 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description christian 2014-10-11 11:19:45 UTC
For the hour 2014-10-10T15:xx:xx, the upload partition [1] was not marked
successful.

What happened?




[1]
_________________________________________________________________
qchris@stat1002 // jobs: 0 // time: 10:42:42 // exit code: 0
cwd: ~
cluster-scripts/dump_webrequest_status.sh 
  +---------------------+--------+--------+--------+--------+
  | Date                |  bits  |  text  | mobile | upload |
  +---------------------+--------+--------+--------+--------+
[...]
  | 2014-10-10T13:xx:xx |    .   |    .   |    .   |    .   |    
  | 2014-10-10T14:xx:xx |    .   |    .   |    .   |    .   |    
  | 2014-10-10T15:xx:xx |    .   |    .   |    .   |    X   |    
  | 2014-10-10T16:xx:xx |    .   |    .   |    .   |    .   |    
  | 2014-10-10T17:xx:xx |    .   |    .   |    .   |    .   |    
[...]
  +---------------------+--------+--------+--------+--------+


Statuses:

  . --> Partition is ok
  X --> Partition is not ok (duplicates, missing, or nulls)
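The legend boils down to a simple rule: a partition is ok only if no host shows duplicates, missing records, or nulls. A minimal Python sketch of that rule, assuming per-host counts are already available; the `HostSequenceStats` record, `partition_status`, and `render_row` are hypothetical names for illustration, not part of cluster-scripts or refinery:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HostSequenceStats:
    """Per-host counts for one hourly partition (hypothetical shape)."""
    hostname: str
    count_missing: int        # sequence numbers expected but never seen
    count_duplicate: int      # sequence numbers seen more than once
    count_null_sequence: int  # records that carry no sequence number


def partition_status(stats: List[HostSequenceStats]) -> str:
    """Return '.' if every host is clean and 'X' otherwise, as in the legend."""
    for s in stats:
        if s.count_missing or s.count_duplicate or s.count_null_sequence:
            return "X"
    return "."


def render_row(hour: str, statuses: Dict[str, str]) -> str:
    """Render one table row for the four webrequest sources ('?' if unknown)."""
    sources = ("bits", "text", "mobile", "upload")
    cells = "".join(f"    {statuses.get(src, '?')}   |" for src in sources)
    return f"  | {hour} |{cells}"
```

For the hour in question, passing `{"bits": ".", "text": ".", "mobile": ".", "upload": "X"}` to `render_row` reproduces the 15:xx:xx row of the table above.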
Comment 1 christian 2014-10-11 11:21:38 UTC
The Oozie job that checks this partition has status KILLED [1], and it
seems to have been killed by user hdfs at 17:28 [2].
A few minutes later, the bundles were restarted, so I assume the
partition check was killed deliberately.

However, since the job's sequence statistics were not fully computed
(the job was killed at 95% of the reduce step), I started the
recomputation job by hand.

The sequence statistics recomputation is done, and the partition has
neither missing nor duplicate records.
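The missing/duplicate check can be illustrated on per-host sequence numbers. A sketch in Python, assuming each host tags its requests with a sequence number that increases by one per request, so that an hour's expected count is max - min + 1; the function name and output keys are hypothetical, and refinery's actual sequence statistics job may compute these figures differently:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def sequence_stats(
    records: Iterable[Tuple[str, Optional[int]]],
) -> Dict[str, Dict[str, int]]:
    """Compute per-host counts from (hostname, sequence) pairs."""
    seqs: Dict[str, List[int]] = defaultdict(list)
    nulls: Dict[str, int] = defaultdict(int)
    for host, seq in records:
        if seq is None:
            nulls[host] += 1  # record carried no sequence number
        else:
            seqs[host].append(seq)

    result: Dict[str, Dict[str, int]] = {}
    for host in sorted(set(seqs) | set(nulls)):
        values = seqs[host]
        distinct = len(set(values))
        # Assumption: one sequence number per request, no gaps when healthy.
        expected = max(values) - min(values) + 1 if values else 0
        result[host] = {
            "count_actual": len(values),
            "count_expected": expected,
            "count_missing": expected - distinct,
            "count_duplicate": len(values) - distinct,
            "count_null_sequence": nulls[host],
        }
    return result
```

Under this sketch, a partition counts as good when every host reports zero missing and zero duplicates, which is what the recomputation reported here.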

Hence, I manually marked the partition good.


[1]

qchris@analytics1027:~$ oozie job -verbose -info 0037425-140725140105408-oozie-oozi-W
Job ID : 0037425-140725140105408-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : hive_add_partition-wmf_raw.webrequest-upload,2014,10,10,15-wf
App Path      : hdfs://analytics-hadoop/wmf/refinery/current/oozie/webrequest/partition/add/workflow.xml
Status        : KILLED
Run           : 0
User          : hdfs
Group         : -
Created       : 2014-10-10 17:04:54 GMT
Started       : 2014-10-10 17:04:54 GMT
Last Modified : 2014-10-10 17:28:15 GMT
Ended         : 2014-10-10 17:28:13 GMT
CoordAction ID: 0003812-140725140105408-oozie-oozi-C@2060

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID      Console URL     Error Code      Error Message   External ID     External Status Name    Retries Tracker URI     Type    Started Status  Ended
------------------------------------------------------------------------------------------------------------------------------------
0037425-140725140105408-oozie-oozi-W@:start:    -       -       -       -       OK      :start: 0       -       :START: 2014-10-10 17:04:54 GMT OK      2014-10-10 17:04:54 GMT
------------------------------------------------------------------------------------------------------------------------------------
0037425-140725140105408-oozie-oozi-W@add_partition      http://analytics1027.eqiad.wmnet:11000/oozie?job=0037426-140725140105408-oozie-oozi-W   -       -     0037426-140725140105408-oozie-oozi-W     SUCCEEDED       add_partition   0       local   sub-workflow    2014-10-10 17:04:54 GMT OK      2014-10-10 17:05:11 GMT
------------------------------------------------------------------------------------------------------------------------------------
0037425-140725140105408-oozie-oozi-W@generate_sequence_statistics       http://analytics1010.eqiad.wmnet:8088/proxy/application_1409078537822_38526/    -       -       job_1409078537822_38526 KILLED  generate_sequence_statistics    0       resourcemanager.analytics.eqiad.wmnet:8032      hive    2014-10-10 17:05:11 GMT KILLED  2014-10-10 17:28:15 GMT
------------------------------------------------------------------------------------------------------------------------------------



[2] See HDFS's /var/log/hadoop-yarn/apps/hdfs/logs/application_1409078537822_38526/analytics1029.eqiad.wmnet_8041 line 607:
:2014-10-10 17:28:13,907 INFO [IPC Server handler 0 on 36062] org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Kill job job_1409078537822_38526 received from hdfs (auth:SIMPLE) at 10.64.36.127
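To tie a kill request back to a user and host, the MRClientService log line can be parsed. A Python sketch, assuming the message format of the line above; the regex and `parse_kill_request` are hypothetical, and a leading `:` (as in the quoted line) is tolerated:

```python
import re
from typing import Dict, Optional

# Assumes "<timestamp> ... Kill job <job id> received from <user>
# (auth:<method>) at <ip>", as in the MRClientService line above.
_KILL_LINE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"Kill job (?P<job>job_\d+_\d+) received from (?P<user>\S+) "
    r"\(auth:(?P<auth>[A-Z]+)\) at (?P<ip>[\d.]+)"
)


def parse_kill_request(line: str) -> Optional[Dict[str, str]]:
    """Return who killed which job and when, or None for other lines."""
    match = _KILL_LINE.match(line.lstrip(":"))
    return match.groupdict() if match else None
```

The MapReduce job ID `job_1409078537822_38526` shares its numeric suffix with the YARN application `application_1409078537822_38526` from [1], which is how the log file and the killed Oozie action line up.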
