Last modified: 2014-07-25 13:41:14 UTC
scheduled via Oozie
The duplicate monitoring task covers detecting duplicate events in the Kafka logs.
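The idea behind the sequence_stats Hive query mentioned below is that each cache host stamps its events with a monotonically increasing sequence number, so comparing the actual row count per host with the span of sequence numbers reveals duplicates and holes. A minimal Python sketch of that logic (the function and field names here are illustrative, not the actual HQL column names):

```python
from collections import defaultdict

def sequence_stats(events):
    """Per-host sequence statistics over (host, sequence_number) pairs.

    For each host, the expected event count is max(seq) - min(seq) + 1.
    More rows than distinct sequence numbers means duplicates; fewer
    distinct sequence numbers than expected means holes.
    """
    by_host = defaultdict(list)
    for host, seq in events:
        by_host[host].append(seq)

    stats = {}
    for host, seqs in by_host.items():
        expected = max(seqs) - min(seqs) + 1
        distinct = len(set(seqs))
        stats[host] = {
            "count_actual": len(seqs),
            "count_expected": expected,
            "count_duplicate": len(seqs) - distinct,
            "count_missing": expected - distinct,
        }
    return stats
```

For example, a host that emitted sequences 1, 2, 2, 4 would show one duplicate (the repeated 2) and one hole (the missing 3).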
The Hive query for this is already written. The table will need to be created manually, and then the Hive query needs to be scheduled regularly by Oozie. Hive query: https://github.com/wikimedia/analytics-refinery/blob/master/hive/webrequest/sequence_stats.hql The Oozie layout has been refactored by Christian and me, but it remains in kraken. The directory structure needs to be moved over from there. Go ahead and bring the kraken oozie directory over into analytics/refinery: https://github.com/wikimedia/kraken/tree/master/oozie You should omit the 'archive' directory; don't bring that into refinery. You can then add your Oozie configs in something like oozie/webrequest/sequence_stats (or similar; not sure this is the best layout).
Change 143336 had a related patch set uploaded by Milimetric: Migrate oozie folder from Kraken minus archive https://gerrit.wikimedia.org/r/143336
Change 143336 merged by Ottomata: Migrate oozie folder from Kraken minus archive https://gerrit.wikimedia.org/r/143336
Change 143486 had a related patch set uploaded by Milimetric: [WIP] Oozify sequence_stats hive script https://gerrit.wikimedia.org/r/143486
Moving story to the next sprint since it has not been completed this sprint.
Change 144909 had a related patch set uploaded by QChris: Drop unneeded parts of oozie import https://gerrit.wikimedia.org/r/144909
Change 144909 merged by Ottomata: Drop unneeded partition dropping part of oozie import https://gerrit.wikimedia.org/r/144909
Discussing the solution to this item a bit more with ottomata last week, it turned out that it might be better to incorporate the duplicate checking into partition adding, and to turn the aggregated statistics into a means to set a "done" flag for data sets that do not suffer obvious holes/duplicates. That would help the general pipeline, as it allows triggering further parts of the pipeline based on the done flag instead of encoding the same timing heuristic again and again in each pipeline. However, partition adding is currently not working (it is still centered around the precursor of refinery), so we need to fix partition adding first. That is needed anyway to get webrequest ingestion working in refinery, so it is not wasted effort. The new requirements are:
* Fix the partition adding jobs
* Integrate the duplicate monitoring there
* Tag data sets as done (dependent on the outcome of the statistics computations)
With those changed requirements, this bug has been re-estimated.
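The "done"-flag decision described above could be sketched as follows. This is a hypothetical illustration, not the merged implementation: the function name, input shape, and tolerance threshold are all assumptions. The general mechanism is real, though: Oozie data sets can declare a done-flag file, so downstream coordinators wait for that file rather than re-encoding timing heuristics.

```python
def can_mark_done(per_host_counts, tolerance=0.01):
    """Decide whether a data set partition may be tagged 'done'.

    per_host_counts maps host -> (actual_count, expected_count), where
    expected_count is derived from the sequence numbers (max - min + 1).
    The partition passes only if every host's actual count is within
    `tolerance` (as a fraction of expected) -- i.e. no obvious holes
    or duplicates.
    """
    for host, (actual, expected) in per_host_counts.items():
        if expected == 0:
            continue  # no events expected; nothing to verify for this host
        if abs(actual - expected) / expected > tolerance:
            return False
    return True
```

A partition where one host delivered 95 of 100 expected events would fail a 1% tolerance and would not get the done flag, so downstream jobs would not be triggered on incomplete data.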
Change 148650 had a related patch set uploaded by QChris: Add pipeline for basic verification of webrequest logs https://gerrit.wikimedia.org/r/148650
Change 148650 merged by Ottomata: Add pipeline for basic verification of webrequest logs https://gerrit.wikimedia.org/r/148650
Change 143486 abandoned by QChris: Coordinate computing sequence statistics through Oozie Reason: Different approach was implemented at Ie34f09a671a2ce341daabd8822d27e6b993d2e3e and got merged meanwhile. All comments in this change have been addressed, or been carried over to be tracked in bugzilla. https://gerrit.wikimedia.org/r/143486