Last modified: 2014-11-19 22:14:55 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T69450, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 67450 - database consumer could batch inserts (sometimes)


Summary:	database consumer could batch inserts (sometimes)

Status:	PATCH_TO_REVIEW

Product:	Analytics
Classification:	Unclassified
Component:	EventLogging
Version:	unspecified
Hardware:	All All

Importance:	Lowest enhancement
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:	u=Analyst c=EventLogging p=34 s=2014-...
Keywords:

Depends on:
Blocks:

Reported:	2014-07-03 03:03 UTC by Sean Pringle
Modified:	2014-11-19 22:14 UTC
CC List:	11 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Description Sean Pringle 2014-07-03 03:03:22 UTC

The eventlogging database consumer generates single-row inserts. From a performance perspective (master overhead and slave replication lag) it would be better to batch inserts.

The batches need not be large or of a particular size. The approach could simply be opportunistic grouping of small numbers of rows in the same table when more than one is available.
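
The opportunistic grouping described above could look roughly like the following Python sketch. It is not EventLogging's actual consumer code: the in-memory queue, the event shape (a dict with `table` and `row` keys), and the names `drain_batches` and `max_batch` are all assumptions made for illustration. The function waits for one event, then takes whatever else is already queued without waiting, grouped by table.

```python
import queue
from collections import defaultdict


def drain_batches(events, max_batch=50):
    """Return {table: [rows]} built from whatever is currently queued.

    `events` is a queue.Queue of dicts shaped like
    {"table": ..., "row": ...} (a hypothetical event shape).
    """
    # Block until at least one event is available.
    first = events.get()
    batches = defaultdict(list)
    batches[first["table"]].append(first["row"])
    taken = 1

    # Opportunistically take more events, but never wait for them.
    while taken < max_batch:
        try:
            event = events.get_nowait()
        except queue.Empty:
            break
        batches[event["table"]].append(event["row"])
        taken += 1

    return dict(batches)
```

Under low traffic this degrades to single-row inserts, since only one event is queued at a time; under load each table gets a small multi-row batch, capped at `max_batch` events per call.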

IRC excerpt:

<ori> one thing that would have to be rethought is table creation
<ori> right now eventlogging always just tries to insert events
<ori> if the database errors out because the table doesn't exist, *then* it 
      issues the create table statement
<ori> this is nice because you can drop or rename a table and it just gets 
      recreated anew without any downtime
<ori> i don't have a ready-made model of how this would work in a world where 
      we do batch inserts, but my reflexive hunch is that it's not a 
      show-stopping problem, and that we could work around it
<springle> not knowing the code at all, but could it be simple opportunistic 
           batching? when inserting a record, check if additional records for 
           the same table exist, and group them
<springle> might only get a few each time, but that would still be better
<ori> yeah, totally
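
One way the insert-first, create-on-failure behaviour from the excerpt might carry over to batches is sketched below. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for the production database, the function name `insert_batch` and its parameters are hypothetical, and the "no such table" check is specific to SQLite's error text.

```python
import sqlite3


def insert_batch(conn, table, columns, rows):
    """Insert rows in one batch; create the table and retry once if missing."""
    column_list = ", ".join('"{}"'.format(c) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    insert_sql = 'INSERT INTO "{}" ({}) VALUES ({})'.format(
        table, column_list, placeholders)
    try:
        with conn:  # commits on success, rolls back on error
            conn.executemany(insert_sql, rows)
    except sqlite3.OperationalError as exc:
        # SQLite-specific check; another database reports this differently.
        if "no such table" not in str(exc):
            raise
        with conn:
            # Untyped columns are valid in SQLite; a real schema would
            # be derived from the event definition instead.
            conn.execute(
                'CREATE TABLE "{}" ({})'.format(table, column_list))
            conn.executemany(insert_sql, rows)
```

In this sketch every row in a batch targets the same table, so a missing table makes the statement fail before any row is written, and retrying the whole batch after the create does not duplicate rows. A dropped or renamed table is still recreated on the next insert, as in the single-row case.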

Comment 1 nuria 2014-08-27 21:48:36 UTC

More detailed information on why this item is important when it comes to making EL data public is available in this e-mail thread:

https://lists.wikimedia.org/pipermail/analytics/2014-August/002434.html

Comment 2 Gerrit Notification Bot 2014-10-30 01:20:45 UTC

Change 169977 had a related patch set uploaded by Nuria:
[WIP] Batching event insertion

https://gerrit.wikimedia.org/r/169977

Comment 3 nuria 2014-10-31 17:39:38 UTC

Actual beginning of e-mail thread with pertinent conversation: https://lists.wikimedia.org/pipermail/analytics/2014-August/002429.html

Comment 4 Gerrit Notification Bot 2014-11-17 21:34:20 UTC

Change 169977 merged by jenkins-bot:
Batch event insertion

https://gerrit.wikimedia.org/r/169977
