Last modified: 2014-11-19 22:14:55 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T69450, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 67450 - database consumer could batch inserts (sometimes)
Status: PATCH_TO_REVIEW
Product: Analytics
Classification: Unclassified
Component: EventLogging
Version: unspecified
Hardware: All All
Importance: Lowest enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Whiteboard: u=Analyst c=EventLogging p=34 s=2014-...
Depends on:
Blocks:
Reported: 2014-07-03 03:03 UTC by Sean Pringle
Modified: 2014-11-19 22:14 UTC (History)
CC List: 11 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Sean Pringle 2014-07-03 03:03:22 UTC
The eventlogging database consumer generates single-row inserts. From a performance perspective (master overhead and slave replication lag) it would be better to batch inserts.

The batches need not be large or of a particular size. The approach could simply be opportunistic grouping of small numbers of rows in the same table when more than one is available.
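A minimal sketch of the opportunistic grouping described above, written in Python under stated assumptions: events are assumed to arrive on an in-process `queue.Queue` as `(table, row_dict)` pairs, and the standard-library `sqlite3` module stands in for the MySQL master. The function names, the table names in the usage, and the `max_batch` limit are all hypothetical; this is not taken from the EventLogging code or from the merged change 169977.

```python
import queue
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier ANSI-style, doubling any embedded quotes.
    (Assumption: MySQL would need backticks unless ANSI_QUOTES is enabled.)"""
    return '"' + name.replace('"', '""') + '"'


def drain_batch(q, max_batch=100):
    """Block for one event, then take whatever else is already queued
    (up to max_batch) without waiting for more to arrive."""
    events = [q.get()]
    while len(events) < max_batch:
        try:
            events.append(q.get_nowait())
        except queue.Empty:
            break
    return events


def group_by_table(events):
    """Group (table, row) pairs by table and column set.

    Rows are also keyed by their column set because a multi-row INSERT
    needs every row to supply the same columns. Groups are ordered by
    first arrival, and rows inside a group keep their arrival order.
    """
    groups = {}
    for table, row in events:
        key = (table, tuple(sorted(row)))
        groups.setdefault(key, []).append(row)
    return groups


def insert_rows(conn, table, columns, rows):
    """Insert all rows of one group with a single multi-row INSERT."""
    one_row = "(" + ", ".join("?" for _ in columns) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        quote_ident(table),
        ", ".join(quote_ident(c) for c in columns),
        ", ".join(one_row for _ in rows),
    )
    conn.execute(sql, [row[c] for row in rows for c in columns])


if __name__ == "__main__":
    q = queue.Queue()
    for i in range(3):
        q.put(("Edit_1", {"id": i, "wiki": "enwiki"}))
    q.put(("Search_2", {"id": 99, "query": "foo"}))

    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "Edit_1" ("id" INTEGER, "wiki" TEXT)')
    conn.execute('CREATE TABLE "Search_2" ("id" INTEGER, "query" TEXT)')
    # Four queued events become two INSERT statements instead of four.
    for (table, columns), rows in group_by_table(drain_batch(q)).items():
        insert_rows(conn, table, columns, rows)
```

Because `drain_batch` never waits for extra events, a lone event is written as soon as it arrives; batching only happens when several events are already queued, which matches the "might only get a few each time" expectation.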

IRC excerpt:

<ori> one thing that would have to be rethought is table creation
<ori> right now eventlogging always just tries to insert events
<ori> if the database errors out because the table doesn't exist, *then* it 
      issues the create table statement
<ori> this is nice because you can drop or rename a table and it just gets 
      recreated anew without any downtime
<ori> i don't have a ready-made model of how this would work in a world where 
      we do batch inserts, but my reflexive hunch is that it's not a 
      show-stopping problem, and that we could work around it
<springle> not knowing the code at all, but could it be simple opportunistic 
           batching? when inserting a record, check if additional records for 
           the same table exist, and group them
<springle> might only get a few each time, but that would still be better
<ori> yeah, totally
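One way the insert-first table creation described in the excerpt could carry over to batches, sketched under the same assumptions (Python, with the standard-library `sqlite3` module standing in for MySQL). A multi-row INSERT is a single statement, so a missing table makes the whole batch fail without writing any rows; the consumer can then create the table and retry the entire batch. `store_batch`, `insert_batch` and `create_table` are hypothetical names: real table creation would derive column types from the event schema, and on MySQL the check would be for error 1146 (ER_NO_SUCH_TABLE) rather than SQLite's error text.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier ANSI-style, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def insert_batch(conn, table, columns, rows):
    """One multi-row INSERT; SQLite applies a single statement all-or-nothing."""
    one_row = "(" + ", ".join("?" for _ in columns) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        quote_ident(table),
        ", ".join(quote_ident(c) for c in columns),
        ", ".join(one_row for _ in rows),
    )
    conn.execute(sql, [row[c] for row in rows for c in columns])


def create_table(conn, table, columns):
    """Hypothetical stand-in for schema-driven DDL (untyped columns)."""
    conn.execute("CREATE TABLE IF NOT EXISTS {} ({})".format(
        quote_ident(table), ", ".join(quote_ident(c) for c in columns)))


def store_batch(conn, table, columns, rows):
    """Insert first; create the table only if it turns out to be missing."""
    try:
        insert_batch(conn, table, columns, rows)
    except sqlite3.OperationalError as exc:
        # Any other failure (e.g. an unknown column) is re-raised untouched.
        if "no such table" not in str(exc):
            raise
        create_table(conn, table, columns)
        # The failed statement wrote nothing, so retrying the whole
        # batch cannot duplicate rows.
        insert_batch(conn, table, columns, rows)
```

Dropping or renaming a table between calls simply sends the next batch down the create-and-retry path, so the "recreated anew without any downtime" behaviour survives the move to batched inserts.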
Comment 1 nuria 2014-08-27 21:48:36 UTC
More detailed information on why this item matters for making EL data public is available in this e-mail thread:

https://lists.wikimedia.org/pipermail/analytics/2014-August/002434.html
Comment 2 Gerrit Notification Bot 2014-10-30 01:20:45 UTC
Change 169977 had a related patch set uploaded by Nuria:
[WIP] Batching event insertion

https://gerrit.wikimedia.org/r/169977
Comment 3 nuria 2014-10-31 17:39:38 UTC
The actual beginning of the e-mail thread with the pertinent conversation: https://lists.wikimedia.org/pipermail/analytics/2014-August/002429.html
Comment 4 Gerrit Notification Bot 2014-11-17 21:34:20 UTC
Change 169977 merged by jenkins-bot:
Batch event insertion

https://gerrit.wikimedia.org/r/169977
