Last modified: 2014-11-19 22:14:55 UTC
The eventlogging database consumer generates single-row inserts. From a performance perspective (master overhead and slave replication lag) it would be better to batch inserts. The batches need not be large or of a particular size. The approach could simply be opportunistic grouping of small numbers of rows in the same table when more than one is available.

IRC excerpt:
<ori> one thing that would have to be rethought is table creation
<ori> right now eventlogging always just tries to insert events
<ori> if the database errors out because the table doesn't exist, *then* it issues the create table statement
<ori> this is nice because you can drop or rename a table and it just gets recreated anew without any downtime
<ori> i don't have a ready-made model of how this would work in a world where we do batch inserts, but my reflexive hunch is that it's not a show-stopping problem, and that we could work around it
<springle> not knowing the code at all, but could it be simple opportunistic batching? when inserting a record, check if additional records for the same table exist, and group them
<springle> might only get a few each time, but that would still be better
<ori> yeah, totally
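
For illustration only, the opportunistic batching suggested in the IRC excerpt, combined with the existing "insert first, create the table on failure" behaviour, might look roughly like the sketch below. This is not the code from any actual patch. It assumes events arrive on an in-process queue as (table, row) pairs, uses the standard-library sqlite3 module as a stand-in for the production database, and uses a fixed two-column schema; the names drain, group_by_table, insert_batch and max_batch are all hypothetical.

```python
import queue
import sqlite3
from collections import defaultdict


def drain(q, max_batch=50):
    """Block for one event, then take whatever else is already queued.

    Never waits for a batch to fill up, so batch size is opportunistic.
    """
    events = [q.get()]
    while len(events) < max_batch:
        try:
            events.append(q.get_nowait())
        except queue.Empty:
            break
    return events


def group_by_table(events):
    """Group (table, row) pairs so each table gets one multi-row insert."""
    groups = defaultdict(list)
    for table, row in events:
        groups[table].append(row)
    return groups


def insert_batch(conn, table, rows):
    """Insert all rows for one table; create the table only if it is missing."""
    # Hypothetical fixed schema; the real consumer derives it from the event.
    sql = f'INSERT INTO "{table}" (id, payload) VALUES (?, ?)'
    try:
        conn.executemany(sql, rows)
    except sqlite3.OperationalError as err:
        if "no such table" not in str(err):
            raise
        # Same recovery as the single-row path: create, then retry the batch.
        conn.rollback()
        conn.execute(f'CREATE TABLE "{table}" (id INTEGER, payload TEXT)')
        conn.executemany(sql, rows)
    conn.commit()
```

A consumer loop under these assumptions would repeatedly call drain(q), pass the result to group_by_table, and call insert_batch once per table. Because table creation is still triggered by a failed insert, dropping or renaming a table would continue to work without downtime; the only difference is that the retry covers a batch rather than a single row.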
More detailed information on why this item is important for making EL data public is available in this e-mail thread: https://lists.wikimedia.org/pipermail/analytics/2014-August/002434.html
Change 169977 had a related patch set uploaded by Nuria: [WIP] Batching event insertion https://gerrit.wikimedia.org/r/169977
The e-mail thread with the pertinent conversation actually begins here: https://lists.wikimedia.org/pipermail/analytics/2014-August/002429.html
Change 169977 merged by jenkins-bot: Batch event insertion https://gerrit.wikimedia.org/r/169977