Last modified: 2014-11-07 19:14:03 UTC
A Central Auth cohort creates many rows: with the current implementation, one MetricReport node is created for each project in the cohort, which is roughly 800 projects for most Central Auth cohorts. This has performance implications if we schedule these reports to recur. We should take the necessary steps to clean up old data, avoid creating so many records in the first place, add indices, etc.
There are several ways to go about this:
#1. Purge from the db anything older than 30 days that is not a recurrent report. This can be done via a scheduler task.
#2. Do not write to the report table from nodes that are not the report node. Those records are written now, but we do not use them for anything.
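For reference, option #1 could look roughly like the sketch below. It is only an illustration: the `report` table layout, the `created` and `recurrent` column names, and the `purge_old_reports` function are assumptions made for this example, not the actual schema or code, and it uses the standard-library `sqlite3` module rather than the real database layer.

```python
import sqlite3
from datetime import datetime, timedelta


def purge_old_reports(conn, now, max_age_days=30):
    """Delete non-recurrent report rows created more than max_age_days before `now`.

    Assumes a hypothetical `report` table with a `created` timestamp stored as
    'YYYY-MM-DD HH:MM:SS' text and an integer `recurrent` flag (0 or 1).
    Returns the number of rows deleted.
    """
    cutoff = (now - timedelta(days=max_age_days)).strftime('%Y-%m-%d %H:%M:%S')
    cur = conn.execute(
        "DELETE FROM report WHERE recurrent = 0 AND created < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount
```

A scheduler task would call this once a day with the current UTC time. Recurrent reports are left alone regardless of age.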
We estimated #2. Please keep in mind that recurrent reports need to keep working as they do today.
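A minimal sketch of the idea behind #2, assuming a simplified node tree: the `Node` class, its `persistent` flag, and the list standing in for the report table are all hypothetical and only illustrate the shape of the change, not the real classes.

```python
class Node:
    """Hypothetical report tree node.

    Only nodes created with persistent=True write a row; child nodes still
    run and return their results to the parent, they just do not store them.
    """

    def __init__(self, name, children=(), persistent=False):
        self.name = name
        self.children = list(children)
        self.persistent = persistent

    def run(self, store):
        # Children are always executed so the parent can aggregate results.
        child_results = [child.run(store) for child in self.children]
        result = {'name': self.name, 'children': child_results}
        if self.persistent:
            # `store` is an in-memory stand-in for the report table.
            store.append(self.name)
        return result


# One top-level report over 800 per-project nodes writes a single row.
store = []
projects = [Node('project-%d' % i) for i in range(800)]
report = Node('report', children=projects, persistent=True)
result = report.run(store)
```

With this shape, a recurrent run still produces its top-level report row on every run, so the scheduling behaviour would not need to change.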
Change 170703 had a related patch set uploaded by Mforns: Do not store reports that are not going to be used https://gerrit.wikimedia.org/r/170703
Change 170703 merged by Milimetric: Do not store reports that are not going to be used https://gerrit.wikimedia.org/r/170703