Last modified: 2014-10-14 18:57:07 UTC
https://github.com/halfak/mediawiki-utilities
Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1502
Given that mediawiki-utilities uses plain SQL and we are set to use Alembic, I do not see how this 'integration' could happen. Are we sure we want to keep this bug open?
Wikimetrics uses SQLAlchemy, which is a bit of a mismatch with mediawiki-utilities. I don't think that's too big of a deal; we could integrate the tools if it turns out to be a good idea. But that depends on which way Wikimetrics goes as a product and on how we structure our data pipeline.

One possibility is to have Wikimetrics become the ETL tool for public data. It could restructure our OLTP data, recent changes, and event streams into a more traditional, easy-to-work-with data warehouse. In that case, the logic from mediawiki-utilities would be very useful. We may wish to convert some of it to SQLAlchemy, but that's a minor point.

Another possibility is to have a separate ETL process, based on an existing tool or a combination of tools. Wikimetrics would then be refashioned to query the resulting data warehouse. In that case, mediawiki-utilities could inform the ETL process, but it would serve a very different purpose from Wikimetrics.

I have no strong opinion on which way we go, but I think we should keep this bug open as a reminder of the great logic encapsulated in mediawiki-utilities.