Last modified: 2014-04-21 18:56:06 UTC
We depend on the MaxMind database file being updated automatically so that our GeoIP translations are accurate. Currently this is not the case. AFAIK there are at least 3 locations/uses of this file:
1. stat1002 (/usr/share/GeoIP is automatically updated by puppet)
2. HDFS (unknown location)
3. WikiStats (unknown location)
We also need to be able to retain and use older versions of the file, so as part of this defect we'll need to come up with a scheme for retaining older versions of the db file. This was first exposed by a jump in Pakistan IPs in the Wikipedia Zero dashboards.
Prioritization and scheduling of this bug is tracked on Mingle card https://mingle.corp.wikimedia.org/api/v2/projects/analytics//https://mingle.corp.wikimedia.org/api/v2/projects/analytics/cards/1132.xml
Prioritization and scheduling of this bug is tracked on Mingle card https://mingle.corp.wikimedia.org/api/v2/projects/analytics/1132 (previous comment was generated by bingle containing a small bug, that has now been fixed).
I think we should just have puppet keep the files historically and symlink to the current one. For HDFS, we can just sync the full /usr/share/GeoIP directory into HDFS, and load whichever file is needed.
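To make the "keep historically and symlink" idea concrete, here is a rough sketch in Python rather than a puppet manifest. The function name archive_and_link, the date-stamped file naming, and the "current" link name are all made up for illustration; nothing here reflects how puppet is actually configured on stat1002.

```python
import os
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def archive_and_link(new_db: Path, archive_dir: Path,
                     link_name: str = "current",
                     stamp: Optional[str] = None) -> Path:
    """Copy new_db into archive_dir under a dated name and repoint a symlink.

    Hypothetical helper: the naming scheme (<stem>-<YYYY-MM-DD><suffix>)
    and the link name are assumptions, not an existing convention.
    """
    stamp = stamp or date.today().isoformat()
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Older dated copies are never touched, so they stay available.
    target = archive_dir / f"{new_db.stem}-{stamp}{new_db.suffix}"
    shutil.copy2(new_db, target)

    # Create the new symlink under a temporary name, then rename it over
    # the old one so readers never see a missing link (POSIX rename).
    link = archive_dir / link_name
    tmp_link = archive_dir / f".{link_name}.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    os.symlink(target.name, tmp_link)  # relative link within archive_dir
    os.replace(tmp_link, link)
    return target
```

With something like this, consumers that want the latest data open the symlink, while a job reprocessing old logs can open the dated file matching the log period. Because the symlink is relative, the directory should stay self-consistent if it is copied elsewhere as a whole, provided the copy preserves symlinks.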
Wikistats squid log processing runs on stat1002. The only use of ip->geo is in '/usr/local/bin/geoiplogtag'.
Prioritization and scheduling of this bug is tracked on Mingle card https://mingle.corp.wikimedia.org/projects/analytics/1132 (previous comment was generated by bingle containing a small bug, that has now been fixed). argh another bug :)