Last modified: 2014-06-25 07:30:22 UTC
The country columns in EventLogging tables sometimes contain not only the country code, but also larger chunks of the client's cookies, sometimes even the sessionId. Affected values look, for example, like [1]

  GeoIP%3D%3A%3A%3A%3Avx; mediaWiki.user.sessionId=<SESSION_ID_REMOVED>; GeoIP=

or

  US%3A<CITY_REMOVED>%3A<LAT_REMOVED>%3A<LON_REMOVED>%3Av4; ve-beta-welcome-dialog=1; centralnotice_bucket=0-4.2; GeoIP=CH

(potentially sensitive data has been replaced by <..._REMOVED>).

The initial report is at https://lists.wikimedia.org/mailman/private/analytics-internal/2014-June/001540.html

At least the tables

  NavigationTiming_7494934
  NavigationTiming_8365252
  MultimediaViewerNetworkPerformance_7917896

are affected, and likely more. I'll run tests against all tables containing 'country' in their column names.

[1] To see unredacted examples, run for example

  SELECT event_originCountry FROM log.NavigationTiming_8365252 WHERE LENGTH(event_originCountry) > 2 LIMIT 20;

or

  SELECT event_originCountry FROM log.NavigationTiming_8365252 WHERE event_originCountry LIKE '%session%' LIMIT 20;

against dbstore1002.
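For anyone who wants to check exported values outside the database, the two conditions used in the queries under [1] can be sketched in Python roughly as follows. This is only an illustration: the function names and the sample list are made up here, and the samples are the redacted values from this report plus a plain "US".

```python
# Hypothetical helpers mirroring the two SQL conditions from [1].
# Names and sample data are illustrative, not part of EventLogging.

def is_too_long(value):
    """Counterpart of LENGTH(col) > 2: a country code is two characters."""
    return len(value) > 2

def mentions_session(value):
    """Counterpart of col LIKE '%session%' (case-sensitive in this sketch)."""
    return "session" in value

samples = [
    "US",
    "GeoIP%3D%3A%3A%3A%3Avx; mediaWiki.user.sessionId=<SESSION_ID_REMOVED>; GeoIP=",
    "US%3A<CITY_REMOVED>%3A<LAT_REMOVED>%3A<LON_REMOVED>%3Av4; "
    "ve-beta-welcome-dialog=1; centralnotice_bucket=0-4.2; GeoIP=CH",
]

# Every value that is not a bare two-character code gets flagged.
flagged = [v for v in samples if is_too_long(v)]
with_session = [v for v in samples if mentions_session(v)]
```

The length check is the broader of the two; the 'session' check only narrows the result down to rows that leaked a session id.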
Change 138748 had a related patch set uploaded by QChris: Avoid encoding issues by fetching GeoIP cookie through jquery.cookie https://gerrit.wikimedia.org/r/138748
Change 138748 merged by Mwalker: Avoid encoding issues by fetching GeoIP cookie through jquery.cookie https://gerrit.wikimedia.org/r/138748
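To illustrate the kind of encoding issue this change is about: the redacted example in the report suggests that the raw cookie value is percent-encoded, with %3A standing for ':'. The actual change is client-side JavaScript using jquery.cookie; the Python below is only a sketch of the same idea. The field layout (country code first, fields separated by ':') is an assumption read off the redacted example, and the function name and sample value are hypothetical.

```python
from urllib.parse import unquote

def country_from_geoip_cookie(raw_value):
    """Decode a percent-encoded GeoIP cookie value and return its first field.

    Assumes a ':'-separated layout with the country code first, as the
    redacted example in this report suggests.
    """
    decoded = unquote(raw_value)        # "%3A" becomes ":"
    return decoded.split(":", 1)[0]

# Hypothetical cookie value, shaped like the redacted example.
encoded = "XX%3AExampleCity%3A0.0%3A0.0%3Av4"

# Splitting the still-encoded value finds no ':' and returns the whole string.
undecoded_first_field = encoded.split(":", 1)[0]
decoded_country = country_from_geoip_cookie(encoded)
```

Splitting before decoding yields the full encoded string instead of a country code, which is one way an overlong value could end up in a country column.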
Change 139353 had a related patch set uploaded by QChris: Ignore country values that are not two characters long https://gerrit.wikimedia.org/r/139353
Change 139357 had a related patch set uploaded by QChris: Reset GeoIP cookie upon encountering invalid country code https://gerrit.wikimedia.org/r/139357
Change 139357 merged by jenkins-bot: Reset GeoIP cookie upon encountering invalid country code https://gerrit.wikimedia.org/r/139357
Change 139353 merged by Nuria: Ignore country values that are not two characters long https://gerrit.wikimedia.org/r/139353
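The check named in this change's title can be sketched as below. This is not the merged patch, just a minimal illustration: the function name is hypothetical, and treating "ignore" as "return None" is an assumption.

```python
def sanitize_country(value):
    """Return the value if it is exactly two characters long, else None.

    Hypothetical helper; None stands in for "ignore this value".
    """
    if isinstance(value, str) and len(value) == 2:
        return value
    return None

kept = sanitize_country("CH")
dropped = sanitize_country("US%3A<CITY_REMOVED>%3A<LAT_REMOVED>%3A<LON_REMOVED>%3Av4")
```

A length check like this does not confirm that the two characters are a real country code; it only keeps cookie fragments out of the column.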
Change 140023 had a related patch set uploaded by QChris: Fixup country column names in post_validation_fixups https://gerrit.wikimedia.org/r/140023
Change 140023 merged by jenkins-bot: Fixup country column names in post_validation_fixups https://gerrit.wikimedia.org/r/140023
Change 140061 had a related patch set uploaded by QChris: Fix revision check for MultimediaViewerDuration in post validation fixup https://gerrit.wikimedia.org/r/140061
Affected columns (currently) are

  MultimediaViewerDuration_8318615.event_country
  MultimediaViewerDuration_8572641.event_country
  MultimediaViewerNetworkPerformance_7917896_1.event_country
  MultimediaViewerNetworkPerformance_7917896.event_country
  NavigationTiming_7494934.event_originCountry
  NavigationTiming_8365252.event_originCountry

Of those, only MultimediaViewerDuration_8572641.event_country is still receiving new affected rows. Once that is solved, I'll start cleaning up the tables.
Change 140061 merged by jenkins-bot: Fix revision check for MultimediaViewerDuration in post validation fixup https://gerrit.wikimedia.org/r/140061
Since last Wednesday, Ops (RT: 7708) has been running the cleanup scripts. NavigationTiming_7494934 is cleaned up. Thanks Sean! For the other 5 tables, Ops has paused the script for now because of some unrelated outages on the databases. But the scripts will resume soonish.
I'll mark this resolved from our point of view. Once Ops finishes running the scripts, we just have to notify people the fix is complete.
The tables

  MultimediaViewerDuration_8318615
  MultimediaViewerDuration_8572641
  MultimediaViewerNetworkPerformance_7917896
  NavigationTiming_7494934
  NavigationTiming_8365252

have been cleaned up. Thanks Sean! MultimediaViewerNetworkPerformance_7917896_1 has not been cleaned up yet, but following the analytics thread at http://lists.wikimedia.org/pipermail/analytics/2014-June/002233.html we'll drop the table altogether.
Meanwhile MultimediaViewerNetworkPerformance_7917896_1 has been dropped (thanks Andrew and Sean) for bug 66649, so all affected database tables have either been scrubbed clean or dropped.