Last modified: 2014-06-14 23:01:27 UTC
There seems to be an excessive number of 'hide' events on the firstedit tour. It's possible they are somehow being double-counted.
This comes from examining the counts of event actions on the last tour step, for tour "firstedit". Like so:

SELECT COUNT(*), event_action
FROM GuidedTour_5222838
WHERE event_tourname = "firstedit"
  AND wiki = "enwiki"
  AND timestamp >= 20131009000000
  AND event_step = 4
GROUP BY event_action;

This produces the following results:

175 button-click
285 complete
442 hide
286 impression

Some other steps in the tour also produce this discrepancy, for example the step 2 results:

173 button-click
345 hide
242 impression
We do have some users who have more hide events recorded than impressions. (Note that user IDs have been censored for privacy.)

> SELECT
    event_userId,
    MIN(timestamp) AS first_event,
    SUM(event_action = "hide") AS hides,
    SUM(event_action = "impression") AS impressions
  FROM GuidedTour_5222838
  WHERE timestamp > "20131009"
    AND wiki = "enwiki"
  GROUP BY event_userId
  HAVING SUM(event_action = "hide") > SUM(event_action = "impression")
  LIMIT 10;
+--------------+----------------+-------+-------------+
| event_userId | first_event    | hides | impressions |
+--------------+----------------+-------+-------------+
| <snip>       | 20131013014124 |     6 |           5 |
| <snip>       | 20131009001851 |     1 |           0 |
| <snip>       | 20131011033720 |     4 |           2 |
| <snip>       | 20131011163054 |     3 |           2 |
| <snip>       | 20131012163322 |     2 |           1 |
| <snip>       | 20131013060206 |     4 |           3 |
| <snip>       | 20131013151144 |     2 |           1 |
| <snip>       | 20131015082941 |     1 |           0 |
| <snip>       | 20131015120432 |     5 |           3 |
| <snip>       | 20131023143148 |     2 |           1 |
+--------------+----------------+-------+-------------+
10 rows in set (2.87 sec)

I picked out a user whose first event was well after the "20131009" cutoff, so that none of their earlier events could have been cut off by the filter.
> SELECT timestamp, event_action, event_tourName
  FROM GuidedTour_5222838
  WHERE event_userId = <snip>
    AND timestamp >= "20131009";
+----------------+--------------+-----------------------------+
| timestamp      | event_action | event_tourName              |
+----------------+--------------+-----------------------------+
| 20131013014124 | impression   | gettingstartedtasktoolbarve |
| 20131013014126 | hide         | gettingstartedtasktoolbarve |
| 20131013014131 | impression   | gettingstartedtasktoolbarve |
| 20131013014133 | hide         | gettingstartedtasktoolbarve |
| 20131013122419 | impression   | gettingstartedtasktoolbarve |
| 20131013122422 | hide         | gettingstartedtasktoolbarve | <--
| 20131013122427 | hide         | gettingstartedtasktoolbarve | <--
| 20131013122431 | impression   | gettingstartedtasktoolbarve |
| 20131013122433 | hide         | gettingstartedtasktoolbarve |
| 20131013122502 | impression   | gettingstartedtasktoolbarve |
| 20131013122507 | hide         | gettingstartedtasktoolbarve |
+----------------+--------------+-----------------------------+
11 rows in set (1.05 sec)

Note the two hide events occurring 5 seconds apart. I see this sort of pattern when I look through other users too. We'll often have an "impression" followed by one or more "hide"s that are separated by 5-10 seconds.
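For anyone who wants to check other users for the same pattern, here is a rough Python sketch that flags every "hide" that was not preceded by a fresh "impression". It is only an illustration: the function name find_extra_hides and the (timestamp, event_action) tuple layout are my own assumptions, not part of GuidedTour or EventLogging, and the sample rows are copied by hand from the per-user output above.

```python
from typing import Iterable, List, Tuple

# Hypothetical row shape: (timestamp as a YYYYMMDDHHMMSS string, event_action).
Event = Tuple[str, str]


def find_extra_hides(events: Iterable[Event]) -> List[str]:
    """Return timestamps of 'hide' events with no unmatched 'impression' before them."""
    extra = []
    shown = False  # True while an impression is still waiting for its hide
    # Fixed-width timestamp strings sort chronologically; the sort is stable,
    # so rows sharing a timestamp keep their input order.
    for timestamp, action in sorted(events, key=lambda e: e[0]):
        if action == "impression":
            shown = True
        elif action == "hide":
            if shown:
                shown = False  # this hide pairs with the pending impression
            else:
                extra.append(timestamp)  # hide with nothing visible: suspicious
    return extra


# Rows copied from the query output for the censored user.
rows = [
    ("20131013014124", "impression"),
    ("20131013014126", "hide"),
    ("20131013014131", "impression"),
    ("20131013014133", "hide"),
    ("20131013122419", "impression"),
    ("20131013122422", "hide"),
    ("20131013122427", "hide"),
    ("20131013122431", "impression"),
    ("20131013122433", "hide"),
    ("20131013122502", "impression"),
    ("20131013122507", "hide"),
]

print(find_extra_hides(rows))  # ['20131013122427']
```

On this user's rows it reports only the second of the two marked hides, which accounts for the 6 hides against 5 impressions. It ignores event_step and event_tourName, so events from several tours or steps would need to be split up first.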
This doesn't explain the counts not matching, but I don't think https://git.wikimedia.org/blob/mediawiki%2fextensions%2fGuidedTour.git/HEAD/modules%2fext.guidedTour.lib.js#L230 should use guiders._lastCreatedGuiderID. I don't know why I didn't notice that before. I think it could lead to the wrong ID being logged, especially with the preloading.
This is likely not relevant anymore, since we've switched to a new schema for this.