Last modified: 2012-04-18 15:37:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T37590, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 35590 - Disparity between submit clicktracking events and AFTv5 records created
Disparity between submit clicktracking events and AFTv5 records created
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
ArticleFeedbackv5 (Other open bugs)
unspecified
All All
: Normal normal (vote)
: ---
Assigned To: Reha Sterbin
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-03-29 16:39 UTC by Reha Sterbin
Modified: 2012-04-18 15:37 UTC (History)
2 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Reha Sterbin 2012-03-29 16:39:42 UTC
From Dario, 3/28/2012:


The first feedback post submitted on enwiki via Option E  (af_link_id = 5)  was stored at 20120328190501. The current breakdown of posts by af_link_id submitted after that cut off time is the following:

mysql> SELECT af_link_id, COUNT(af_id) FROM aft_article_feedback WHERE af_created > 20120328190501 GROUP BY af_link_id ; 
+------------+--------------+
| af_link_id | COUNT(af_id) |
+------------+--------------+
|          0 |          105 |
|          1 |            8 |
|          5 |           99 |
+------------+--------------+

The first impression is that nobody is seeing Option A (which is what we all predicted). If the bucketing is working fine (see above) we should also have no major inconsistency between these records and the submit event counts:

dartar@emery:/var/log/aft$ grep -A1000000 20120328190501 clicktracking.log | grep option1A-submit-overlay | wc -l
10
dartar@emery:/var/log/aft$ grep -A1000000 20120328190501 clicktracking.log | grep option1E-submit-overlay | wc -l
123
dartar@emery:/var/log/aft$ grep -A1000000 20120328190501 clicktracking.log | grep option1X-submit-overlay | wc -l
0
dartar@emery:/var/log/aft$ grep -A1000000 20120328190501 clicktracking.log | grep option1A-submit-bottom | wc -l
48
dartar@emery:/var/log/aft$ grep -A1000000 20120328190501 clicktracking.log | grep option1E-submit-bottom | wc -l
38
dartar@emery:/var/log/aft$ grep -A1000000 20120328190501 clicktracking.log | grep option1X-submit-bottom | wc -l
38

Now the way I interpret this data is that (if the database is storing the link associated with the widget, not the bucket - Reha please correct me if I am wrong) there are several clicks (about 20%) that are logged but do not produce any record:

Bottom widget:  124 (option1*-submit-bottom)
Option A:  10 (option1A-submit-overlay)
Option E:  123 (option1E-submit-overlay)

Can you think of any scenario in which this would happen? Are these events captures for clicks on the button while it's still disabled (i.e. before the user fills in some text)?
Comment 1 Reha Sterbin 2012-03-29 16:40:39 UTC
The disparity between submit clicks and new feedback rows could be caused by one of three things:

1) Trying to submit entirely empty feedback -- the event fires even if the button has been disabled.
2) Stale logins -- if you leave open a page long enough for your login to expire, your feedback will be rejected because your login information is bogus.
3) Some other, unknown, backend error

The first two are legitimate reasons for the numbers not to sync up, but I'm not certain it's enough to account for the 20% disparity.  I think we should consider doing something to track failed attempts, so we can determine whether this is a known issue or a new one.

I'd propose the following:

1) Do not fire a submit-button-click event if the button is disabled
2) Fire a (new) submit-failed event if the submit fails for any reason.  I could put any error message provided in with the extra data.
3) Ask Roan to check the error logs for any sign of AFT5.
Comment 2 Dario Taraborelli 2012-04-06 18:06:49 UTC
1) above should be easy to test on a local instance. I imagine you are not
referring to comment-less posts (which would still show up in the DB), but
attempts to submit when the button is still disabled (and no text and no form
option has been selected) right? I realize that the event name is ambiguous (it
could either refer to a successfully posted form or to a click on a form
element.

I agree on the proposal and having submit fails logged in the error log sounds
good to me. Would that also capture abuse filter blocks? Or do we want to
capture these events separately in clicktracking.log?
Comment 3 Reha Sterbin 2012-04-10 16:16:31 UTC
1) That's correct.  I've made the change, but it hasn't yet been submitted for inclusion in the next release.

2) We can certainly inject these into the error log, but because only Roan would have access to that data, we'd be putting yet another responsibility on his plate.  Since you're the person most in need of access to this information, I think we should put it in the clicktracking log instead.  I'd recommend adding tracking ids like this:

option{EXPERIMENT}-submit_error_{REASON}-{LOCATION}

Here are the ways for a submit to fail, and the names I'd use for each:

- AbuseFilter rejection (afreject)
- AbuseFilter warning (afwarn)
- Blocked user (blocked)
- Missing user (missinguser)
- AFT not enabled for the page requested (invalidpage)
- Page ID not found (invalidpageid)
- Stale login or bad anon token (invalidtoken)
- Not a post request (mustbeposted)
- Anything unexpected, like missing parameter or a database error (unknown)

Obviously some of these are very unlikely (like mustbeposted), but if they're happening, we should know about them.

I could also add an event for submit success, to be fired if the submit returned successfully:

option{EXPERIMENT}-submit_success-{LOCATION}

That way you could test that the success count matched the number of rows in the log, and that the click count matched the number of errors plus the number of successes.

Let me know if which of these you'd like me to implement.
Comment 4 Dario Taraborelli 2012-04-11 21:01:28 UTC
The plan makes sense to me. Since all these events are tied to submits we won't be flooding the logs with lots of unnecessary warnings. We'll be basically doubling the volume of submit events, which sounds reasonable to me:

1) submit_attempt (what we currently call "submit" events)
2) submit_error OR submit_success

We should also add a config setting to be able to switch off all submit error logging. Eventually,  clicktracking (or whatever will replace it) should have its own error logging functionality so there's no need to hardcode these tracking codes in the calling extension.
Comment 5 Reha Sterbin 2012-04-12 00:06:55 UTC
I've made these changes and submitted them to Gerrit for review:

https://gerrit.wikimedia.org/r/4766
https://gerrit.wikimedia.org/r/4767

I've tested locally, and prototype has been updated and is ready for testing.
Comment 6 Reha Sterbin 2012-04-18 15:37:02 UTC
Fixed in https://gerrit.wikimedia.org/r/5170

Available on prototype.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links