Last modified: 2014-06-11 16:08:11 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T50660, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 48660 - Refactor blocks into separate metrics
Status: RESOLVED INVALID
Product: Analytics
Classification: Unclassified
Component: Wikimetrics
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2013-05-20 21:32 UTC by Dario Taraborelli
Modified: 2014-06-11 16:08 UTC (History)

Description Dario Taraborelli 2013-05-20 21:32:45 UTC
The current implementation of the blocks metric is slightly inconsistent. In raw requests, the metric returns an array, but the default aggregator (proportion) operates on the expectation of a boolean.

I recommend we split blocks into two separate metrics:

1) is_blocked (limited to indefinite blocks, always returning a boolean or an undefined value, and implemented with the same parameters as the threshold metric, with a default t=24)

2) blocks (returning a block count and the associated metadata)

is_blocked should retain proportion as a default aggregator

Once this is done, we should also reconsider the best output format for blocks, as it currently combines different types (int, timestamp) in the same array and I am not sure this is the most useful response. The new blocks metric should return a concise summary of an account's overall block history (including both temporary and indefinite blocks); the most appropriate aggregator will need to be defined accordingly.

See also BZ ticket #48341 for an issue affecting blocks metric aggregation.
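
To make the proposed split concrete, here is a rough Python sketch. It is not the UserMetrics implementation: the `Block` record, the function names `blocks_summary` and `proportion`, and the reading of t as "hours since registration" are all assumptions made for illustration, as is the choice to return an undefined value (None) while the t-hour window is still open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass
class Block:
    """One block event. Field names are hypothetical."""
    timestamp: datetime
    indefinite: bool


def is_blocked(registration: datetime, block_list: List[Block],
               now: datetime, t: int = 24) -> Optional[bool]:
    """Boolean metric limited to indefinite blocks.

    Assumption: t is a window in hours after registration, mirroring
    the threshold metric's parameter. Returns None (undefined) when no
    indefinite block has been seen and the window has not yet elapsed.
    """
    window_end = registration + timedelta(hours=t)
    hit = any(b.indefinite and registration <= b.timestamp <= window_end
              for b in block_list)
    if hit:
        return True
    return None if now < window_end else False


def blocks_summary(block_list: List[Block]) -> dict:
    """Count-and-metadata metric covering temporary and indefinite blocks."""
    indefinite = sum(1 for b in block_list if b.indefinite)
    return {
        "count": len(block_list),
        "indefinite": indefinite,
        "temporary": len(block_list) - indefinite,
        "first": min((b.timestamp for b in block_list), default=None),
        "last": max((b.timestamp for b in block_list), default=None),
    }


def proportion(values: Iterable[Optional[bool]]) -> Optional[float]:
    """Default aggregator for is_blocked: share of True among defined values."""
    defined = [v for v in values if v is not None]
    if not defined:
        return None
    return sum(defined) / len(defined)
```

With this shape, proportion only ever sees booleans or undefined values, while the mixed-type summary stays in its own metric and can get its own aggregator later.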
Comment 1 Steven Walling 2013-05-20 21:37:14 UTC
(In reply to comment #0)
> The current implementation of the blocks metric is slightly inconsistent. In
> raw requests, the metric returns an array, but the default aggregator
> (proportion) operates on the expectation of a boolean.
> 
> I recommend we split blocks into two separate metrics:
> 
> 1) is_blocked (limited to indefinite blocks, always returning a boolean or an
> undefined value, and implemented with the same parameters as the threshold
> metric, with a default t=24)
> 
> 2) blocks (returning a block count and the associated metadata)
> 
> is_blocked should retain proportion as a default aggregator
> 
> Once this is done, we should also reconsider the best output format for
> blocks as it currently combines in the same array different types (int,
> timestamp) and I am not sure this is the most useful response. The new
> blocks metric should return a concise summary of an account's overall
> blocks history (including both temporary and indefinite blocks), the most
> appropriate aggregator will need to be defined accordingly.
> 
> See also BZ ticket #48341 for an issue affecting blocks metric aggregation.

This makes sense to me. Looking at indefinite blocks versus current blocks is valid because the purpose of the blocks metric is to tell us how many users in a given cohort were rejected by Wikipedia.
Comment 2 Oliver Keyes 2013-05-24 16:46:22 UTC
On the same note, I'm working on a series of regular expressions that can efficiently categorise blocks. At the moment it's largely accurate for indef blocks from the ipblocks table, and is divided into five categories:

-Vandalism/other bad-faith actions
-Username problems
-Spam
-Sockpuppetry
-Things not covered by the other categories ("misc").

I'm going to spend some cycles at the hackathon refining them a bit further and running them against the block log to make sure they're compatible; I think the goal after that is to, at some point, work them into UserMetrics and provide a way of accurately bucketing blocked users, providing some slightly more granular data.
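
For readers who want a feel for this kind of bucketing, a minimal Python sketch follows. The patterns are invented for illustration and are not the regular expressions described in this comment; the category keys and the `categorise` helper are likewise hypothetical. Patterns are tried in order and the first match wins, with "misc" as the fallback.

```python
import re
from collections import Counter

# Hypothetical patterns, checked in order; the first match wins.
CATEGORY_PATTERNS = [
    ("sockpuppetry", re.compile(r"sock\s*puppet|block evasion|checkuser", re.I)),
    ("spam", re.compile(r"spam|advertis|promotion", re.I)),
    ("username", re.compile(r"user\s*name|\{\{uw-ublock", re.I)),
    ("vandalism", re.compile(r"vandal|disrupt|abus", re.I)),
]


def categorise(reason: str) -> str:
    """Return the first category whose pattern matches the block reason."""
    for name, pattern in CATEGORY_PATTERNS:
        if pattern.search(reason):
            return name
    return "misc"


def bucket(reasons):
    """Count how many block reasons fall into each category."""
    return Counter(categorise(r) for r in reasons)
```

Ordering matters here: a reason such as "Abusing multiple accounts: sockpuppet of X" would match the vandalism pattern too, so the more specific sockpuppetry pattern is checked first.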
Comment 3 Steven Walling 2013-05-24 21:20:44 UTC
(In reply to comment #2)
> On the same note, I'm working on a series of regular expressions that can
> efficiently categorise blocks. At the moment it's largely accurate for indef
> blocks from the ipblocks table, and is divided into four categories:
> 
> -vandalism/other bad-faith actions
> -Username problems
> -Spam
> -Sockpuppetry
> -Things not covered by the other categories ("misc").
> 
> I'm going to spend some cycles at the hackathon refining them a bit further
> and running them against the block log to make sure they're compatible; I
> think the goal after that is to, at some point, work them into UserMetrics
> and provide a way of accurately bucketing blocked users, providing some
> slightly more granular data.

Adding type/reason would be a wonderful future enhancement. I know how difficult it must be to accurately parse the block log, but knowing the different types is of great use.
Comment 4 Oliver Keyes 2013-05-25 07:55:44 UTC
Actually it's pretty simple, he says after ~20 hours of work on the ipblocks table. Querying the logging table WHERE log_type = 'block' will be more fun.
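
As a rough illustration of the logging query mentioned here, the Python sketch below runs it against an in-memory SQLite table. The table is a cut-down stand-in modelled loosely on MediaWiki's logging table, not the real schema; the sample rows and the `block_events` helper are made up for the example.

```python
import sqlite3

# Simplified stand-in for the logging table (assumption: only the
# columns needed for this example are included).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logging ("
    " log_id INTEGER PRIMARY KEY,"
    " log_type TEXT, log_action TEXT, log_title TEXT,"
    " log_timestamp TEXT, log_comment TEXT)"
)

# Invented sample rows: two block-log entries and one unrelated entry.
sample_rows = [
    ("block", "block", "ExampleUserA", "20130520213245", "Vandalism-only account"),
    ("delete", "delete", "Some_page", "20130521000000", "test page"),
    ("block", "unblock", "ExampleUserA", "20130522000000", "appeal accepted"),
]
conn.executemany(
    "INSERT INTO logging"
    " (log_type, log_action, log_title, log_timestamp, log_comment)"
    " VALUES (?, ?, ?, ?, ?)",
    sample_rows,
)


def block_events(connection):
    """Return block-log rows in chronological order."""
    cursor = connection.execute(
        "SELECT log_title, log_action, log_timestamp, log_comment"
        " FROM logging WHERE log_type = 'block'"
        " ORDER BY log_timestamp"
    )
    return cursor.fetchall()
```

Unlike a table of currently active blocks, a log like this also records unblocks and expired blocks, which is presumably part of what makes it "more fun" to parse.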
Comment 5 Andre Klapper 2014-05-29 17:23:08 UTC
[moving tickets as per bug 65903]
Comment 6 Dan Andreescu 2014-06-11 16:08:11 UTC
This bug has been made invalid by the transition to Wikimetrics. Since UserMetrics is no longer actively maintained, I will mark these old bugs as Invalid.

Duly noted that the discussion here is interesting and should inform future work in this area.
