Last modified: 2014-08-28 15:27:54 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T68786, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 66786 - Rename revision_userindex to revision
Rename revision_userindex to revision
Status: RESOLVED WONTFIX
Product: Wikimedia Labs
Classification: Unclassified
tools (Other open bugs)
unspecified
All All
: Unprioritized normal
: ---
Assigned To: Marc A. Pelletier
:
Depends on:
Blocks: labs-replication
  Show dependency treegraph
 
Reported: 2014-06-18 14:12 UTC by Yuvi Panda
Modified: 2014-08-28 15:27 UTC (History)
6 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Yuvi Panda 2014-06-18 14:12:33 UTC
I don't see a reason anyone would want to use the unindexed 'revision' vs 'revision_userindex'. We can rename 'revision' to revision_noindex, and revision_userindex to revision.

Same for the logging table.

The problem this causes is tools/users running queries on revision without reading the docs about revision_userindex, and both getting terrible performance themselves and slowing down the server as well.
Comment 1 Aaron Halfaker 2014-06-18 14:18:37 UTC
Agreed.  I don't see any good reason why the non-indexed thing behind "revision" should occupy such a privileged name.  Is it actually useful for querying?

If we can keep "revision_userindex" and make "revision" an alias to it, we can preserve backwards compatibility as well.
Comment 2 merl 2014-06-18 14:40:24 UTC
According to doc revisions with suppresed username (by os, steward) )are currently only available from revision and not revision_noindex.

But i agree that the indexed version should be the default one because most people don't take care when writing sql queries and the special case to have really all revision available is less important because
Comment 3 Yuvi Panda 2014-06-18 14:41:10 UTC
(In reply to merl from comment #2)
> According to doc revisions with suppresed username (by os, steward) )are
> currently only available from revision and not revision_noindex.

I think you mean revision_userindex :)
Comment 4 Marc A. Pelletier 2014-07-29 13:02:41 UTC
The naming convention was selected on the "least surprise" principle; there are a number of rows not available in the _%index views and those rows would be missing without explanation.

That said, I'd have no fundamental objection to switching the names around if you think that will make things clearer.

I wish MariaDB could be told 'use that view if the query contains a where clause on those columns'  :-)
Comment 5 Yuvi Panda 2014-07-29 13:04:25 UTC
Yeah, I think the rename will actually give people the 'least surprise' - I think running a query you expect to be fast but ends up being super slow and then having to go dig around why is more surprising.
Comment 6 Marc A. Pelletier 2014-07-29 13:21:43 UTC
That's not what I meant (performance); currently selection on 'revision' will give you the same result rows as making that same query in production whereas use of 'revision_userindex' may not.
Comment 7 Aaron Halfaker 2014-07-29 13:50:26 UTC
Let me just say that I'm very surprised that either of the tables is missing rows that the other has.  Why would "userindex" suggest that some rows have been filtered?
Comment 8 Marc A. Pelletier 2014-07-29 13:53:35 UTC
The rows that are filtered in the _userindex views are where the user id and user named have been NULLed because of suppression.  You'd never *notice* the missing rows when using _userindex because you'd use that view while having a WHERE clause on those columns and the rows wouldn't have been returned /anyways/.

If, on the other hand, you have a where clause on the page columns (to recover a article history), then the rows with suppressed usernames would still show in their proper place as you'd get in production / on-wiki.
Comment 9 Aaron Halfaker 2014-07-29 14:01:53 UTC
Marc, as a regular user of this data, that's very unintuitive.

Also:

SELECT COUNT(*) FROM revision_userindex WHERE rev_user IS NULL;
Comment 10 Marc A. Pelletier 2014-07-29 14:02:47 UTC
That last paragraph should have ended "... if you are using the revision table, but mysteriously missing if you are using revision_userindex".
Comment 11 Marc A. Pelletier 2014-07-29 14:03:13 UTC
rev_user may be NULL for reasons /other/ that suppression.  :-)
Comment 12 Marc A. Pelletier 2014-07-29 14:07:21 UTC
(In reply to Aaron Halfaker from comment #9)
> Marc, as a regular user of this data, that's very unintuitive.

Indeed it is, albeit clearly documented at:

https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Tables_for_revision_or_logging_queries_involving_user_names_and_IDs

This is what I meant by 'least surprise'; if you use the revision view, you get the expected (every row) result.

If you have a where clause on rev_user or rev_user_text (that isn't IS NULL for the obvious reason) then you *also* get every row you would have gotten from revision.

The unintuitive result occurs /only/ if you use revision_userindex while not also selecting on the value of rev_user.  This is why the _userindex table was not made the default.
Comment 13 Betacommand 2014-07-29 14:11:10 UTC
Honestly this is a case where we just need to train our users to use the correct view. Nothing should be changed on the server side. As it should reflect production on production tables.

From what I have seen from tools labs our documentation/how to/support system is in need of a complete overhaul and a lot of improvements need to be made. 

This is an easy case where the two tables play different roles and the users who use them should know the differences. Im not sure where the indexes are (rev_user, rev_user_text, multi-colum, or dual indexes) but depending on what your looking for those indexes may or may not help you.
Comment 14 Marc A. Pelletier 2014-08-28 15:27:54 UTC
After careful consideration, the necessity of having revision provide the same view as in production wins over the potential confusion of having some rows missing unless you query a different view.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links