Last modified: 2014-07-07 18:28:14 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T68684, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 66684 - Figure out how to test for database backwards incompatibilities
Status: NEW
Product: Wikimedia
Classification: Unclassified
Component: Continuous integration
Version: wmf-deployment
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2014-06-16 20:30 UTC by Greg Grossmeier
Modified: 2014-07-07 18:28 UTC
CC List: 8 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Greg Grossmeier 2014-06-16 20:30:17 UTC
Sometimes database changes that are not backwards compatible are made to core (less likely) or to deployed extensions (much more likely). These include things like new tables or schema changes.

In production, these are handled manually. In the Beta Cluster, we run update.php on a regular basis. As a result, it is hard to catch these errors before the next update.php run makes them magically disappear.

I'm not (necessarily?) proposing to stop running update.php regularly.

I am trying to figure out a way to catch these types of mistakes before we have outages in production.
Comment 1 Greg Grossmeier 2014-06-16 20:31:57 UTC
Strawman #1:
* We run a set of integration tests "right" before a run of update.php
* We run the same set "right" after.
* Somehow compare the two in a way that isn't too noisy or manual.
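A minimal sketch of the comparison step in Strawman #1, assuming each run produces a simple pass/fail map keyed by test name (the result format and the test names are hypothetical). The useful signal is a test that fails before update.php and passes after it: that code already depends on a schema change that only update.php applied. Tests that fail in both runs are ignored, which keeps the report from getting too noisy.

```python
from typing import Dict, Set

# Hypothetical result format: test name -> True if the test passed.
Results = Dict[str, bool]


def fixed_by_update(before: Results, after: Results) -> Set[str]:
    """Tests that failed before update.php ran and pass after it.

    These point at code that already relies on a schema change which only
    update.php applied; in production, where schema changes are manual,
    the same code would fail.
    """
    return {
        name
        for name, passed in before.items()
        if not passed and after.get(name, False)
    }


def broken_by_update(before: Results, after: Results) -> Set[str]:
    """Tests that passed before update.php ran and fail after it."""
    return {
        name
        for name, passed in before.items()
        if passed and name in after and not after[name]
    }


# Example with made-up test names: only the first one is reported, because
# the test failing in both runs is pre-existing noise.
before = {"test_math_lookup": False, "test_parser": True, "test_flaky": False}
after = {"test_math_lookup": True, "test_parser": True, "test_flaky": False}
print(sorted(fixed_by_update(before, after)))   # ['test_math_lookup']
print(sorted(broken_by_update(before, after)))  # []
```

Reporting only the tests whose status changes across the update.php run is one way to keep the comparison from becoming a manual review of every failure.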
Comment 2 Bryan Davis 2014-06-17 00:44:09 UTC
(In reply to Greg Grossmeier from comment #0)
> I am trying to figure out a way to catch these types of mistakes before we
> have outages in production.

The errors aren't really present in beta, are they? The problem is that we have a gap in our code review and deployment procedure that allows changes requiring database schema changes, massive cache invalidation, or similarly disruptive steps (which I think I've heard called "scap traps" before) to be merged without producing a durable list of the actions required to deploy the code in production.
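One lightweight way to narrow that review gap, sketched under assumptions: flag any patch whose changed files include .sql files, so the reviewer is prompted to record the production steps. The .sql heuristic, the file paths, and the note text are all hypothetical; a real check would need to know how each extension actually ships and registers its schema patches.

```python
from typing import Iterable, List


def schema_touching_files(changed_paths: Iterable[str]) -> List[str]:
    """Return changed paths that look like schema files.

    Heuristic (an assumption, not an existing MediaWiki rule): a patch that
    adds or edits a .sql file probably needs a manual step in production.
    """
    return sorted(p for p in changed_paths if p.lower().endswith(".sql"))


def review_note(changed_paths: Iterable[str]) -> str:
    """Build a reminder for the reviewer, or "" if no schema files changed."""
    hits = schema_touching_files(changed_paths)
    if not hits:
        return ""
    lines = ["Schema files changed; record the required production steps:"]
    lines += [f"  - {path}" for path in hits]
    return "\n".join(lines)


# Hypothetical file list for one patch.
print(review_note(["includes/Renderer.php", "db/patch-new-table.sql"]))
```

The output of such a check could become the "durable list of required actions" if it were attached to the change and carried through to deployment.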

I've had similar problems everywhere I've worked where the size of the development plus operations team was greater than one (and sometimes even when I was working solo). The most easily automated solution I've seen in practice was used at $DAYJOB-1. We used a tool developed in-house that could compare a canonical schema which we kept in version control with the schema of any live database. This tool would emit DDL changes to sync the database with the canonical DDL. For local development and our integration environment these DDL changes would be applied automatically by a script. In our staging and production environments, the DDL alter script would be generated as part of the build for the environment but then manually reviewed and applied by a DBA. The major problem with this approach is scaling it as the deploy cycle accelerates from once per week to once per day/hour/minute.
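The schema-diff approach described above can be sketched roughly as follows. This is not the $DAYJOB-1 tool; it assumes a toy schema model (table name to column name to column type) that a real implementation would load from version control and from the live database's information_schema, and it emits MySQL-style DDL. The table and column names are made up. Destructive differences are only reported as comments, leaving them for a DBA to review.

```python
from typing import Dict, List

# Hypothetical schema model: table name -> {column name -> column type}.
Schema = Dict[str, Dict[str, str]]


def diff_schema(canonical: Schema, live: Schema) -> List[str]:
    """Emit MySQL-style DDL that brings `live` in line with `canonical`.

    Drops are never generated; extra tables or columns in the live
    database are reported as SQL comments for manual review.
    """
    ddl: List[str] = []
    for table in sorted(canonical):
        wanted = canonical[table]
        if table not in live:
            cols = ", ".join(f"{col} {typ}" for col, typ in wanted.items())
            ddl.append(f"CREATE TABLE {table} ({cols});")
            continue
        have = live[table]
        for col, typ in wanted.items():
            if col not in have:
                ddl.append(f"ALTER TABLE {table} ADD COLUMN {col} {typ};")
            elif have[col] != typ:
                ddl.append(f"ALTER TABLE {table} MODIFY COLUMN {col} {typ};")
        for col in sorted(set(have) - set(wanted)):
            ddl.append(f"-- review: {table}.{col} is not in the canonical schema")
    for table in sorted(set(live) - set(canonical)):
        ddl.append(f"-- review: table {table} is not in the canonical schema")
    return ddl


canonical = {
    "page": {"page_id": "INT", "page_title": "VARBINARY(255)"},
    "example_cache": {"ec_key": "VARBINARY(32)"},
}
live = {"page": {"page_id": "INT", "page_old": "INT"}}
for statement in diff_schema(canonical, live):
    print(statement)
```

In local and integration environments the emitted statements could be applied automatically; in staging and production they would be handed to a DBA, matching the split described in the comment.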
Comment 3 Greg Grossmeier 2014-06-30 17:47:49 UTC
(In reply to Bryan Davis from comment #2)
> (In reply to Greg Grossmeier from comment #0)
> > I am trying to figure out a way to catch these types of mistakes before we
> > have outages in production.
> 
> The errors aren't really present in beta are they?

From Physikerwelt, in the Gerrit change that prompted this:
>> See error on betlabs: A database query error has occurred. This may indicate a bug in the software.
>> Function: MathRenderer::readFromDatabase Error: 1146 Table 'labswiki.mathoid' doesn't exist (10.68.17.94)

There is a window of time between when a new table dependency is merged and when the table is created on the Beta Cluster (by a run of update.php), during which errors are logged.
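To make that window visible, the Beta Cluster logs could be scanned for the MySQL "table doesn't exist" error (code 1146) shown in the quoted message. A minimal sketch, assuming log lines keep exactly that text format; where the logs live and how often they are scanned are left open.

```python
import re
from typing import Iterable, Set

# Matches the error text from the quoted beta log line, e.g.
# "Error: 1146 Table 'labswiki.mathoid' doesn't exist".
MISSING_TABLE = re.compile(r"Error: 1146 Table '([^'.]+)\.([^']+)' doesn't exist")


def missing_tables(log_lines: Iterable[str]) -> Set[str]:
    """Collect "database.table" names that MySQL reported as missing."""
    found: Set[str] = set()
    for line in log_lines:
        for database, table in MISSING_TABLE.findall(line):
            found.add(f"{database}.{table}")
    return found


sample = [
    "Function: MathRenderer::readFromDatabase Error: 1146 "
    "Table 'labswiki.mathoid' doesn't exist (10.68.17.94)",
    "some unrelated log line",
]
print(missing_tables(sample))  # {'labswiki.mathoid'}
```

Any non-empty result would mean merged code is already querying a table that update.php has not created yet, which is the same kind of change that would cause an outage if deployed to production without a manual schema step.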
Comment 4 Greg Grossmeier 2014-07-07 18:28:14 UTC
(In reply to Greg Grossmeier from comment #3)
> (In reply to Bryan Davis from comment #2)
> > (In reply to Greg Grossmeier from comment #0)
> > > I am trying to figure out a way to catch these types of mistakes before we
> > > have outages in production.
> > 
> > The errors aren't really present in beta are they?
> 
> There is some amount of time between new table dependency is merged and the
> table is not created on beta cluster (run of update.php) where errors are
> logged.

See also: https://bugzilla.wikimedia.org/show_bug.cgi?id=67485
