Last modified: 2014-07-07 18:28:14 UTC
Sometimes database changes are made to core (not likely) or to deployed extensions (much more likely) that are not backwards compatible: new tables, schema changes, and the like. In production, these are handled manually. On the Beta Cluster, we run update.php on a regular basis, so errors appear briefly and then magically disappear, which makes them hard to catch. I'm not (necessarily?) proposing that we stop running update.php regularly. I am trying to figure out a way to catch these types of mistakes before we have outages in production.
Strawman #1:
* We run a set of integration tests "right" before a run of update.php.
* We run the same set "right" after.
* We somehow compare the two runs in a way that isn't too noisy/manual.
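To make the strawman concrete, here is a minimal sketch (my own illustration, not an existing tool) of the comparison step: given pass/fail results from the run before update.php and the run after, report only the tests that newly fail afterwards, which filters out pre-existing (already noisy) failures.

```python
# Sketch of strawman #1's comparison step. Test results are modeled as
# {test_name: passed}; a real harness would parse PHPUnit/browser-test
# output instead of hand-built dicts.

def newly_failing(before: dict, after: dict) -> list:
    """Return names of tests that passed before update.php but fail after."""
    return sorted(
        name for name, passed in after.items()
        if not passed and before.get(name, False)
    )

if __name__ == "__main__":
    before = {"math_render": True, "login": True, "search": False}
    after = {"math_render": False, "login": True, "search": False}
    # Only math_render is a new failure; search was already broken before.
    print(newly_failing(before, after))
```

The point of diffing against the "before" run is exactly the noise problem mentioned above: Beta always has some failing tests, so only the delta is a useful signal.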
(In reply to Greg Grossmeier from comment #0)
> I am trying to figure out a way to catch these types of mistakes before we
> have outages in production.

The errors aren't really present in beta, are they? The problem is that we have a gap in code review/procedure that allows changes requiring database schema changes, massive cache invalidation, or similarly disruptive actions (which I think I've heard called "scap traps" before) to be merged without producing some sort of durable list of the actions required to deploy the code in production.

I've had similar problems everywhere I've worked where the size of the development plus operations team was greater than one (and sometimes even when I was working solo). The most easily automated solution I've seen in practice was used at $DAYJOB-1. We used a tool developed in-house that could compare a canonical schema, which we kept in version control, with the schema of any live database. The tool would emit the DDL changes needed to sync the database with the canonical DDL. For local development and our integration environment, these DDL changes were applied automatically by a script. In our staging and production environments, the DDL alter script was generated as part of the build for the environment, then manually reviewed and applied by a DBA.

The major problem with this approach is scaling it as the deploy cycle accelerates from once per week to once per day/hour/minute.
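The core of such a tool can be sketched in a few lines. This is my own rough illustration (not the actual in-house tool Bryan describes): both schemas are modeled as {table: {column: type}} dicts, and the diff emits CREATE TABLE for missing tables and ALTER TABLE ADD COLUMN for missing columns. A real implementation would read the live side from information_schema and also handle type changes, indexes, and drops.

```python
# Hypothetical schema-diff sketch: compare a canonical schema (from
# version control) against a live database schema and emit the DDL
# needed to bring the live side in sync.

def schema_diff(canonical: dict, live: dict) -> list:
    """Return DDL statements that sync `live` with `canonical`."""
    ddl = []
    for table, columns in canonical.items():
        if table not in live:
            cols = ", ".join(f"{c} {t}" for c, t in columns.items())
            ddl.append(f"CREATE TABLE {table} ({cols});")
            continue
        for col, ctype in columns.items():
            if col not in live[table]:
                ddl.append(f"ALTER TABLE {table} ADD COLUMN {col} {ctype};")
    return ddl

if __name__ == "__main__":
    # The missing `mathoid` table from comment #3; the column shown here
    # is a made-up placeholder, not the extension's real schema.
    canonical = {"mathoid": {"some_column": "varbinary(16)"}}
    print(schema_diff(canonical, {}))
```

In the workflow described above, the output of a diff like this would be applied automatically in dev/integration, but reviewed by a DBA before being applied in staging/production.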
(In reply to Bryan Davis from comment #2)
> (In reply to Greg Grossmeier from comment #0)
> > I am trying to figure out a way to catch these types of mistakes before we
> > have outages in production.
>
> The errors aren't really present in beta are they?

From Physikerwelt in the gerrit change that prompted this:

>> See error on betlabs: A database query error has occurred. This may indicate a bug in the software.
>> Function: MathRenderer::readFromDatabase Error: 1146 Table 'labswiki.mathoid' doesn't exist (10.68.17.94)

There is a window between when a new table dependency is merged and when the table is created on the Beta Cluster (by the next run of update.php) during which errors are logged.
(In reply to Greg Grossmeier from comment #3)
> (In reply to Bryan Davis from comment #2)
> > (In reply to Greg Grossmeier from comment #0)
> > > I am trying to figure out a way to catch these types of mistakes before we
> > > have outages in production.
> >
> > The errors aren't really present in beta are they?
>
> There is a window between when a new table dependency is merged and when the
> table is created on the Beta Cluster (by the next run of update.php) during
> which errors are logged.

See also: https://bugzilla.wikimedia.org/show_bug.cgi?id=67485