Last modified: 2013-09-04 10:36:18 UTC
On my OpenSuse system, the upgrade fails after a while:

...iwlinks table already exists.
...iwl_prefix_title_from key already set on iwlinks table.
...have ul_value field in updatelog table.
...have iw_api field in interwiki table.
...iwl_prefix key doesn't exist.
...iwl_prefix_from_title key doesn't exist.
Adding cl_collation field to table categorylinks...PHP Warning: mysql_query(): MySQL server has gone away in /usr/share/mediawiki/includes/db/DatabaseMysql.php on line 23
PHP Warning: mysql_query(): Error reading result set's header in /usr/share/mediawiki/includes/db/DatabaseMysql.php on line 23
DB connection error: Connection refused (localhost)

MySQL: 5.1.51
PHP: 5.3.5
Created attachment 8665 [details]

This patch fixes the update for the MySQL 5.1.53 server mentioned above.
Marking as tarball blocker pending further investigation.
'server has gone away' usually means one of:

* some bit of data transferred was too large for MySQL's max packet size setting, thus the connection was cut off
* the connection was idle for too long while waiting for a query to complete, thus the connection was cut off

The stuff for cl_collation seems to be mostly ALTER TABLE-y, so it shouldn't transfer much data; a timeout is the most likely cause. I remember us having that sort of problem with long-running dump scripts and such in the past... there's a Database::setTimeout() method, which backup.inc's BackupDumper class triggers on its dedicated DB connection:

    function backupDb() {
        $this->lb = wfGetLBFactory()->newMainLB();
        $db = $this->lb->getConnection( DB_SLAVE, 'backup' );
        // Discourage the server from disconnecting us if it takes a long time
        // to read out the big ol' batch query.
        $db->setTimeout( 3600 * 24 );
        return $db;
    }

Might make sense to bump the timeouts during updates too?
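For reference, bumping the timeout on the MySQL side presumably amounts to raising the relevant session timeout variables before running the long statements. A sketch of the idea (which variables setTimeout() actually sets is an assumption here, not something confirmed in this thread):

    -- Hypothetical sketch of what a 24-hour setTimeout() amounts to in SQL.
    -- Which of these session variables MediaWiki actually touches is an
    -- assumption; all three exist in MySQL 5.1.
    SET SESSION wait_timeout = 86400;       -- idle-connection timeout
    SET SESSION net_read_timeout = 86400;   -- reading a statement from the client
    SET SESSION net_write_timeout = 86400;  -- writing a result set to the client

Note that none of these would help if the real cause were max_allowed_packet rather than an idle timeout.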
Johannes, how large is your database?
It's quite small; select count(*) from categorylinks; shows a total of 21 rows. I'm sure it's a mysql bug; I have dug through the mysqld log and found the following:

110615 10:17:14 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.1.51-ndb-7.1.9a-log'  socket: '/var/run/mysql/mysql.sock'  port: 3306  SUSE MySQL RPM
110615 10:19:56 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=16777216
read_buffer_size=262144
max_used_connections=1
max_threads=151
threads_connected=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 133916 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x1479d30
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f1778e6be88 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x29) [0x95e4a9]
/usr/sbin/mysqld(handle_segfault+0x400) [0x6387c0]
/lib64/libpthread.so.0(+0xf2d0) [0x7f177e4d82d0]
/usr/sbin/mysqld() [0x733379]
/usr/sbin/mysqld(mysql_alter_table(THD*, char*, char*, st_ha_create_information*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool)+0x146f) [0x734caf]
/usr/sbin/mysqld(mysql_execute_command(THD*)+0xe1b) [0x646b2b]
/usr/sbin/mysqld(mysql_parse(THD*, char*, unsigned int, char const**)+0x2d3) [0x64d133]
/usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x542) [0x64d682]
/usr/sbin/mysqld(do_command(THD*)+0xea) [0x64e9da]
/usr/sbin/mysqld(handle_one_connection+0x22d) [0x6401dd]
/lib64/libpthread.so.0(+0x6a3f) [0x7f177e4cfa3f]
/lib64/libc.so.6(clone+0x6d) [0x7f177cc5767d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x14d83e0 = ALTER /* DatabaseBase::sourceFile( /usr/share/mediawiki/maintenance/archives/patch-categorylinks-better-collation.sql ) */ TABLE `categorylinks` CHANGE COLUMN cl_sortkey cl_sortkey varbinary(230) NOT NULL default '', ADD COLUMN cl_sortkey_prefix varchar(255) binary NOT NULL default '', ADD COLUMN cl_collation varbinary(32) NOT NULL default '', ADD COLUMN cl_type ENUM('page', 'subcat', 'file') NOT NULL default 'page', ADD INDEX (cl_collation), DROP INDEX cl_sortkey, ADD INDEX cl_sortkey (cl_to, cl_type, cl_sortkey, cl_from)
thd->thread_id=8
thd->killed=NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

110615 10:19:56 mysqld_safe Number of processes running now: 0
110615 10:19:56 mysqld_safe mysqld restarted
I have filed an upstream bug: http://bugs.mysql.com/bug.php?id=61528
This is an upstream bug that shouldn't block the release, but it should perhaps be mentioned in the release notes.
I was able to reproduce this with the mysql-cluster-server package in Ubuntu. See the MySQL bug for the exact steps.
If the other code still works on other versions, why not just change the SQL query as a workaround?
(In reply to comment #9)
> If it still works with the other code on other versions, why not just change
> the sql query as a workaround?

I'm not against that -- at all -- but I'm not going to say that we *should* put that in the 1.17 tarball. Adding Tim to this bug so he can make a decision either way.
The MySQL bug should be isolated before we apply any workarounds. If it's not isolated, then we don't know if the workaround really fixes it, or if it just makes the segfault somewhat less likely.
I reproduced it under gdb with debugging symbols, and I've isolated it to some extent. The bug occurs when an index is added to a new field, i.e. a field added in the same ALTER TABLE.

In sql_table.cc near line 6010:

    /* Key not found. Add the offset of the key to the add buffer. */
    ha_alter_info->index_add_buffer
      [ha_alter_info->index_add_count++]=
      new_key - ha_alter_info->key_info_buffer;
    key_part= new_key->key_part;
    end= key_part + new_key->key_parts;
    for(; key_part != end; key_part++)
    {
      /* Mark field to be part of new key */
      if ((field= table->field[key_part->fieldnr]))
        field->flags|= FIELD_IN_ADD_INDEX;

new_key->key_part->fieldnr appears to be a field index in the new table, but it's used to attempt to fetch field information from the old table.

(gdb) print *new_key
$30 = {key_length = 32, flags = 40, key_parts = 1, extra_length = 0, usable_key_parts = 2, block_size = 0, algorithm = HA_KEY_ALG_UNDEF, {parser = 0x0, parser_name = 0x0}, key_part = 0x7ffff8a26208, name = 0x7ffff8a22f50 "cl_collation", rec_per_key = 0x0, handler = {bdb_return_if_eq = 0}, table = 0x0}
(gdb) print *key_part
$31 = {field = 0x0, offset = 751, null_offset = 0, length = 32, store_length = 0, key_type = 1, fieldnr = 5, key_part_flag = 0, type = 0 '\000', null_bit = 0 '\000'}
(gdb) print key_part->fieldnr
$15 = 5
(gdb) print table->field[0]->field_name
$33 = 0x7ffff8a24599 "cl_from"
(gdb) print table->field[1]->field_name
$34 = 0x7ffff8a245a1 "cl_to"
(gdb) print table->field[2]->field_name
$35 = 0x7ffff8a245a7 "cl_sortkey"
(gdb) print table->field[3]->field_name
$36 = 0x7ffff8a245b2 "cl_timestamp"
(gdb) print table->field[4]->field_name
Cannot access memory at address 0x30
(gdb) print table->field[5]->field_name
warning: can't find linker symbol for virtual table for `Field' value
warning: found `Field_longlong::~Field_longlong()' instead
$37 = 0x7ffff78ff544 "UH\211\345SH\203\354hH\211}\250H\211u\240\211U\234dH\213\004%("

table->field[5] points into arbitrary memory, and so the attempt to write to table->field[5]->flags causes a segfault. I'll add this to the MySQL bug once dev.mysql.com stops timing out.
This reduced test case also works:

    CREATE TABLE a ( a INT ) ENGINE=MyISAM;
    ALTER TABLE a ADD COLUMN b INT, ADD INDEX (b);

I think this is too broken to bother trying to fix on our side. We must have loads of patches that add indexes on new columns. I can't believe that 1.17 is the first major version that doesn't work on MySQL Cluster. Lowering priority.
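For what it's worth, the attached workaround amounts to splitting the combined statement so that an index is never created in the same ALTER TABLE that adds the column. A sketch of that shape, using the reduced test case above (this illustrates the approach, not the exact attached patch):

    -- Instead of the single combined statement that crashes MySQL Cluster:
    --   ALTER TABLE a ADD COLUMN b INT, ADD INDEX (b);
    -- add the column first, then build the index in a second pass:
    ALTER TABLE a ADD COLUMN b INT;
    ALTER TABLE a ADD INDEX (b);

As noted later in the thread, the cost is that MySQL rebuilds the table once per ALTER TABLE, so the split roughly doubles the upgrade time for that table on large installations.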
Does the patch from Johannes still need review?
The patch looks sane, and should work on other versions too. But see Tim's comment, which I agree with, as this seems a very strange/borderline case:

(In reply to comment #13)
> I think this is too broken to bother trying to fix on our side. We must have
> loads of patches that add indexes on new columns. I can't believe that 1.17 is
> the first major version that doesn't work on MySQL Cluster. Lowering priority.

That said, I personally can't see any issue with just changing the DB patch per the above, as it shouldn't cause any further problems.
+platformeng

So should we apply the patch for 1.20, or should we just drop the bug report as it is an upstream issue?
(In reply to comment #16)
> +platformeng
>
> So should we apply patch for 1.20 or should we just drop the bug report as it
> is an upstream issue?

Separating the alter operation in two will result in the upgrade running twice as slowly, because it will rebuild the table twice instead of once. That could be noticeable on large installations.
(In reply to comment #16)
> So should we apply patch for 1.20 or should we just drop the bug report as it
> is an upstream issue?

Is there a reason this couldn't be applied to the 1.19 tarball?
(In reply to comment #18)
> (In reply to comment #16)
> > So should we apply patch for 1.20 or should we just drop the bug report as it
> > is an upstream issue?
>
> Is there a reason this couldn't be applied to the 1.19 tarball?

Yes: it makes upgrades slower, and it doesn't fix the bug. As I said in comment #13, I don't think we should make any changes. It's not our problem.
So I am closing this bug per comment 13 and comment 17.