Last modified: 2012-07-24 18:29:22 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T40473, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 38473 - instances can not boot / reboot anymore
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All All
Importance: Normal critical
Target Milestone: ---
Assigned To: Ryan Lane
Reported: 2012-07-18 14:37 UTC by Antoine "hashar" Musso (WMF)
Modified: 2012-07-24 18:29 UTC (History)

Description Antoine "hashar" Musso (WMF) 2012-07-18 14:37:34 UTC
This appeared this afternoon (European time) with an instance I rebooted (and then deleted), and again with the fresh instance i-0000034b:

cloud-init running: Wed, 18 Jul 2012 13:47:07 +0000. up 5.13 seconds
waiting for metadata service at http://169.254.169.254/2009-04-04/meta-data/instance-id
  13:47:10 [ 1/100]: url error [timed out]
  13:47:13 [ 2/100]: url error [timed out]
  13:47:16 [ 3/100]: url error [timed out]
  13:47:19 [ 4/100]: url error [timed out]


So we can neither reboot existing instances nor access any new ones.
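
For readers unfamiliar with the console output above, the retry behaviour it reflects can be sketched roughly as follows. This is an illustrative Python sketch, not cloud-init's actual code: the function names, the 2-second timeout and the 1-second delay are assumptions, and only the URL and the 100-attempt limit come from the log.

```python
import time
import urllib.request

# URL taken from the console log in this report.
METADATA_URL = "http://169.254.169.254/2009-04-04/meta-data/instance-id"


def fetch_instance_id(url=METADATA_URL, timeout=2.0):
    """Fetch the instance id once (hypothetical helper).

    Raises OSError (which covers URLError and socket timeouts) on failure.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def wait_for_metadata(fetch=fetch_instance_id, max_tries=100, delay=1.0,
                      sleep=time.sleep, log=print):
    """Poll the metadata service until it answers or max_tries is reached.

    Returns the fetched value, or None if every attempt failed.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return fetch()
        except OSError as exc:
            # Mirrors the shape of the log lines above, e.g. "[ 1/100]: url error [...]".
            log("[%2d/%d]: url error [%s]" % (attempt, max_tries, exc))
            if attempt < max_tries:
                sleep(delay)
    return None
```

If the metadata service answers more slowly than the per-request timeout, every attempt fails in the same way, which matches the repeated "timed out" lines in the log.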
Comment 1 Antoine "hashar" Musso (WMF) 2012-07-18 14:47:13 UTC
Reverting the subject back. This happens on instance boot, so it affects both existing and newly created instances.
Comment 2 Antoine "hashar" Musso (WMF) 2012-07-19 12:55:06 UTC
Mail from Ryan Lane on labs-l:

OpenStack Nova does some inefficient queries for searching for
instances, especially when pulling metadata information. When
instances are created, we inject what's called "userdata". That
userdata is used by a service in Ubuntu called cloud-init. cloud-init
pulls this userdata, along with the instance's metadata and does the
initial bootstrapping of the system.

For us, cloud-init, using the userdata, installs puppet, points it at
our puppet server and does a full puppet run. This process is
currently failing, as cloud-init thinks the metadata server is timing
out. Apparently other deployments of OpenStack are having performance
issues with metadata that are due to the same inefficient query.

There are two things we'll be doing about this:

1. We're working on a fix with the other OpenStack devs
2. We'll purge deleted instances from the database. Nova keeps them
for auditing purposes, but they are unneeded. This will bring the
query time down below the cloud-init timeout threshold.
We're also waiting for another organization to push their solution for
this upstream, rather than mucking around the database ourselves.

Sorry for the inconvenience,

- Ryan
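
The second item in the mail above, purging soft-deleted rows so that lookups have fewer records to scan, can be illustrated with a small self-contained sketch. It uses an in-memory SQLite database and a deliberately simplified, hypothetical `instances` table with a `deleted` flag; it is not Nova's real schema or the procedure actually run on Labs.

```python
import sqlite3


def purge_deleted(conn):
    """Remove rows flagged as deleted and return how many were removed."""
    cur = conn.execute("DELETE FROM instances WHERE deleted = 1")
    conn.commit()
    return cur.rowcount


# Hypothetical, simplified table: real deployments have many more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instances ("
    "id INTEGER PRIMARY KEY, hostname TEXT, deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO instances (hostname, deleted) VALUES (?, ?)",
    [("live-instance", 0), ("old-instance-1", 1), ("old-instance-2", 1)],
)

removed = purge_deleted(conn)
remaining = conn.execute("SELECT COUNT(*) FROM instances").fetchone()[0]
```

The trade-off named in the mail still applies: rows kept for auditing are lost once purged, so a real cleanup would presumably archive them first.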
Comment 3 Antoine "hashar" Musso (WMF) 2012-07-19 12:55:50 UTC
Per above, Ryan is working on it with OpenStack devs.
Comment 4 Ryan Lane 2012-07-24 18:29:22 UTC
This has been fixed.
