Last modified: 2011-12-21 20:36:52 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T31550, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 29550 - Move TestSwarm from toolserver to common infrastructure with Jenkins etc
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: Continuous integration
Version: unspecified
Hardware: All All
Importance: High enhancement
Target Milestone: ---
Assigned To: Antoine "hashar" Musso (WMF)
URL: http://toolserver.org/~krinkle/testsw...
Keywords: platformeng
Depends on: 32433 32644
Blocks: 29549 30001 30888 32645
Reported: 2011-06-23 17:39 UTC by Brion Vibber
Modified: 2011-12-21 20:36 UTC (History)

Description Brion Vibber 2011-06-23 17:39:03 UTC
Running the TestSwarm stuff from toolserver works well enough for now, but eventually we'll want to move it over to WMF-run servers so Krinkle doesn't have to personally do every future bit of maintenance on it. :)
Comment 1 Chad H. 2011-06-23 21:42:21 UTC
Can we get an actual box for this? Right now cruise control is on a VM and I think having a dedicated box for running our unit tests (qunit && phpunit && any possible future thingies) would be a good idea.
Comment 2 Brion Vibber 2011-06-23 21:46:36 UTC
VMs should be fine as long as they have adequate resources. IMO dedicating standalone individual boxes for so many disparate services gets to be a major administrative PITA.

On the other hand if we can show that running on a dedicated machine makes the unit tests process a lot faster, it could be worth it.
Comment 3 Chad H. 2011-06-23 22:14:02 UTC
(In reply to comment #2)
> On the other hand if we can show that running on a dedicated machine makes the
> unit tests process a lot faster, it could be worth it.
>

Well we've already got cruise control down to ~2.5 minutes per run, so there's not too much to improve on :)
Comment 4 Krinkle 2011-06-24 19:37:19 UTC
This will happen when I'm in SF with Ryan Lane.
Comment 5 Brion Vibber 2011-08-16 19:21:35 UTC
Toolserver's down at the moment; we have no access to the testswarm setup until it's back. :(
Comment 6 Chad H. 2011-08-16 20:23:31 UTC
This is in the process of being moved to physical hardware in eqiad. Right now the machine is set up and networking is completed. We're waiting on a base OS install before moving forward.
Comment 7 Brion Vibber 2011-08-16 20:56:06 UTC
Is this something that can be done remotely or is it waiting on Rob or someone to be available on-site?

This sort of infrastructure thing really shouldn't be bottlenecking us; whatever next step is needed, I'd like to know what we can do to make it happen.
Comment 8 Chad H. 2011-08-16 21:00:26 UTC
(In reply to comment #7)
> Is this something that can be done remotely or is it waiting on Rob or someone
> to be available on-site?
> 

I assume any of the roots can. The RT ticket is assigned to Rob currently.

> This sort of infrastructure thing really shouldn't be bottlenecking us;
> whatever next step is needed, I'd like to know what we can do to make it
> happen.

We're building out a new box that we don't have documentation on. That's part of what's taking us a while in getting it going. We're trying to document the config properly (and puppetize most of it) so future buildouts won't take so long.
Comment 9 Brion Vibber 2011-08-16 21:25:14 UTC
Looks like Toolserver's back up and TestSwarm looks like it's recovered (starting to run IE tests again from my clients), so if it doesn't die again we're good for a little longer. :)

For reference the RT ticket is http://rt.wikimedia.org/Ticket/Display.html?id=1204 though I'm having trouble getting in to read it for now.
Comment 10 Brion Vibber 2011-08-31 20:47:19 UTC
Quick update: this still needs some software setup on the server, but it's been starting to move lately. :)
Comment 11 Brion Vibber 2011-09-12 18:36:35 UTC
Chad, Krinkle -- any news? The RT ticket's last update was August 26, reassigned from Chad to RobH. Does Rob still need to do anything before you guys can continue or has it moved beyond that stage? What can we do to help this move faster?
Comment 12 Brion Vibber 2011-09-12 18:51:00 UTC
Per RobH, ops needs to finish puppetizing the server configuration -- Rob says he should be able to get to it tomorrow or else get someone else to finish it.

Chad & Krinkle -- can you give an update on status on your end? Thanks!
Comment 13 Chad H. 2011-09-12 19:42:02 UTC
(In reply to comment #12)
> Per RobH, ops needs to finish puppetizing the server configuration -- Rob says
> he should be able to get to it tomorrow or else get someone else to finish it.
> 
> Chad & Krinkle -- can you give an update on status on your end? Thanks!

No updates from me, I'm just waiting on the hardware to be done :) The ci2 instance has been doing builds at 5min intervals for ~1 week now.
Comment 15 Brion Vibber 2011-09-19 22:03:02 UTC
Last updates on the RT entry include questions about whether it's safe to install things from the jenkins third-party repository, asked on September 14.

Who needs to respond on this and who can do so? Thanks!
Comment 16 Chad H. 2011-09-19 22:09:12 UTC
(In reply to comment #15)
> Last updates on the RT entry include questions about whether it's safe to
> install things from the jenkins third-party repository, asked on September 14.
> 
> Who needs to respond on this and who can do so? Thanks!

We discussed this on IRC when the question about 3rd-party repositories came up. Mark expressed concern about using third party repositories, and said we should pull the jenkins packages into the Wikimedia repo. I do not know what the status of that is.
Comment 17 Brion Vibber 2011-10-21 00:34:11 UTC
Just checking up on the current status of this; PHPUnit runs have been migrated to Jenkins running on the new http://integration.mediawiki.org/ though TestSwarm is still marked as not yet active.

Looks like there are a few updates on [[mw:Continuous integration/Task management]]... but the latest edits talk about work on some sort of Special: page; I'm worried that this may further delay simply having the tests working.


Can we get the tests as they exist running on integration.mediawiki.org ASAP? I've had most of my test-runner clients down for the last month at Timo's request, since they eat into his Toolserver connection limit, and that means we're not actually recording regular results for a bunch of versions.

We also haven't yet made any of the other tweaks (bug 29549, bug 30000, bug 30001, bug 30901) to get the tests running on current versions of Firefox and Safari, mobile browsers, or the IE 10 developer preview.

This makes it a lot harder to test things, as the versions of Firefox, Safari, and Opera that people have handy don't run the automatic jobs or save any results.
Comment 18 Brion Vibber 2011-10-24 22:19:14 UTC
Last update from Chad on the RT ticket today: "Setting up TestSwarm will be done via puppet over the coming week."
Comment 19 Krinkle 2011-12-19 22:53:17 UTC
Note that the current script as it is prepared for the TestSwarm server will not deprecate the Toolserver setup just yet.

Besides not being maintainable or scalable for us, the Toolserver setup was never intended for a full MediaWiki initialization, so the tests there currently run without one. To get a full initialization, we need to run the tests from a SpecialPage instead of a static index.html file.

Code for this is in the JSTesting branch (to be merged in trunk core), and the script to check out MediaWiki, install it and submit URLs to TestSwarm is being written by Hashar and me, to be deployed on the TestSwarm server. That script is mostly ready and initial tests show good results.

More on https://www.mediawiki.org/wiki/Continuous_integration/Task_management
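
As a rough illustration of the "submit URLs to TestSwarm" step described in comment 19, the sketch below only builds one test-run URL per test module for a freshly installed wiki. It is not the script Hashar and Krinkle wrote: the wiki base URL, the special page title `Special:JSTestSketch`, the `filter` query parameter and the module names are all placeholders assumed for the example, and the actual submission to TestSwarm is left out.

```python
from urllib.parse import urlencode


def build_run_urls(wiki_base, special_page, modules):
    """Return (run name, URL) pairs, one per test module.

    All arguments are hypothetical: the real special page name and its
    query parameters are not given in this bug report.
    """
    runs = []
    for module in modules:
        # urlencode percent-encodes the ":" in the page title.
        query = urlencode({"title": special_page, "filter": module})
        runs.append((module, f"{wiki_base}/index.php?{query}"))
    return runs


if __name__ == "__main__":
    # Placeholder values, not the production configuration.
    for name, url in build_run_urls(
        "http://wiki.example",
        "Special:JSTestSketch",
        ["mediawiki.util", "jquery.tablesorter"],
    ):
        print(name, url)
```

Keeping URL construction in a pure function means it can be checked without a running wiki or TestSwarm instance; the checkout, install and submit steps would wrap around it.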
Comment 20 Antoine "hashar" Musso (WMF) 2011-12-21 17:50:02 UTC
TestSwarm is now in production at http://integration.mediawiki.org/testswarm/ and https://integration.mediawiki.org/testswarm/ (HTTPS).

Ops RT tickets have been closed.

You will certainly notice various bugs related to the app, since some patches from the Toolserver did not get migrated. The installation is a fresh one from GitHub (TestSwarm v0.1.0). So please open new bugs there or here :)
Comment 21 Brion Vibber 2011-12-21 17:55:38 UTC
You have asked Firefox to connect securely to integration.mediawiki.org, but we can't confirm that your connection is secure.

integration.mediawiki.org uses an invalid security certificate.

The certificate is only valid for the following names:
  *.wikimedia.org , wikimedia.org  

(Error code: ssl_error_bad_cert_domain)
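
The error above comes down to a name mismatch: the certificate lists `*.wikimedia.org` and `wikimedia.org`, while the host being requested is under `mediawiki.org`. A minimal sketch of that check follows, assuming the common simplified rule that a wildcard stands for exactly one leftmost label. It is not Firefox's actual implementation, and the helper names are made up for the example.

```python
def hostname_matches(pattern, hostname):
    """Check one certificate name against a hostname.

    Simplified rule (an assumption for this sketch): a leading "*."
    stands for exactly one leftmost DNS label.
    """
    pattern = pattern.strip().lower()
    hostname = hostname.strip().lower()
    if pattern.startswith("*."):
        head, _, rest = hostname.partition(".")
        return bool(head) and rest == pattern[2:]
    return pattern == hostname


def cert_covers(cert_names, hostname):
    """True if any name listed on the certificate matches the hostname."""
    return any(hostname_matches(name, hostname) for name in cert_names)


# Names taken from the error message in comment 21.
CERT_NAMES = ["*.wikimedia.org", "wikimedia.org"]

if __name__ == "__main__":
    for host in ("integration.mediawiki.org", "rt.wikimedia.org"):
        print(host, cert_covers(CERT_NAMES, host))
```

Under this rule no name on the certificate covers `integration.mediawiki.org`, which is consistent with the `ssl_error_bad_cert_domain` code reported.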
Comment 22 Brion Vibber 2011-12-21 17:58:46 UTC
Split that out as bug 33301, reclosing this one.
Comment 23 Brion Vibber 2011-12-21 18:09:38 UTC
Where are the test result pages? We've lost the link that used to be there.
Comment 24 Antoine "hashar" Musso (WMF) 2011-12-21 20:36:52 UTC
Jobs are submitted with the MediaWiki username. You can get its test results at:
http://integration.mediawiki.org/testswarm/user/MediaWiki/

The old dashboards at toolserver need to be sent upstream and then we will be able to update the software :)
