Last modified: 2014-03-10 10:14:34 UTC
Aaron Schulz wrote: I noticed that https://gerrit.wikimedia.org/r/#/c/33971/ passed the tests, but after it was merged the new tests started failing for everything. The commit to revert it also failed, so I overrode Jenkins and merged anyway, and the failures went away for new commits. This indicates that something is broken, possibly Jenkins running tests against master alone rather than master + the patch, which would explain this problem.
Related URL: https://gerrit.wikimedia.org/r/58283 (Gerrit Change I4b3fadccaae9c35964a0c47d63b22c4f35148a24)
From bug 47031: https://gerrit.wikimedia.org/r/#/c/57436/ has been merged although it is faulty.
The unit tests run on patchset upload did catch the issue:
https://integration.wikimedia.org/ci/job/mediawiki-core-phpunit-misc/5222/console : FAILURE
But the gating run after CR+2 did not catch it:
https://integration.wikimedia.org/ci/job/mediawiki-core-phpunit-misc/5223/console : SUCCESS
The root cause is that although ZUUL_REF points to the proper merge commit, the Jenkins Git plugin seems to build from the current origin/master.
Build #5223: the workspace did get wiped:
02:46:53 Wiping out workspace first.
It then checked out the revision:
02:46:56 Checking out Revision 4c69569db71d149feff6c4b10ea7a493425d67fd (origin/master)
That is the master revision, NOT the change. The commit should have been 7dd3356a51951f8cdfe463552b5e5aae272e8e60
----
The related merge job https://integration.wikimedia.org/ci/job/mediawiki-core-merge/11333/console
02:44:17 Commencing build of Revision 7dd3356a51951f8cdfe463552b5e5aae272e8e60 (origin/master)
02:44:17 Checking out Revision 7dd3356a51951f8cdfe463552b5e5aae272e8e60 (origin/master)
----
ZUUL_REF has probably not been resolved properly, so the git plugin falls back to master. It is also possible that the mediawiki-core-phpunit-misc job was using ZUUL_COMMIT as a refspec instead of ZUUL_REF, which might prevent the plugin from fetching the revision. The job history is no longer accessible due to an unexpected upgrade (see bug 47040).
Created attachment 12065: Python script parsing build logs to find Zuul commit vs Git plugin checkout
Created attachment 12066: output of checkbug46723.py
The script output shows that some builds are not testing what they should be testing, because they check out a parent commit. From the Jenkins Git plugin source code, it seems that whenever the reference cannot be parsed (i.e. git rev-parse $ZUUL_REF fails), the plugin falls back to master or some parent commit. I need to improve the script to find out whether that happens in a specific pipeline or for some specific refs.
Extract for the two builds referenced above:
Verifying /var/lib/jenkins/jobs/mediawiki-core-phpunit-misc/builds/5222/log
Zuulcommit: 8cc0b601aa2db6db09ac0e4d70847293d75875aa
Checkedout: 8cc0b601aa2db6db09ac0e4d70847293d75875aa
Verifying /var/lib/jenkins/jobs/mediawiki-core-phpunit-misc/builds/5223/log
Zuulcommit: 7dd3356a51951f8cdfe463552b5e5aae272e8e60
Checkedout: 4c69569db71d149feff6c4b10ea7a493425d67fd (MISMATCH)
We can see that build 5223 did not use the proper commit :-] I suspect the git plugin does not fetch the proper reference or cannot find it. That results internally in an unknown sha1, and the git plugin then falls back to master or something else. I will try to reproduce the issue in labs with the git plugin set to verbose. That requires starting Jenkins with -Dhudson.plugins.git.GitSCM.verbose=true
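For readers without access to the attachment, the comparison shown in the extract above can be sketched roughly as follows. This is not the attached checkbug46723.py, only a minimal illustration. The "Checking out Revision" line format is taken from the console output quoted earlier in this bug; that the Zuul commit shows up in the log as a ZUUL_COMMIT=<sha1> line is an assumption, and the names find_commits and is_mismatch are hypothetical.

```python
import re

# "Checking out Revision <sha1>" is the Git plugin line quoted in this bug.
CHECKOUT_RE = re.compile(r"Checking out Revision ([0-9a-f]{40})")
# ASSUMPTION: the console log carries the Zuul commit as "ZUUL_COMMIT=<sha1>".
ZUUL_RE = re.compile(r"ZUUL_COMMIT=([0-9a-f]{40})")


def find_commits(log_text):
    """Return (zuul_commit, checked_out); each is None when not found."""
    zuul = ZUUL_RE.search(log_text)
    checkout = CHECKOUT_RE.search(log_text)
    return (
        zuul.group(1) if zuul else None,
        checkout.group(1) if checkout else None,
    )


def is_mismatch(log_text):
    """True only when both commits are present in the log and differ."""
    zuul, checked_out = find_commits(log_text)
    return zuul is not None and checked_out is not None and zuul != checked_out
```

Logs where either line is missing are deliberately not counted as mismatches, since nothing can be concluded from them.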
I have traced the issue as far back as mediawiki-core-lint build #19 (made on November 22nd 2012).
MISMATCH in /var/lib/jenkins/jobs/mediawiki-core-lint/builds/19/log
Pipeline: gate
Zuulcommit: 76606b66b006ac0e62087e6d00b1e4bdd56fff09
Checkedout: 232e34733fc68739ba96cccc31d3ff88f9484a23
We are lacking the git plugin verbose mode in production due to a bug. It is corrected with https://gerrit.wikimedia.org/r/58489 . That will help find out what the plugin is doing internally.
Created attachment 12084: Console output for https://integration.wikimedia.org/ci/job/mediawiki-core-phpunit-parser/5386/console
ZUUL_COMMIT=76cb37f0c69dcd69884fc6e66681e77c8045a08e but it fetched origin/master instead :-(
The branch specifier in the git plugin is set to ZUUL_BRANCH, which is 'master'. In the git plugin (at git-plugin/src/main/java/hudson/plugins/git/util/DefaultBuildChooser.java), getCandidateRevisions() checks whether the branch looks like a sha1 (i.e. whether it matches /[0-9a-f]{6,40}/) and, if so, creates a detached branch using that commit. It seems the Jenkins job macro should therefore use ZUUL_COMMIT as the branch specifier.
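To illustrate why 'master' and a ZUUL_COMMIT value are treated differently, here is a minimal sketch of that sha1 check. It is written in Python rather than the plugin's Java and is not the plugin's code. The pattern is the one quoted above; requiring it to match the whole specifier is an assumption, and looks_like_sha1 is a hypothetical name.

```python
import re

# Pattern quoted from the comment above.
SHA1_LIKE = re.compile(r"[0-9a-f]{6,40}")


def looks_like_sha1(branch_spec):
    """Return True when the whole branch specifier is 6 to 40 hex digits.

    ASSUMPTION: the match is anchored to the full string; the comment
    above does not say whether the plugin anchors it.
    """
    return SHA1_LIKE.fullmatch(branch_spec) is not None
```

Under this sketch, a specifier of 'master' is not seen as a sha1, while the 40-character value of ZUUL_COMMIT is, which is consistent with the proposal to pass ZUUL_COMMIT as the branch specifier.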
Related URL: https://gerrit.wikimedia.org/r/58865 (Gerrit Change Iafebfffe480886fc8956e56517291b1b3b1fc0cc)
I have updated the mediawiki-core-whitespaces job to use ZUUL_COMMIT as a refspec specifier. The job is non-voting, so this is not going to do any harm. The experimental change is https://gerrit.wikimedia.org/r/58865
(In reply to comment #13)
> Related URL: https://gerrit.wikimedia.org/r/58865 (Gerrit Change
> Iafebfffe480886fc8956e56517291b1b3b1fc0cc)
Why is this comment duplicated?
*** Bug 47208 has been marked as a duplicate of this bug. ***
https://gerrit.wikimedia.org/r/58865 (Gerrit Change Iafebfffe480886fc8956e56517291b1b3b1fc0cc) | change APPROVED and MERGED [by Hashar]
https://gerrit.wikimedia.org/r/#/c/58865/ has been deployed. I am now manually updating the jobs which are not under JJB:
analytics-libanon
analytics-udp-filters
analytics-webstatscollector
analytics-wikistats
mwext-PoolCounter-pep8
mwext-VisualEditor-docgen
operations-debs-python-voluptuous-debbuild
parsoid-parse-tool-check
parsoid-roundtrip-test-check
parsoid-runTests
test-mediawiki-merge
Will monitor over the next few days. Lowering priority for now.
hashar@gallium:~$ ./checkbug46723.py mediawiki-core-phpunit-api --filter 2013-04-16*
Found 0 mismatches in 29 log files.
hashar@gallium:~$ ./checkbug46723.py mediawiki-core-phpunit-misc --filter 2013-04-16*
Found 0 mismatches in 29 log files.
Seems it got fixed :-] Will verify again during the week, but so far that looks good.
I have verified the jobs triggered over the past few days. Seems to work fine now :-) The root cause was using ZUUL_BRANCH as a branch specifier instead of ZUUL_COMMIT.
Change 117045 had a related patch set uploaded by Hashar: Parsoid: uses ZUUL_COMMIT as a git refspec to build https://gerrit.wikimedia.org/r/117045
Change 117045 merged by jenkins-bot: Parsoid: uses ZUUL_COMMIT as a git refspec to build https://gerrit.wikimedia.org/r/117045