Last modified: 2014-07-08 12:01:25 UTC
While running the repoupdate script on translatewiki.net, the script randomly bails out with:

hash mismatch
key_verify failed for server_host_key
fatal: The remote end hung up unexpectedly
error: Could not fetch origin

This has been happening since the migration of Gerrit to the new server two days ago.
We retained the same key for exactly this reason...
So, this sounds like you've got an old entry in your known_hosts files pointing to the old box. We changed IP addresses when moving servers (shouldn't have to ever happen again), so please check your known_hosts for any outdated entries that you can remove.
(In reply to comment #2)
> We changed IP addresses when moving servers (shouldn't have to ever happen
> again), so please check your known_hosts for any outdated entries that you
> can remove.

How can I identify outdated entries? There are no IP addresses in known_hosts. Sample entry:

|1|umKi+qzw6pf8uXi/Z6/KtqlisCw=|YFoX/CdDjXhcVUVJ803EiP9nyro= ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEA2JmNg8ir9QvWwmS/C2k0PEqty1O26D0Nq24YGKC5jq1cr/0a92Pk7wa9FMMM/2O88bbe6rXZUPBKzDX1vVtYD+5vR4/c1XTnHWlNJ9sd6xSYjHhznqYs81VnjGMCLMPV1GhlIfUZsnQ+w1FaQUvJe39TEtwADA7ZOFAfT0M/Oqk=
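For what it's worth, an entry starting with `|1|` is a hashed host name, so the host or IP address cannot be read back from it, only tested against a guess. OpenSSH's own `ssh-keygen -F <host>` looks an entry up and `ssh-keygen -R <host>` removes it. The Python sketch below shows the same check by hand, assuming the usual `|1|<base64 salt>|<base64 HMAC-SHA1 of the host name>` layout; the function names are made up for illustration, and for a non-default port the candidate has to be written as `[host]:port`.

```python
import base64
import hashlib
import hmac


def hash_host(host: str, salt: bytes) -> str:
    """Build a hashed known_hosts host field: |1|salt|HMAC-SHA1(salt, host)."""
    digest = hmac.new(salt, host.encode("ascii"), hashlib.sha1).digest()
    return "|1|{}|{}".format(
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(digest).decode("ascii"),
    )


def entry_matches(known_hosts_line: str, candidate: str) -> bool:
    """Return True if the hashed host field of the line was made from candidate."""
    fields = known_hosts_line.split()
    if not fields:
        return False
    host_field = fields[0]
    parts = host_field.split("|")
    # Expected shape: ['', '1', '<base64 salt>', '<base64 digest>']
    if len(parts) != 4 or parts[0] != "" or parts[1] != "1":
        return False  # not a hashed entry
    salt = base64.b64decode(parts[2])
    return hmac.compare_digest(hash_host(candidate, salt), host_field)


# Hypothetical usage: test each line against the names and addresses the
# server may have been reached under, e.g. "[gerrit.wikimedia.org]:29418"
# or "[<old IP address>]:29418".
```

Lines that match the old IP address would be the outdated ones.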
Still seeing this error randomly.
Several reports of this in the last few days. Reporters include Krenair, YuviPanda, and Krinkle.
(Worked for me when I tried again)
Just happened to me w/operations/puppet.

$ git pull
hash mismatch
key_verify failed for server_host_key
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
(In reply to comment #7)
> Just happened to me w/operations/puppet.
>
> $ git pull
> hash mismatch
> key_verify failed for server_host_key
> fatal: Could not read from remote repository.
>
> Please make sure you have the correct access rights
> and the repository exists.

We see similar errors very regularly when updating 600 or so extension repos at translatewiki.net. I'm pretty certain that we have the correct access rights with L10n-bot, have the correct access rights at the local machine, and have consistent scripting to update the repos.

A run I did just now resulted in the following errors:

Permission denied (publickey).
fatal: The remote end hung up unexpectedly
error: Could not fetch origin
/resources/siebrand/mediawiki-extensions/extensions/CategoryMagicWords failed to update
Permission denied (publickey).
fatal: The remote end hung up unexpectedly
error: Could not fetch origin
/resources/siebrand/mediawiki-extensions/extensions/ReplaceSet failed to update

Just to make sure that it wasn't me configuring the two above repos incorrectly, I ran the updates again. This time with the following result:

Permission denied (publickey).
fatal: The remote end hung up unexpectedly
error: Could not fetch origin
/resources/siebrand/mediawiki-extensions/extensions/DidYouKnow failed to update
Permission denied (publickey).
fatal: The remote end hung up unexpectedly
error: Could not fetch gerrit
/resources/siebrand/mediawiki-extensions/extensions/FormatDates failed to update
Permission denied (publickey).
fatal: The remote end hung up unexpectedly
error: Could not fetch gerrit
/resources/siebrand/mediawiki-extensions/extensions/GoogleDocTag failed to update
Permission denied (publickey).
fatal: The remote end hung up unexpectedly
error: Could not fetch origin
/resources/siebrand/mediawiki-extensions/extensions/InviteSignup failed to update
Permission denied (publickey).
fatal: The remote end hung up unexpectedly
error: Could not fetch origin
/resources/siebrand/mediawiki-extensions/extensions/LightweightRDFa failed to update
Permission denied (publickey).
fatal: The remote end hung up unexpectedly
error: Could not fetch origin
Permission denied (publickey).
fatal: The remote end hung up unexpectedly
error: Could not fetch gerrit
/resources/siebrand/mediawiki-extensions/extensions/Numbertext failed to update
/resources/siebrand/mediawiki-extensions/extensions/NumberOfWikis failed to update
Permission denied (publickey).
fatal: The remote end hung up unexpectedly
error: Could not fetch gerrit
/resources/siebrand/mediawiki-extensions/extensions/PageLanguage failed to update
Permission denied (publickey).
fatal: The remote end hung up unexpectedly
error: Could not fetch origin
/resources/siebrand/mediawiki-extensions/extensions/SidebarDonateBox failed to update
hash mismatch
key_verify failed for server_host_key
fatal: The remote end hung up unexpectedly
error: Could not fetch gerrit
/resources/siebrand/mediawiki-extensions/extensions/UserStatus failed to update
Permission denied (publickey).
fatal: The remote end hung up unexpectedly
error: Could not fetch gerrit
/resources/siebrand/mediawiki-extensions/extensions/VersionView failed to update

To compare, when updating repos on localhost from GitHub, I've not seen a similar error once.
*** Bug 57483 has been marked as a duplicate of this bug. ***
That happens once or twice per day on Zuul. Usually it is a "hash mismatch" error, though we also had some "host key verification failed" errors on Nov 20th.
Also got one today on the command line. FYI: the -1s in Jenkins caused by this are very confusing.
Is this related: https://gerrit.wikimedia.org/r/#/c/107036/ ?
(In reply to comment #12)
> Is this related: https://gerrit.wikimedia.org/r/#/c/107036/ ?

Looking at the Zuul debugging log on gallium.wikimedia.org, it is a different issue. Filed bug 59991 for it. It seems to be an issue in the Python git module.
Another example, this time with the job that syncs VisualEditor in mediawiki/extensions.git. The merge of https://gerrit.wikimedia.org/r/#/c/111608/ triggered job http://integration.wikimedia.org/ci/job/mwext-VisualEditor-sync-gerrit/61/console which shows:

ssh -i /var/lib/jenkins/.ssh/jenkins-mwext-sync_id_rsa \
    -p 29418 jenkins-mwext-sync@gerrit.wikimedia.org \
    'gerrit review --code-review +2 --verified +2 --submit b519550809bba725b017281fe6c33c4c2fd123c1'
hash mismatch
key_verify failed for server_host_key
This continues to happen, nearly daily. You could probably get a good list of affected changesets by grepping logs of #wikimedia-dev for my name and "ignore jenkins" :/
Subsided for a while, then started happening a bit more often for me locally. Example in Gerrit from today: https://gerrit.wikimedia.org/r/#/c/138992/1
Could somebody tcpdump it? It seems to me more like a broken (suddenly terminated) connection, probably occurring (mostly) early in the SSH negotiation phase.
Today again: https://gerrit.wikimedia.org/r/#/c/139047/
Two examples from just today:
* https://gerrit.wikimedia.org/r/#/c/139807/
* https://gerrit.wikimedia.org/r/#/c/140046/
Bartosz: there is no need for more examples. We have traces of those errors in the Zuul log and it happens a couple of times per day.

Marcin: we could tcpdump it if only we had a way to reliably reproduce the issue :-(
Hi, I've been able to reproduce this on a local Gerrit instance quite reliably by running the following:

while true; do ssh <gerrit> -p 29418; done

A workaround that does work is to use the Bouncy Castle crypto library. See the following thread for more info: https://groups.google.com/forum/#!topic/repo-discuss/JE7OM6o7DMs
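If it helps anyone put a number on "quite reliably", the loop can be turned into a small counter. This is only a sketch: the function names are made up, it assumes key-based login so that `BatchMode=yes` never prompts, and it runs the `gerrit version` command so that each connection ends on its own.

```python
import subprocess
from typing import Callable

# Text that ssh prints on stderr when the key exchange goes wrong.
FAILURE_MARKERS = ("hash mismatch", "key_verify failed")


def ssh_attempt(host: str, port: int = 29418) -> str:
    """Open one non-interactive SSH connection and return ssh's stderr output."""
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", "-p", str(port), host, "gerrit", "version"],
        capture_output=True,
        text=True,
    )
    return result.stderr


def count_failures(attempt: Callable[[], str], attempts: int) -> int:
    """Call attempt() repeatedly; count runs that show the key-exchange failure."""
    failures = 0
    for _ in range(attempts):
        output = attempt()
        if any(marker in output for marker in FAILURE_MARKERS):
            failures += 1
    return failures


# Hypothetical usage against a test instance:
# print(count_failures(lambda: ssh_attempt("localhost"), 10000))
```

Other errors, such as "Permission denied (publickey)", are deliberately not counted here, since they may have a different cause.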
The Google Groups thread mentions this issue in Apache mina-sshd (upstream from Gerrit): https://issues.apache.org/jira/browse/SSHD-330

It has been fixed in https://git-wip-us.apache.org/repos/asf?p=mina-sshd.git;a=commit;h=2aed686bdb21681a421033c6ee5997e5cd8a9a83

If that is indeed the root issue, we need them to make a minor release and Gerrit to upgrade to it.
The description of the SSHD-330 issue explains pretty much every aspect of the bug that we experienced, from its sporadic nature to the way some people could reproduce it but others couldn't. I'll see to preparing a new gerrit release ... hopefully we can get something deployed around that.
Change 143388 had a related patch set uploaded by QChris: Upgrade sshd to include the fix for hash mismatch https://gerrit.wikimedia.org/r/143388
Christian, could you possibly provide a gerrit.war that has the patch? I would like to test it out on the labs instance I am using for CI dev. Thanks!
(In reply to Antoine "hashar" Musso from comment #25)
> Christian, could you possibly provide a gerrit.war that has the patch?

Sure. For the next 2 weeks, you can fetch it from http://quelltextlich.at/gerrit-2.8.1-4-ga1048ce.war

> I would like to test it out on the labs instance I am using for CI dev.

Seeing the description of SSHD-330 allowed me to come up with an environment in which the bug can be reproduced. There, our deployed gerrit war failed for 14 of 10000 connection attempts. The war I linked above showed 0 failures for 10000 connection attempts.

^d already said he'll discuss deploying the war with greg-g. So we'll hopefully see it live soon.
I have upgraded Gerrit on my test instance integration-dev.eqiad.wmflabs. No hash mismatch is triggered any more when running the following for a while:

while true; do ssh -p 29418 localhost; done;
Since this bug has been around for a while and has affected quite some people, I've been asked to give a short explanation of the root issue and what SSHD-330 does.

Gerrit uses Apache Mina's SSHD [1] as ssh server. When connecting to gerrit through ssh, this ssh server uses Java's own crypto/security implementations to negotiate session keys (i.e.: different for each connection attempt) with the client. Java's default provider yielded those session keys without leading zero bytes, and Apache Mina's SSHD relied on no leading zero bytes being present.

But at some point Java [2] changed behaviour and is no longer stripping leading zero bytes, but Apache Mina SSHD still relied on no leading zero bytes being present. Hence assumptions mismatched and caused the issue.

The Java we use at gerrit.wikimedia.org is recent enough to no longer strip leading zero bytes. So when connecting to our gerrit through ssh, either

* the negotiated session key starts with a non-zero byte, and everything works nicely. This case happens most of the time.

* the negotiated session key starts with a zero byte. Then gerrit's built-in Apache Mina SSHD falsely treats the key as if there were no leading zero bytes and the connection setup with the client fails.

SSHD-330 adds stripping of leading zero bytes from the session key to Apache Mina SSHD and thereby fixes the issue we are seeing.

------

There was recently some FUD around OpenSSL generated keys not being affected. That did not work for me, and I do not see in code how this would make a difference.

Also, there was some recent discussion around extracting the keys from the keystore to proper files. I did not get a chance to try that, but that could do the trick too ... indirectly. Because in order to get gerrit to use keys from separate files, one needs to install BouncyCastle libraries to gerrit. BouncyCastle will act as provider for the needed security/crypto functionality and get used instead of Java's default providers. As the BouncyCastle providers (for now) also strip the leading zero bytes, that could work out.

Regardless, having Apache Mina SSHD to strip leading zero bytes seems most reliable, so we backported the Apache Mina SSHD's upstream fix to the version used in our gerrit, and rebuilt gerrit using that custom built Apache Mina SSHD.

[1] https://mina.apache.org/sshd-project/
[2] I know that OpenJDK versions up to
OpenJDK Runtime Environment (IcedTea7 2.2.1) (Gentoo build 1.7.0_05-b21)
work and the default providers strip the leading zeros, while the ones from
OpenJDK Runtime Environment (IcedTea 2.4.7) (7u55-2.4.7-1ubuntu1~0.12.04.2)
do not strip them.

Thanks Krinkle for the pointer to SSHD-330!
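To make the explanation above a bit more concrete, here is a minimal Python sketch of the mismatch. It is not the Apache Mina SSHD code. The function names, the `prefix` argument (a stand-in for the other fields that go into the SSH exchange hash) and the choice of SHA-1 are assumptions for illustration. The only thing taken as given is the SSH `mpint` encoding from RFC 4251, which does not allow unnecessary leading zero bytes.

```python
import hashlib
import struct


def mpint(value_bytes: bytes) -> bytes:
    """Encode an unsigned big-endian number as an SSH mpint (RFC 4251).

    Leading zero bytes are stripped first. A single zero byte is put back
    only when the top bit is set, so the number is not read as negative.
    """
    stripped = value_bytes.lstrip(b"\x00")
    if stripped and stripped[0] & 0x80:
        stripped = b"\x00" + stripped
    return struct.pack(">I", len(stripped)) + stripped


def naive_mpint(value_bytes: bytes) -> bytes:
    """Same encoding, but WITHOUT stripping leading zero bytes first.

    Simplified model of a server that trusts its crypto provider to hand
    over the secret already free of leading zero bytes.
    """
    data = value_bytes
    if data and data[0] & 0x80:
        data = b"\x00" + data
    return struct.pack(">I", len(data)) + data


def hashes_agree(prefix: bytes, secret_bytes: bytes) -> bool:
    """Compare the hash a careful client computes with the naive server's hash."""
    client_side = hashlib.sha1(prefix + mpint(secret_bytes)).digest()
    server_side = hashlib.sha1(prefix + naive_mpint(secret_bytes)).digest()
    return client_side == server_side
```

With a made-up secret such as `bytes([0x12, 0x34, 0x56])` both sides hash the same bytes. With `bytes([0x00, 0x12, 0x34])` the naive side hashes one byte more than the client, the two hashes differ, and the client's check of the server's signature over that hash can no longer succeed, which would fit the "hash mismatch" message.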
Change 143388 merged by Chad: Upgrade sshd to include the fix for hash mismatch https://gerrit.wikimedia.org/r/143388
The fix has been deployed at gerrit.wikimedia.org.
<3
(In reply to christian from comment #28)
> Thanks Krinkle for the pointer to SSHD-330!

And thank you for the analysis and the informative summary -- well done!