Last modified: 2014-09-15 22:28:43 UTC
I am observing degraded search performance in Betalabs, including searches for matching link target names, images, category names, templates, etc. In some cases it takes as long as 7-8 seconds to bring up the matched results.
I see it taking a long time too. The load on the search servers is quite low. Beta's ganglia seems to be down so I can't see what it is historically. Is this a new thing? I always remember beta being slow slow slow.
Not this slow.
18:29 < bd808> For some currently unknown reason varnish is not caching anything
That may have been my user-agent doing something strange. Antoine also found that we were undergoing a high-rate vulnerability scan from a volunteer.
We have a security audit being run on the beta cluster. Unfortunately the script is not throttled and causes a fair number of queries on the backend server. We apparently only had one (hhvm) application server until today, which does not help either. I have blacklisted the volunteer's IP on the beta cluster varnish caches using:
ip route add blackhole x.y.z.a/32
The actual IP can be found by using 'route -n'. To remove the blacklist one can:
ip route del blackhole "THE IP ADDRESS/32"
gjg@deployment-bastion:/data/project/logs$ grep -c REDACTED xff.log
4960
gjg@deployment-bastion:/data/project/logs/archive$ zgrep -c REDACTED xff.log-20140*
...bunch of 0s...
xff.log-20140816.gz:0
xff.log-20140817.gz:0
xff.log-20140818.gz:0
xff.log-20140819.gz:0
xff.log-20140820:0
xff.log-20140821:2034
xff.log-20140824:199048
xff.log-20140827:184299
And total lines:
9130 xff.log-20140820
20285 xff.log-20140821
208037 xff.log-20140824
197124 xff.log-20140827
Basically, the vast majority of the traffic to the Beta Cluster (well over 90% of the logged requests on the peak days) was from this tool. This is the root cause of the slowness. The awesome volunteer will throttle his bot for us.
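For anyone who wants to check the share per log file, here is a minimal sketch of the arithmetic. It assumes the "total lines" figures above pair with the files as listed; the helper name match_share is made up for illustration and is not part of any existing tooling.

```python
def match_share(matched: int, total: int) -> float:
    """Return the percentage of log lines that matched, or 0.0 for an empty log."""
    if total == 0:
        return 0.0
    return 100.0 * matched / total


# (matched lines, total lines) per log file, copied from the counts quoted above.
# Assumption: each total belongs to the file named next to it.
counts = {
    "xff.log-20140820": (0, 9130),
    "xff.log-20140821": (2034, 20285),
    "xff.log-20140824": (199048, 208037),
    "xff.log-20140827": (184299, 197124),
}

shares = {name: match_share(m, t) for name, (m, t) in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1f}% of lines matched")
```

The same comparison could be done directly with grep -c and wc -l on the bastion; the script only makes the ratio explicit.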
The udp2log-mw service on deployment-bastion.eqiad.wmflabs logs the average number of packets it receives per second over 5 minutes. The file is /var/log/udp2log/udp2log.log. It shows we went from 0.0xx k/s to 2.500 k/s, which indicates a huge number of requests being made.
Closing this. Thanks to Rummana for the heads up and to all who helped debug this multilayered issue.
This is happening again.
It's slow while searching for pages, link target names, media files, etc., but page loading is fine at my end.
I just sat next to Rummana to see the symptoms. They're a bit sporadic but noticeable. I'll open a new bug and consider this one closed, covering just the security testing that was going on.