Last modified: 2014-09-15 22:28:43 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T72103, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 70103 - Security test load caused search and page loads extremely slow on beta cluster
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: deployment-prep (beta)
Version: unspecified
Hardware: All All
Importance: Highest normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:

Reported: 2014-08-27 20:07 UTC by Rummana Yasmeen
Modified: 2014-09-15 22:28 UTC (History)
CC List: 20 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Rummana Yasmeen 2014-08-27 20:07:29 UTC
I am observing degraded search performance in Betalabs, including searches for matching link target names, images, category names, templates, etc.
In some cases it takes as long as 7-8 seconds to bring up the matched results.
Comment 1 Nik Everett 2014-08-27 21:31:05 UTC
I see it taking a long time too.  The load on the search servers is quite low.  Beta's ganglia seems to be down so I can't see what it is historically.  Is this a new thing?  I always remember beta being slow slow slow.
Comment 2 Greg Grossmeier 2014-08-27 22:09:18 UTC
Not this slow.
Comment 3 Greg Grossmeier 2014-08-27 22:30:00 UTC
18:29 <     bd808> For some currently unknown reason varnish is not caching anything
Comment 4 Bryan Davis 2014-08-27 22:57:29 UTC
That may have been my user-agent doing something strange.

Antoine also found that we were undergoing a high-rate vulnerability scan from a volunteer.
Comment 5 Antoine "hashar" Musso (WMF) 2014-08-27 23:06:27 UTC
A security audit is being run against the beta cluster. Unfortunately the script is not throttled and causes a fair number of queries on the backend server.

We apparently had only one (hhvm) application server until today, which does not help either.

I have blacklisted the volunteer's IP on the beta cluster varnish caches using:

  ip route add blackhole x.y.z.a/32

The actual IP can be found with 'route -n'.

To remove the blacklist entry, one can run:

  ip route del blackhole "THE IP ADDRESS/32"
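
As a rough sketch of the whole cycle (block the address, check the route, unblock it), the helper below only prints the commands instead of running them, since they need root on the varnish host. The function name `blackhole_cmds` and the address 192.0.2.10 (a documentation-only address) are illustrative assumptions, not values from the beta cluster. `ip route show type blackhole` is offered here as an alternative to `route -n` for listing such routes.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the iproute2 commands for blocking,
# inspecting, and unblocking one IPv4 address. Nothing is executed.
blackhole_cmds() {
    addr=$1
    # Silently discard packets routed to this address.
    printf 'ip route add blackhole %s/32\n' "$addr"
    # List the blackhole routes currently installed.
    printf 'ip route show type blackhole\n'
    # Remove the route again once the scan has been throttled.
    printf 'ip route del blackhole %s/32\n' "$addr"
}

blackhole_cmds 192.0.2.10
```

Printing rather than executing keeps the sketch safe to paste. An operator would review the output and run each line by hand as root.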
Comment 6 Greg Grossmeier 2014-08-27 23:51:10 UTC
gjg@deployment-bastion:/data/project/logs$ grep -c REDACTED xff.log 
4960
gjg@deployment-bastion:/data/project/logs/archive$ zgrep -c REDACTED xff.log-20140*
...bunch of 0s...
xff.log-20140816.gz:0
xff.log-20140817.gz:0
xff.log-20140818.gz:0
xff.log-20140819.gz:0
xff.log-20140820:0
xff.log-20140821:2034
xff.log-20140824:199048
xff.log-20140827:184299

And total lines:
    9130 xff.log-20140820
   20285 xff.log-20140821
  208037 xff.log-20140824
  197124 xff.log-20140827

Basically, roughly 94-96% of the traffic to the Beta Cluster on the two busiest days (08-24 and 08-27) was from this tool.

This is the root cause of the slowness. The awesome volunteer will throttle his bot for us.
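
To put a number on the tool's share, the match counts can be divided by the total line counts listed above. This is a minimal sketch: the `share` function name is an assumption for illustration, and the figures are copied from the grep and line-count output above.

```shell
#!/bin/sh
# share MATCHES TOTAL: print MATCHES as a percentage of TOTAL, one decimal place.
share() {
    awk -v m="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * m / t }'
}

share 2034 20285      # xff.log-20140821
share 199048 208037   # xff.log-20140824
share 184299 197124   # xff.log-20140827
```

With those inputs the three calls print 10.0, 95.7 and 93.5, so the tool went from about a tenth of the logged requests on 08-21 to the large majority on 08-24 and 08-27.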
Comment 7 Antoine "hashar" Musso (WMF) 2014-08-27 23:53:12 UTC
The udp2log-mw service on deployment-bastion.eqiad.wmflabs logs the average number of packets it receives per second over 5 minutes.

The file is /var/log/udp2log/udp2log.log

It shows that we went from 0.0xx k/s to 2.500 k/s, which indicates a huge number of requests being made.
Comment 8 Greg Grossmeier 2014-08-27 23:56:07 UTC
Closing this.

Thanks, Rummana, for the heads-up, and thanks to all who helped debug this multilayered issue.
Comment 9 Rummana Yasmeen 2014-09-15 20:01:24 UTC
This is happening again.
Comment 10 Rummana Yasmeen 2014-09-15 21:24:56 UTC
It's slow when searching for pages, link target names, media files, etc., but page loading is fine on my end.
Comment 11 Greg Grossmeier 2014-09-15 22:24:23 UTC
I just sat next to Rummana to see the symptoms. They're a bit sporadic but noticeable.

I'll open a new bug and consider this one closed, covering only the security testing that was going on.
