Last modified: 2013-05-05 20:18:11 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T32716, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 30716 - Run our own Tor client for Tor block
Status: REOPENED
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: ops, platformeng
Depends on:
Blocks:
Reported: 2011-09-03 00:20 UTC by RonaldB
Modified: 2013-05-05 20:18 UTC (History)

Description RonaldB 2011-09-03 00:20:47 UTC
It seems that the TorBlock extension is no longer working, probably because https://check.torproject.org/cgi-bin/TorBulkExitList.py?ip=208.80.152.2 returns an HTTP 403 (Forbidden).

I noticed this because my open proxy monitoring system started reporting edits made via long-existing Tor exit nodes. I have now resumed maintaining the Tor exit nodes in my database of open proxies.

Recent test: http://nl.wikipedia.org/w/index.php?title=Overleg_Wikipedia:Zandbak&diff=27083157&oldid=26766446, reported here: http://nl.wikipedia.org/w/index.php?title=Wikipedia:Open_proxy_detectie&diff=27083159&oldid=27082535

A usable source for Tor exit nodes is now http://torstatus.blutmagie.de/ (linked from the new main page of torproject.org), filtered for exit nodes. It is less precise, because configurations such as those described at http://meta.wikimedia.org/wiki/Tor_Exit_Node_Configuration can't be detected. An alternative is https://www.torproject.org/projects/tordnsel.html.en, but it generates a lot of traffic. A combination of both may be considered if the original URL remains inaccessible.
Comment 1 Tim Starling 2011-11-22 10:32:15 UTC
Removing the "easy" tag; the extension needs to be mostly rewritten.

Ideally we would get the data from Tor's directory system rather than relying on a script running on the personal webspace of some random software developer.
Comment 2 Platonides 2011-11-22 21:03:31 UTC
The list at http://meta.wikimedia.org/wiki/Tor_Exit_Node_Configuration is outdated. We are using an IP per project now.
Comment 3 Platonides 2011-11-22 22:05:03 UTC
The Python script used is not complicated. We could almost run it locally or reimplement it in PHP.
Its algorithm is:
* Fetch a raw list of all exit addresses by grepping ExitAddress from RawExitList. Store it as a parsed-exit-list (a list of all potential exit IPs).
* When asked about an IP and port, query the tordnsel service [1] for each IP on the parsed list at <clientIP>.<port>.<target>.ip-port.exitlist.torproject.org (NXDOMAIN means the target is not accessible from that node, 127.0.0.2 means it is accessible) and cache the result.
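
The two steps above can be sketched roughly as follows. This is only an illustration, not the actual TorBulkExitList.py code: the function names are made up, the reversed-octet form of the addresses in the query name is my reading of the tordnsel page, and the resolver is injectable so the logic can be tried without touching the network.

```python
import socket

def parse_exit_addresses(text):
    """Collect the IPs found on 'ExitAddress <ip> <date> <time>' lines."""
    ips = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "ExitAddress":
            ips.add(parts[1])
    return ips

def reverse_ipv4(ip):
    """Reverse the octets of a dotted quad (assumed DNSEL convention)."""
    return ".".join(reversed(ip.split(".")))

def dnsel_query_name(exit_ip, port, target_ip,
                     zone="ip-port.exitlist.torproject.org"):
    """Build <exit>.<port>.<target>.<zone> for one candidate exit IP."""
    return f"{reverse_ipv4(exit_ip)}.{port}.{reverse_ipv4(target_ip)}.{zone}"

def can_exit_to(exit_ip, port, target_ip,
                resolve=socket.gethostbyname, cache=None):
    """True if the DNSEL answers 127.0.0.2, False on a failed lookup."""
    key = (exit_ip, port, target_ip)
    if cache is not None and key in cache:
        return cache[key]
    try:
        answer = resolve(dnsel_query_name(exit_ip, port, target_ip)) == "127.0.0.2"
    except socket.gaierror:
        # NXDOMAIN (or any other resolution failure) is treated as "not an exit".
        answer = False
    if cache is not None:
        cache[key] = answer
    return answer
```

Passing a dict as cache keeps repeated questions about the same (exit, port, target) triple from generating new DNS queries, which is the "cache it" part of the second step.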

The problem is getting the exitAddresses list. TorBulkExitList.py reads it from a local file at /srv/check.torproject.org/tordnsel/state/exit-addresses with a comment pointing to download it from http://exitlist.torproject.org/exitAddresses (which doesn't load)

The tordnsel page mentions that "it establishes a persistent controller connection to Tor, receiving updated nodes and exit policies as Tor fetches them from directories.", which is probably the best approach. Another way may be "parsing the cached-routers file,". Both alternatives seem to require running Tor.

1- https://svn.torproject.org/svn/check/trunk/cgi-bin/TorBulkExitList.py
https://www.torproject.org/projects/tordnsel.html.en
Comment 4 Platonides 2011-11-22 22:34:40 UTC
If we know of a tor server providing directory information (eg. 10.10.10.10:9030), I think we could get the list of servers by doing

wget -O - http://10.10.10.10:9030/tor/status-vote/current/consensus | grep '^r ' | cut -d ' ' -f 7

Of course, if we have the consensus (or a cached copy by a client), we also have the list of accepted/rejected ports for each server. But it seems we would also need a copy of the descriptors for matching the ips.

For that we would instead fetch http://10.10.10.10:9030/tor/server/all and parse it differently.
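
As a rough illustration of what the consensus alone gives us, here is a sketch that pulls the address from each "r" line and the port summary from the following "p" line. The function names are made up, and it assumes the consensus text has already been fetched uncompressed. As noted above, the address on the "r" line is the one the relay advertises, which is not necessarily the address its exit traffic comes from, and the "p" line is only a port summary.

```python
def parse_consensus(text):
    """Return one dict per relay with its advertised IP and port summary."""
    relays = []
    current = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "r" and len(parts) >= 9:
            # r <nickname> <identity> <digest> <date> <time> <ip> <orport> <dirport>
            current = {"nickname": parts[1], "ip": parts[6], "policy": None}
            relays.append(current)
        elif parts[0] == "p" and len(parts) == 3 and current is not None:
            # p accept|reject <comma-separated ports or ranges>
            current["policy"] = (parts[1], parts[2])
    return relays

def port_allowed(policy, port):
    """Check a port against a ('accept'|'reject', 'portlist') summary."""
    if policy is None:
        return False
    action, port_list = policy
    listed = False
    for item in port_list.split(","):
        low, _, high = item.partition("-")
        if int(low) <= port <= int(high or low):
            listed = True
            break
    return listed if action == "accept" else not listed

def candidate_exit_ips(text, port):
    """Advertised IPs of relays whose summary permits the given port."""
    return [r["ip"] for r in parse_consensus(text)
            if port_allowed(r["policy"], port)]
```

For our purposes the interesting ports would be 80 and 443, so something like candidate_exit_ips(consensus_text, 80) would give a first approximation of the list.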
Comment 5 Tim Starling 2011-11-23 06:54:23 UTC
I think it's best to run our own Tor client rather than rely on someone else's. There's no single Tor server with a sufficient commitment to uptime, including *.torproject.org, as this bug demonstrates.

Extracting information from a normal Tor client gives us a better chance of being able to build a valid exit list if the relationship between the Tor community and projects like TorBlock becomes adversarial.

There's no file called cached-routers in my version of Tor, but there is a /var/lib/tor/cached-consensus which seems to have the required information.
Comment 6 Platonides 2011-12-02 21:23:43 UTC
/var/lib/tor/cached-consensus is the consensus
The other data could be available at /var/lib/tor/cached-descriptors but that seems to include more things, like public keys of hidden services.

If we run a tor client, it seems preferable to investigate the protocol to get a live feed from our client. The other method may be kept for users which won't be running a daemon.
Comment 7 Tim Starling 2011-12-03 12:12:54 UTC
(In reply to comment #6)
> If we run a tor client, it seems preferable to investigate the protocol to get
> a live feed from our client. The other method may be kept for users which won't
> be running a daemon.

Can we export the exit list from TorBlock itself and publish that data on WMF servers for the benefit of TorBlock users without a Tor client? Say with an API module?
Comment 8 billinghurst 2012-03-20 23:59:48 UTC
Is there any progress on this matter? With the current batch of spam bots, it would be good to be able to rule this out as one of the holes in the defence.

We are seeing a persistent level of spam bot attacks that seem to be concentrated within certain IP networks (within a number of /16s, some with repeat IPs, others with single-use IPs), and one wonders whether this matter is part of the issue.

From a single IP address we get a burst of account creations, sometimes the same account on multiple sites and/or multiple accounts. The activity generally focuses on the same set of wikis, though it has been noticed spreading to more wikis, with no obvious pattern.

Nothing specific is showing for XFF.

Thanks.
Comment 9 Chris Steipp 2012-05-21 23:48:47 UTC
It looks like both:

https://check.torproject.org/cgi-bin/TorBulkExitList.py?ip=208.80.152.2
and
http://exitlist.torproject.org/exit-addresses

are working again. Running our own node seems like it would get us the list of all exit nodes (same as http://exitlist.torproject.org/exit-addresses).

The preferred way to check an IP seems to be the DNS exit list (tordnsel), since that would test whether a particular IP address is a hidden node. But doing a DNS lookup for every incoming IP seems like a lot of traffic for us.

Tim, do you still think the whole extension needs a rewrite?
Comment 10 Platonides 2012-05-22 14:40:41 UTC
Then blocking will have fixed itself.

Note we don't want to block all exit nodes, only those from which we can be reached.

What do you mean by hidden node? A hidden service?
Comment 11 Chris Steipp 2012-05-22 16:53:19 UTC
Oh, I was referring to this: "Previous DNSELs scraped Tor's network directory for exit node IP addresses, but this method fails to list nodes that don't advertise their exit address in the directory. TorDNSEL actively tests through these nodes to provide a more accurate list." [1]

Since users have to know about the servers they route through, any unlisted node would have a somewhat limited userbase, so I'm not sure we need to be as concerned about them for spam. But if we really want to catch them all, we would need to do a DNS lookup for each connecting IP.

Is it possible to confirm that the list is being updated for enwiki, and billinghurst's problems are not related/current? If that's the case, I think this ticket can either be closed, or turned into a feature request for making the extension use the dns method.

[1] - https://www.torproject.org/projects/tordnsel.html.en
Comment 12 Platonides 2012-05-23 20:18:32 UTC
Yes, it is updated. I just asked and the list currently has 642 IPs.
The problems reported in March should be gone now.
Comment 13 Nemo 2012-05-23 21:00:26 UTC
So this:

(In reply to comment #5)
> I think it's best to run our own Tor client rather than rely on someone else's.
> There's no single Tor server with a sufficient commitment to uptime, including
> *.torproject.org, as this bug demonstrates.

should be split to another bug?
Comment 14 Platonides 2012-05-23 21:32:27 UTC
I'd just repurpose this bug, then.
Comment 15 Nemo 2012-05-24 05:53:36 UTC
(In reply to comment #14)
> I'd just repurpose this bug, then.

Ok.
Comment 16 Tyler Romeo 2013-05-04 21:53:29 UTC
FWIW, Extension:TorBlock was recently rewritten (and as of now actually works) using the Onionoo protocol. If WMF wants, it can set up its own Onionoo server and then just point the $wgOnionooServer variable to the local server.
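
For anyone who wants to see what an Onionoo-backed list might look like outside of MediaWiki, here is a small sketch. It is not the TorBlock code; the function name is made up, and the field names (relays, or_addresses, exit_addresses) reflect my understanding of Onionoo details documents, so treat them as assumptions to check against the Onionoo documentation. It works on an already-downloaded JSON string rather than contacting any server.

```python
import json

def exit_ips_from_onionoo(details_json):
    """Collect candidate exit IPs from an Onionoo details document."""
    doc = json.loads(details_json)
    ips = set()
    for relay in doc.get("relays", []):
        # Addresses the relay was actually seen exiting from, if present.
        ips.update(relay.get("exit_addresses", []))
        # OR addresses look like "ip:port" or "[ipv6]:port"; drop the port.
        for addr in relay.get("or_addresses", []):
            host = addr.rsplit(":", 1)[0]
            ips.add(host.strip("[]"))
    return ips
```

A local Onionoo server, as suggested above, would serve the same kind of document, so only the URL the JSON is fetched from would change.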
