Last modified: 2013-10-25 12:51:57 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T53310, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 51310 - create dedicated instance for exturl checking
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: tools
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Marc A. Pelletier
Reported: 2013-07-14 03:45 UTC by Giftpflanze
Modified: 2013-10-25 12:51 UTC (History)

Description Giftpflanze 2013-07-14 03:45:13 UTC
I have a tool on the toolserver that checks all weblinks in dewiki for availability and reports broken links on talk pages. To check the links in parallel, I would need a separate instance within the tools project so as not to impact the others. A dedicated grid node might also be OK. A separate project would be another possibility, but I don't know if that makes sense. Is it possible to have root on my instance? If not, I would need Tcl 8.6 (tip-386-impl branch), tcllib, TclCurl, mysqltcl (or better, tdbc and tdbc::mysql), access to the production replicas, a local (MySQL) database, and access to the file storage (/data/project, maybe also dumps). My username is gifti, service-group is giftbot.
Comment 1 Yuvi Panda 2013-07-18 10:07:30 UTC
Having this run as multiple continuous tasks should be good enough, methinks. Perhaps use a redis queue to co-ordinate between multiple instances? :)
Comment 2 Tim Landscheidt 2013-07-22 17:25:20 UTC
I don't quite understand why a separate instance is needed.  Why doesn't this work on Tools now?  You shouldn't need to worry about impacting others; that's what the grid is for.

(In reply to comment #1)
> Having this run as multiple continuous tasks should be good enough, methinks.
> Perhaps use a redis queue to co-ordinate between multiple instances? :)

The nice thing about MediaWiki is that you have page IDs.  So you don't have to write complex job schedulers, but for example can have 10 jobs that each check the external links of pages where (i - 1) * 1000000 <= el_from < i * 100000 (for example, depending on the maximum page_id of course).
Comment 3 Tim Landscheidt 2013-07-22 17:26:19 UTC
(In reply to comment #2)
> [...]
> The nice thing about MediaWiki is that you have page IDs.  So you don't have
> to write complex job schedulers, but for example can have 10 jobs that each
> check the external links of pages where (i - 1) * 1000000 <= el_from < i * 100000
> (for example, depending on the maximum page_id of course).

... provided that the numbers of zeroes are equal :-).  Well, you get the gist.
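
The page-ID partitioning described in comments 2 and 3 could be sketched roughly as follows. This is only an illustration in Python, not the reporter's Tcl tool: it assumes both multipliers are meant to be equal (as comment 3 points out), and the function names `page_id_ranges` and `el_from_condition` are hypothetical. Only the `el_from` column name is taken from the thread.

```python
def page_id_ranges(max_page_id, num_jobs):
    """Split page IDs 0..max_page_id into num_jobs contiguous half-open ranges.

    Hypothetical helper: each returned (low, high) pair means low <= el_from < high.
    """
    if num_jobs < 1:
        raise ValueError("num_jobs must be at least 1")
    total = max_page_id + 1
    # Ceiling division, so that the last range still reaches max_page_id.
    chunk = -(-total // num_jobs)
    return [
        (min(i * chunk, total), min((i + 1) * chunk, total))
        for i in range(num_jobs)
    ]


def el_from_condition(low, high):
    """Build a parameterised WHERE condition for one job's slice of page IDs.

    The placeholder style (%s) is an assumption; it depends on the database driver.
    """
    return "el_from >= %s AND el_from < %s", (low, high)


if __name__ == "__main__":
    # Ten jobs over an assumed maximum page_id of 9,999,999.
    for job, (low, high) in enumerate(page_id_ranges(9_999_999, 10), start=1):
        condition, params = el_from_condition(low, high)
        print(job, condition, params)
```

Each job then only needs its own job number to work out which slice of pages to check, so no shared scheduler or queue is required.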
Comment 4 Marc A. Pelletier 2013-08-29 20:11:36 UTC
It's also not clear to me why a dedicated instance is required for this.  Why does the grid scheduling not suffice?
